Chemical Journal of Chinese Universities, 2024, Vol. 45, Issue (9): 20240199. doi: 10.7503/cjcu20240199

• Analytical Chemistry •

Prediction of Deep Vein Thrombosis Based on GC-MS and Machine Learning

HOU Zejin, LI Rongqi, LI Jian, FENG Yining, JIN Qianqian, SUN Junhong, CAO Jie

  School of Forensic Medicine, Shanxi Medical University, Taiyuan 030001, China
  • Received: 2024-04-19; Online: 2024-05-31; Published: 2024-09-10
  • Corresponding author: CAO Jie, E-mail: jie.cao@sxmu.edu.cn
  • Supported by:
    the Special Fund for Science and Technology Innovation Teams of Shanxi Province, China (No. 202204051001025)

Abstract:

This study investigated changes in endogenous metabolites in the serum of rats with deep vein thrombosis (DVT), screened for characteristic metabolites related to DVT, and used them to construct prediction models for the clinical diagnosis and forensic identification of DVT. A DVT rat model was established by inferior vena cava ligation, and cardiac blood samples were collected 72 h after surgery. The small-molecule metabolic profile of rat serum was obtained by gas chromatography-mass spectrometry (GC-MS). Orthogonal partial least squares discriminant analysis (OPLS-DA) combined with the Mann-Whitney U test initially identified 22 differential metabolites associated with DVT, involving metabolic pathways such as glyoxylate and dicarboxylate metabolism, the tricarboxylic acid (TCA) cycle, and alanine, aspartate and glutamate metabolism. Boruta, a feature selection method based on the random forest classification algorithm, was then applied to narrow these differential metabolites down to a set of 13 characteristic metabolites strongly correlated with DVT, and DVT prediction models were built with three machine learning algorithms: logistic regression, linear discriminant analysis (LDA), and the AdaBoost ensemble learning algorithm. The LDA model predicted DVT with an accuracy of 87%, a precision of 0.88, a recall of 0.86, an F1 score of 0.87, and an area under the receiver operating characteristic curve (AUROC) of 0.95. These results indicate that DVT prediction models built from GC-MS metabolomics combined with machine learning algorithms can provide technical support for the diagnosis, treatment, and forensic identification of DVT.
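The univariate half of the screening step can be illustrated with a dependency-free Mann-Whitney U test. This is a minimal sketch under stated assumptions, not the authors' pipeline: it uses the normal approximation with a tie correction, it takes OPLS-DA VIP scores as already computed, and the names `mann_whitney_u` and `screen_metabolites`, as well as the VIP > 1 and p < 0.05 cut-offs, are hypothetical, since the abstract does not give the thresholds used.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for sample x, p-value). No continuity correction is applied;
    for very small groups an exact test is preferable.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    u1 = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


def screen_metabolites(control: Dict[str, List[float]], dvt: Dict[str, List[float]],
                       vip: Dict[str, float], p_cut: float = 0.05,
                       vip_cut: float = 1.0) -> List[str]:
    """Keep metabolites with OPLS-DA VIP > vip_cut and Mann-Whitney p < p_cut.

    VIP scores are assumed to come from a separately fitted OPLS-DA model;
    both cut-offs are hypothetical defaults, not values stated in the abstract.
    """
    return [name for name in control
            if vip[name] > vip_cut and mann_whitney_u(control[name], dvt[name])[1] < p_cut]
```

For two fully separated groups of six samples, `mann_whitney_u` returns U = 0 and an approximate two-sided p-value of about 0.004; with groups this small, an exact test would be the more careful choice.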

Key words: Metabolomics, Deep vein thrombosis, Machine learning, Feature selection, GC-MS
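Boruta decides feature relevance by comparing each real feature with "shadow" copies whose values have been shuffled, keeping only features that beat the best shadow significantly more often than chance over repeated rounds. The sketch below reproduces that decision logic under one loudly flagged simplification: a univariate effect size (the hypothetical `effect_size`) replaces the random-forest importance that Boruta actually uses, so it cannot capture feature interactions. The names `boruta_like` and `binom_tail_ge`, and the defaults `n_iter=20` and `alpha=0.01` with a Bonferroni correction, are illustrative assumptions rather than the study's settings.

```python
import math
import random
import statistics
from typing import Callable, Dict, List, Sequence

Importance = Callable[[Sequence[float], Sequence[int]], float]


def effect_size(values: Sequence[float], labels: Sequence[int]) -> float:
    """|mean(class 1) - mean(class 0)| / pooled SD.

    Hypothetical stand-in for random-forest importance, used only to keep
    this sketch dependency-free.
    """
    a = [v for v, c in zip(values, labels) if c == 0]
    b = [v for v, c in zip(values, labels) if c == 1]
    diff = abs(statistics.mean(b) - statistics.mean(a))
    pooled_var = ((len(a) - 1) * statistics.variance(a)
                  + (len(b) - 1) * statistics.variance(b)) / (len(a) + len(b) - 2)
    if pooled_var == 0:
        return math.inf if diff > 0 else 0.0
    return diff / math.sqrt(pooled_var)


def binom_tail_ge(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n


def boruta_like(features: Dict[str, List[float]], labels: Sequence[int],
                importance: Importance = effect_size, n_iter: int = 20,
                alpha: float = 0.01, seed: int = 0) -> Dict[str, str]:
    """Label each feature 'confirmed', 'rejected' or 'tentative'."""
    rng = random.Random(seed)
    names = list(features)
    hits = {name: 0 for name in names}
    for _ in range(n_iter):
        # Shadow features: shuffled copies that keep each feature's
        # distribution but break any link with the labels.
        best_shadow = -math.inf
        for name in names:
            shadow = list(features[name])
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, importance(shadow, labels))
        # A hit is an importance strictly above the best shadow. (Real Boruta
        # refits a random forest on real + shadow features every round.)
        for name in names:
            if importance(features[name], labels) > best_shadow:
                hits[name] += 1
    # One-sided binomial tests against p = 0.5, Bonferroni-corrected.
    threshold = alpha / len(names)
    decision: Dict[str, str] = {}
    for name in names:
        if binom_tail_ge(hits[name], n_iter) < threshold:
            decision[name] = "confirmed"
        elif binom_tail_ge(n_iter - hits[name], n_iter) < threshold:
            # P(X <= hits) equals P(X >= n_iter - hits) when p = 0.5.
            decision[name] = "rejected"
        else:
            decision[name] = "tentative"
    return decision
```

A faithful implementation would refit a random forest on the real and shadow features together in every round and feed its importances into the same hit-counting and binomial test.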

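For the classification stage, a two-class linear discriminant analysis assumes that DVT and control samples share one covariance matrix, which makes the decision boundary linear in the metabolite levels. The sketch below fits such a model and computes the metrics reported in the abstract; it does not cover the logistic regression or AdaBoost models. It is illustrative only: `fit_lda`, `lda_score`, `classification_metrics` and `auroc` are hypothetical names, the small ridge term and the class-frequency priors are assumptions, and AUROC is computed as the probability that a randomly chosen positive sample outscores a randomly chosen negative one, which equals the Mann-Whitney U statistic divided by the product of the two group sizes.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lda(xs: Sequence[Vector], ys: Sequence[int],
            ridge: float = 1e-6) -> Tuple[Vector, float]:
    """Two-class LDA with a pooled covariance and class-frequency priors.

    Returns (w, b); w . x + b > 0 predicts class 1 (DVT). The ridge term is an
    assumption that keeps the covariance invertible when samples are few.
    """
    d = len(xs[0])
    groups = {c: [x for x, y in zip(xs, ys) if y == c] for c in (0, 1)}
    means = {c: [sum(col) / len(g) for col in zip(*g)] for c, g in groups.items()}
    cov = [[0.0] * d for _ in range(d)]
    for c, g in groups.items():
        for x in g:
            dev = [xi - mi for xi, mi in zip(x, means[c])]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += dev[i] * dev[j]
    for i in range(d):
        for j in range(d):
            cov[i][j] /= len(xs) - 2  # pooled within-class covariance
        cov[i][i] += ridge
    w = solve(cov, [m1 - m0 for m1, m0 in zip(means[1], means[0])])
    mid = [(m1 + m0) / 2 for m1, m0 in zip(means[1], means[0])]
    b = -sum(wi * mi for wi, mi in zip(w, mid)) + math.log(len(groups[1]) / len(groups[0]))
    return w, b


def lda_score(w: Vector, b: float, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1_score(precision, recall)}


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive outscores random negative), ties counted as 1/2.

    This equals the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

As a consistency check on the reported figures, `f1_score(0.88, 0.86)` evaluates to about 0.870, in line with the F1 score of 0.87 given for the LDA model.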