Chem. J. Chinese Universities ›› 2024, Vol. 45 ›› Issue (9): 20240199. doi: 10.7503/cjcu20240199

• Analytical Chemistry •

Prediction of Deep Vein Thrombosis Based on GC-MS and Machine Learning

HOU Zejin, LI Rongqi, LI Jian, FENG Yining, JIN Qianqian, SUN Junhong, CAO Jie

  1. School of Forensic Medicine, Shanxi Medical University, Taiyuan 030001, China
  • Received: 2024-04-19 Online: 2024-09-10 Published: 2024-05-31
  • Contact: CAO Jie E-mail: jie.cao@sxmu.edu.cn
  • Supported by:
    the Special Fund for Science and Technology Innovation Teams of Shanxi Province, China (202204051001025)

Abstract:

This study investigated changes in endogenous serum metabolites in a rat model of deep vein thrombosis (DVT), screened characteristic metabolites related to DVT, and constructed prediction models intended for the clinical diagnosis and forensic identification of DVT. The DVT rat model was established by inferior vena cava ligation, and blood samples were collected 72 h after surgery. Gas chromatography-mass spectrometry (GC-MS) was used to analyze the small-molecule metabolic profile of the rat serum. Orthogonal partial least squares discriminant analysis combined with the Mann-Whitney U test initially identified 22 differential metabolites associated with DVT, involving metabolic pathways such as glyoxylate and dicarboxylate metabolism, the tricarboxylic acid (TCA) cycle, and alanine, aspartate and glutamate metabolism. Subsequently, Boruta, a feature selection method based on the random forest classification algorithm, was applied to screen out 13 characteristic metabolites correlated with DVT from the differential metabolites, and predictive models for DVT were constructed using three machine learning algorithms (logistic regression, linear discriminant analysis, and the AdaBoost ensemble learning algorithm). The linear discriminant analysis model performed well, with an accuracy of 87%, a precision of 0.88, a recall of 0.86, an F1 score of 0.87, and an area under the receiver operating characteristic curve (AUROC) of 0.95. These results indicate that DVT prediction models built from GC-MS metabolomics combined with machine learning algorithms can provide technical support for the diagnosis, treatment, and forensic identification of DVT.
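As a reading aid for the performance figures quoted in the abstract, the following is a minimal sketch of how accuracy, precision, recall, F1 score, and AUROC are conventionally computed for a binary classifier. It is not the authors' code. The function names, the 0.5 decision threshold, and the eight sample scores are assumptions made purely for illustration and are not data from the study.

```python
from typing import Sequence


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a pair. This equals the Mann-Whitney U statistic
    of the positive class divided by n_pos * n_neg.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs for 8 samples (1 = DVT, 0 = control).
# These values are invented for illustration; they are not study data.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
predicted = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

# Confusion-matrix counts from predicted vs. true labels.
tp = sum(p == 1 and y == 1 for p, y in zip(predicted, labels))
fp = sum(p == 1 and y == 0 for p, y in zip(predicted, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(predicted, labels))
tn = sum(p == 0 and y == 0 for p, y in zip(predicted, labels))

metrics = classification_metrics(tp, fp, fn, tn)
metrics["auroc"] = auroc(scores, labels)
print(metrics)
```

Applying the same F1 definition to the reported precision (0.88) and recall (0.86) gives 2 × 0.88 × 0.86 / (0.88 + 0.86) ≈ 0.87, which agrees with the F1 score stated in the abstract.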

Key words: Metabolomics, Deep vein thrombosis, Machine learning, Feature selection, GC-MS

