Chemical Journal of Chinese Universities ›› 2025, Vol. 46 ›› Issue (4): 20240556. doi: 10.7503/cjcu20240556

• Polymer Chemistry •

Machine Learning Model for Predicting the Glass Transition Temperature of Polyimides Based on Molecular Fingerprints and Quantum Chemical Descriptors

ZHAN Senhua, SHI Tongfei

  1. School of Chemical Engineering and Light Industry, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2024-12-23 Online: 2025-01-15 Published: 2025-04-10
  • Contact: SHI Tongfei E-mail: tfshi@gdut.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (2247030172)

Abstract:

Combining machine learning with quantum chemistry to build predictive models can facilitate the design and screening of polyimide structures. In this study, Molecular ACCess System (MACCS) fingerprints and nine density functional theory (DFT) quantum chemical descriptors were obtained from polyimide repeating units and used to construct three types of predictive models: MACCS-only, DFT-only, and integrated models that combine both. Four algorithms, namely random forest (RF), support vector regression (SVR), extreme gradient boosting (XGB), and gradient boosting regression (GBR), were applied to each feature set, giving twelve machine learning models for predicting the glass transition temperature of polyimides and extracting key feature information. The best-performing model was the integrated XGB model, with coefficient of determination (R²) values of 0.956 and 0.811 on the training and test sets, respectively, and a root mean square error (RMSE) and mean absolute error (MAE) of 25.41 and 20.20 on the test set. Furthermore, the integrated models combining MACCS fingerprints and DFT descriptors outperformed the corresponding single-feature models. The established integrated modeling framework can serve as a reference for the structural design of polyimides and other polymer materials.

Key words: Machine learning, Quantum chemistry, Molecular fingerprint, Polyimide
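
The three evaluation metrics quoted in the abstract (R², RMSE, MAE) can be sketched in plain Python as below. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the sample values are placeholders rather than data from the study, and `integrate_features` assumes that the "integrated" feature set is a simple concatenation of a 166-key MACCS bit vector with the nine DFT descriptors, which the abstract does not confirm.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def integrate_features(maccs_bits, dft_descriptors):
    """Hypothetical integration step (an assumption, not the authors'
    confirmed method): concatenate fingerprint bits and DFT descriptors
    into one flat feature vector."""
    return [float(b) for b in maccs_bits] + [float(d) for d in dft_descriptors]


# Placeholder glass transition temperatures, NOT values from the study.
tg_true = [250.0, 300.0, 350.0, 400.0]
tg_pred = [260.0, 290.0, 340.0, 410.0]

print(r2_score(tg_true, tg_pred))
print(rmse(tg_true, tg_pred))
print(mae(tg_true, tg_pred))

# 166 MACCS keys plus 9 DFT descriptors under the concatenation assumption.
features = integrate_features([0] * 166, [0.0] * 9)
print(len(features))
```

Because RMSE squares each residual before averaging, it is never smaller than MAE on the same predictions, which is consistent with the reported test-set values of 25.41 and 20.20.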
