Chem. J. Chinese Universities, 2025, Vol. 46, Issue (3): 20240373. doi: 10.7503/cjcu20240373
• Analytical Chemistry •
LUAN Yue, KONG Dingling, GUO Lili, ZHANG Qingyou, ZHOU Yanmei
Received: 2024-07-30
Online: 2025-03-10
Published: 2024-09-12
Contact: ZHANG Qingyou, ZHOU Yanmei
E-mail: qingyou@vip.henu.edu.cn; zhouym@henu.edu.cn
LUAN Yue, KONG Dingling, GUO Lili, ZHANG Qingyou, ZHOU Yanmei. Prediction of Chemical Bond Dissociation Energies of Small Organic Molecules Based on Random Forest[J]. Chem. J. Chinese Universities, 2025, 46(3): 20240373.
Descriptor number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Layer number | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | 4 | 4 | 4 | 5 | 5 | 5 |
Corresponding heteroatom | O | N | S | O | N | S | O | N | S | O | N | S | O | N | S |
HC descriptors | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
Table 1 Heteroatom count (HC) descriptors of the molecule in Fig. 2
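The layout of Table 1 implies a simple indexing rule: descriptors are grouped by layer, and within each layer they are ordered O, N, S, so the descriptor number equals 3 × (layer − 1) plus the position of the heteroatom. The following Python sketch illustrates that rule only. The function name `hc_descriptors` and the input format (a mapping from layer number to per-element counts) are assumptions made for illustration, and the sketch does not show how atoms of the molecule in Fig. 2 are assigned to layers.

```python
from typing import Dict, List

HETEROATOMS = ("O", "N", "S")  # column order within each layer, as in Table 1
N_LAYERS = 5                   # Table 1 lists five layers


def hc_descriptors(counts: Dict[int, Dict[str, int]],
                   n_layers: int = N_LAYERS) -> List[int]:
    """Flatten per-layer heteroatom counts into the vector layout of Table 1.

    `counts` maps a 1-based layer number to {element: count}; missing layers
    or elements count as zero. This input format is hypothetical.
    """
    vector = [0] * (n_layers * len(HETEROATOMS))
    for layer, per_element in counts.items():
        if not 1 <= layer <= n_layers:
            raise ValueError(f"layer {layer} is outside 1..{n_layers}")
        for element, count in per_element.items():
            # 0-based index = 3 * (layer - 1) + position of the element in O, N, S
            index = (layer - 1) * len(HETEROATOMS) + HETEROATOMS.index(element)
            vector[index] = count
    return vector


# Non-zero counts read off the "HC descriptors" row of Table 1.
table1_counts = {1: {"S": 1}, 2: {"N": 1}, 3: {"O": 1}, 5: {"O": 2}}
table1_vector = hc_descriptors(table1_counts)
print(table1_vector)
```

With the counts taken from Table 1, the resulting 15-element vector reproduces the "HC descriptors" row.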
Descriptors/Number of descriptors | Training set (cross-validation) R²/MAE/RMSE/(kcal·mol⁻¹)* | Test set R²/MAE/RMSE/(kcal·mol⁻¹) |
---|---|---|
HC+Branch/38 | 0.6860/7.68/11.79 | 0.7904/8.89/13.03 |
MAT/360 | 0.8675/5.82/9.72 | 0.8755/5.99/10.18 |
HC+Branch+MAT/398 | 0.8874/5.45/8.96 | 0.8709/6.34/10.27 |
MBT/160 | 0.8719/5.73/9.54 | 0.8699/6.22/10.31 |
HC+Branch+MBT/198 | 0.8891/5.34/8.88 | 0.8915/5.86/9.48 |
MAT+MBT/520 | 0.8894/5.22/8.88 | 0.8935/5.38/9.35 |
HAB/550 | 0.8977/4.96/8.53 | 0.8980/5.28/9.17 |
Branch+HAB/558 | 0.8963/5.13/8.60 | 0.8920/5.63/9.41 |
Table 2 BDE prediction results based on different descriptors
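Each model in Table 2 is summarized by three statistics: R², MAE, and RMSE. As a reading aid, the sketch below computes them from observed and predicted BDE values using the standard definitions. It assumes R² is the coefficient of determination (the page does not say whether it is that or a squared correlation coefficient), and the sample values are invented solely to exercise the code; they are not data from the paper.

```python
import math
from typing import Sequence, Tuple


def regression_metrics(observed: Sequence[float],
                       predicted: Sequence[float]) -> Tuple[float, float, float]:
    """Return (R2, MAE, RMSE) for paired observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]

    mae = sum(abs(r) for r in residuals) / n       # mean absolute error
    ss_res = sum(r * r for r in residuals)         # residual sum of squares
    rmse = math.sqrt(ss_res / n)                   # root-mean-square error

    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot                     # coefficient of determination
    return r2, mae, rmse


# Made-up values in kcal/mol, for illustration only.
observed = [80.0, 90.0, 100.0, 110.0]
predicted = [82.0, 88.0, 101.0, 109.0]
r2, mae, rmse = regression_metrics(observed, predicted)
print(f"R2={r2:.4f}  MAE={mae:.2f}  RMSE={rmse:.2f}")
```

MAE and RMSE carry the units of the BDE values (kcal·mol⁻¹ in the tables), while R² is dimensionless.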
Descriptors/Number of descriptors/Number of layers | Training set (cross-validation) R²/MAE/RMSE/(kcal·mol⁻¹) | Test set R²/MAE/RMSE/(kcal·mol⁻¹) |
---|---|---|
MAT+MBT/352/5 | 0.8886/5.24/8.91 | 0.8919/5.42/9.43 |
MAT+MBT/403/6 | 0.8895/5.24/8.91 | 0.8903/5.41/9.48 |
MAT+MBT/449/7 | 0.8890/5.26/8.88 | 0.8891/5.50/9.55 |
HAB/379/5 | 0.8988/4.95/8.48 | 0.8977/5.31/9.17 |
HAB/434/6 | 0.8992/4.97/8.47 | 0.8956/5.34/9.28 |
HAB/484/7 | 0.8987/4.97/8.49 | 0.8957/5.33/9.26 |
HAB+Branch/387/5 | 0.8973/5.08/8.56 | 0.8930/5.57/9.36 |
HAB+Branch/444/6 | 0.8955/5.19/8.63 | 0.8905/5.65/9.45 |
HAB+Branch/496/7 | 0.8977/5.16/8.55 | 0.8918/5.63/9.42 |
Table 3 BDE prediction results based on RF for different numbers of descriptor layers
Proximity | > 0.9 | 0.9—0.7 | 0.7—0.5 | 0.5—0.3 | 0.3—0 |
---|---|---|---|---|---|
MAE (number of compounds), 5 layers | 2.77 (46) | 3.65 (49) | 5.31 (78) | 5.81 (53) | 8.83 (16) |
MAE (number of compounds), 6 layers | 2.44 (28) | 4.75 (60) | 4.00 (82) | 6.00 (56) | 7.52 (16) |
Table 4 Error under different proximities
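Because every proximity bin in Table 4 is reported with its compound count, the per-bin errors can be pooled into a single count-weighted MAE, which makes the 5-layer and 6-layer rows easier to compare. The sketch below performs that arithmetic on the values from Table 4. The function name `pooled_mae` is illustrative, and because the bin MAEs are rounded to two decimals, the pooled figures are approximate.

```python
from typing import List, Tuple

Bin = Tuple[float, int]  # (MAE of the bin, number of compounds in the bin)


def pooled_mae(bins: List[Bin]) -> float:
    """Count-weighted mean of per-bin MAE values."""
    total = sum(count for _, count in bins)
    if total == 0:
        raise ValueError("no compounds in any bin")
    return sum(mae * count for mae, count in bins) / total


# Values copied from Table 4, ordered from highest to lowest proximity.
five_layers = [(2.77, 46), (3.65, 49), (5.31, 78), (5.81, 53), (8.83, 16)]
six_layers = [(2.44, 28), (4.75, 60), (4.00, 82), (6.00, 56), (7.52, 16)]

print(round(pooled_mae(five_layers), 2))
print(round(pooled_mae(six_layers), 2))
```

Both rows cover 242 compounds, and the pooled values come out at roughly 4.83 and 4.70, so the two layer settings differ mainly in how the error is distributed across the proximity bins.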
Chemical bond type (number of compounds) | R² | Training set (cross-validation) MAE/RMSE/(kcal·mol⁻¹) | Test set MAE/RMSE/(kcal·mol⁻¹) |
---|---|---|---|
O—O(93) | 0.6398/0.4617 | 1.95/3.47 | 2.00/4.24 |
O—N(127) | 0.7211/0.8494 | 5.16/9.96 | 6.45/8.74 |
O—S(23) | 0.2616/0.9193 | 11.32/13.80 | 13.55/16.37 |
O—C(216) | 0.8946/0.7837 | 5.08/7.68 | 6.03/12.50 |
N—N(91) | 0.6582/0.6442 | 3.78/7.39 | 5.28/8.70 |
N—S(14) | 0.3065/0.2564 | 5.46/9.22 | 4.66/4.91 |
N—C(142) | 0.8498/0.8426 | 5.72/8.33 | 6.57/9.26 |
S—S(16) | 0.2072/0.6208 | 11.03/13.60 | 12.72/13.56 |
S—C(90) | 0.7891/0.6365 | 4.22/7.48 | 5.09/9.96 |
C—C(396) | 0.7623/0.7979 | 4.97/8.98 | 5.07/8.09 |
Table 5 Results of BDE for different bonds based on RF