Chem. J. Chinese Universities

• Research Article •

Multi-KNN-SVR Combinatorial Forecast and Its Application to QSAR of Fluorine-Containing Compounds

TAN Xian-Sheng1,2, YUAN Zhe-Ming1*, ZHOU Tie-Jun2, WANG Chun-Juan1, XIONG Jie-Yi1   

    1. College of Bio-safety Science and Technology,
    2. College of Science, Hunan Agricultural University, Changsha 410128, China
  • Received: 2007-03-19 Revised: 1900-01-01 Online: 2008-01-10 Published: 2008-01-10
  • Contact: YUAN Zhe-Ming

Abstract: To better understand the quantitative structure-activity relationship (QSAR) of fluorine-containing pesticides and to improve the prediction precision of QSAR models, a novel nonlinear combinatorial forecast method named Multi-KNN-SVR (multi-K-nearest neighbor based on support vector regression) was proposed. The method comprises three key steps: first, automatically selecting the best kernel by minimum mean square error (MSE); second, screening descriptors nonlinearly with an F-test; finally, carrying out the combinatorial forecast with multiple KNN sub-models. Multi-KNN-SVR was applied to the QSAR of the antibacterial bioactivities of 33 fluorine-containing pesticides against 5 different plant diseases. The leave-one-out test results show that screening both descriptors and sub-models was essential, and that the combinatorial forecast after screening sub-models achieved better precision than a single KNN model. The prediction results also indicate that, among all reference models, Multi-KNN-SVR offered high prediction precision (MSE = 0.005–0.015, MAPE = 2.136–3.164), high stability, strong generalization ability, structural risk minimization, nonlinear characteristics, and avoidance of over-fitting. Multi-KNN-SVR can therefore be widely used in QSAR and other related fields.
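
The abstract does not give implementation details, so the following is only a minimal sketch of the combinatorial-forecast idea and of the two reported error measures under leave-one-out testing. It assumes, purely for illustration, that each sub-model is a plain KNN regressor (the mean activity of the K nearest neighbours) and that the sub-models are combined by an unweighted average; the SVR component, the kernel search, and the F-test screening are omitted. All function names and the toy data are hypothetical and do not come from the paper.

```python
import math
from statistics import mean


def knn_predict(train_X, train_y, x, k):
    """Average the targets of the k training points nearest to x (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return mean(train_y[i] for i in order[:k])


def loo_predictions(X, y, ks):
    """Leave-one-out test: hold out each sample in turn and average the
    predictions of one KNN sub-model per value of k (assumed combination rule)."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        sub_models = [knn_predict(train_X, train_y, X[i], k) for k in ks]
        preds.append(mean(sub_models))
    return preds


def mse(y_true, y_pred):
    """Mean square error."""
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; assumes no true value is zero."""
    return 100.0 * mean(abs((a - b) / a) for a, b in zip(y_true, y_pred))


# Hypothetical toy data (not from the paper): one descriptor per compound.
X = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

preds = loo_predictions(X, y, ks=(1, 2))
print("MSE:", mse(y, preds), "MAPE (%):", mape(y, preds))
```

In this sketch the set of K values plays the role of the sub-model pool; screening sub-models, as the abstract describes, would amount to keeping only those K values whose leave-one-out error is acceptable before averaging.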

Key words: Fluorine-containing compound, Support vector regression, Quantitative structure-activity relationship (QSAR), K-nearest neighbor, Combinatorial forecast
