Chem. J. Chinese Universities

• Research Article •

Identification of P-gp Substrates Using a Random Forest Method Based on Chemistry Development Kit Descriptors

MA Guang-Li1, ZHAO Xiao-Ping2, CHENG Yi-Yu1*   

    1. Pharmaceutical Informatics Institute, Zhejiang University, Hangzhou 310027, China;
    2. Zhejiang Chinese Medical University, Hangzhou 310053, China
  • Received: 2007-01-24 Online: 2007-10-10 Published: 2007-10-10
  • Contact: CHENG Yi-Yu

Abstract: A model for identifying P-glycoprotein (P-gp) substrates was constructed using a random forest method based on descriptors from the open-source software CDK (Chemistry Development Kit) and a training data set of 170 compounds (96 P-gp substrates). Analysis of the relationship between the CDK descriptors and P-gp substrate status indicated that the sum of the atomic polarizabilities and the charged partial surface area play important roles in identifying P-gp substrates. An external test data set containing 42 compounds (24 P-gp substrates) was used to evaluate the model. The correct classification rate on the training set was 99.42%, and the correct classification rates for P-gp substrates, non-substrates, and all compounds in the test set were 87.50%, 83.33%, and 85.71%, respectively. The leave-one-out cross-validation correct classification rate over all 212 compounds was 77.4%.
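
The three test-set rates can be checked against the stated set sizes with a short calculation. The sketch below is illustrative only: the abstract gives the percentages, not the underlying counts, so the numbers of correctly classified compounds (21 substrates and 15 non-substrates) are assumptions inferred from those percentages, and the helper name `rate` is hypothetical.

```python
def rate(correct: int, total: int) -> float:
    """Return the correct classification rate as a percentage."""
    return 100.0 * correct / total


# Test-set composition stated in the abstract.
n_test = 42
n_substrates = 24
n_non_substrates = n_test - n_substrates  # 18 non-substrates

# Assumed counts of correct predictions, inferred from the reported
# percentages (the abstract itself does not list these counts).
correct_substrates = 21
correct_non_substrates = 15

substrate_rate = rate(correct_substrates, n_substrates)
non_substrate_rate = rate(correct_non_substrates, n_non_substrates)
overall_rate = rate(correct_substrates + correct_non_substrates, n_test)

print(f"substrates:     {substrate_rate:.2f}%")
print(f"non-substrates: {non_substrate_rate:.2f}%")
print(f"all compounds:  {overall_rate:.2f}%")
```

Under these assumed counts, the overall rate is the pooled fraction of correct predictions (36 of 42), which differs from the simple average of the two per-class rates because the two classes have different sizes.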

Key words: P-glycoprotein (P-gp), Random forest, Pattern recognition
