Chemical Journal of Chinese Universities, 2009, Vol. 30, Issue 7: 1309.

• Research Paper •


Prediction of HLA-A*0201 Binding Peptides Using Binding-environment-based Peptide Representation

ZHAO Pu, LI Tong-Hua*   

  1. Department of Chemistry, Tongji University, Shanghai 200092, China
  • Received: 2008-10-13  Online: 2009-07-10  Published: 2009-07-10
  • Contact: LI Tong-Hua (corresponding author), male, professor and doctoral supervisor; research area: chemometrics. E-mail: lith@tongji.edu.cn
  • Supported by:

    National Natural Science Foundation of China (Grant Nos. 20675057, 20705024).


Abstract:

Recognition of antigens by T lymphocytes is key to generating and regulating an effective immune response. T cells recognize antigens only as complexes with molecules of the Major Histocompatibility Complex (MHC), a large genomic region or gene family, present in all vertebrates, that strongly influences graft survival. Binding of peptides to MHC molecules is therefore a basic step in the immune response, and predicting MHC-binding peptides is an important step in the discovery of T-cell epitopes. To facilitate vaccine design, computational methods, many of them based on machine learning, have been developed to predict MHC-binding peptides. This work used the support vector machine (SVM) to build prediction models for HLA-A*0201 binding peptides from an experimental dataset. Because data representation plays a key role in SVM models, we compared several types of peptide encodings as input variables; the resulting SVM models achieved AUCs of 0.932 to 0.936. We then proposed a new peptide encoding that uses information about the peptide's binding environment, which raised the AUC to 0.953. Predictions on an independent dataset also showed that SVM models based on this environment encoding outperformed those based on traditional encodings.
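The abstract compares SVM models by the area under the ROC curve (AUC) across peptide encodings. The sketch below is not the authors' code; it only illustrates two generic building blocks under stated assumptions. First, a conventional one-hot ("sparse") encoding of a peptide, assumed here as one example of a traditional encoding (the abstract does not list which encodings were compared); HLA-A*0201 ligands are commonly 9 residues long, giving 9 × 20 = 180 features. Second, AUC computed as the Mann-Whitney probability that a randomly chosen binder scores higher than a randomly chosen non-binder. The SVM itself is omitted, and the binding-environment encoding proposed in the paper is not reproduced because the abstract does not describe it. The names `one_hot_encode` and `roc_auc` are hypothetical.

```python
"""Hypothetical sketch (not the authors' code): one-hot peptide encoding
and ROC AUC computed as the Mann-Whitney probability."""

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide: str) -> list[int]:
    """Encode a peptide as a flat 0/1 vector of length 20 * len(peptide)."""
    vector = [0] * (len(AMINO_ACIDS) * len(peptide))
    for position, residue in enumerate(peptide.upper()):
        if residue not in INDEX:
            raise ValueError(f"non-standard residue {residue!r} at position {position}")
        # Exactly one bit is set per position: block `position`, slot for `residue`.
        vector[position * len(AMINO_ACIDS) + INDEX[residue]] = 1
    return vector


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC = P(binder score > non-binder score); tied pairs count as 1/2."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    vec = one_hot_encode("LLFGYPVYV")  # an example 9-mer
    print(len(vec), sum(vec))  # 180 9
    # Binders score 0.9 and 0.3, non-binders 0.8 and 0.1: 3 of 4 pairs ranked correctly.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop in `roc_auc` costs O(n_pos × n_neg), which is fine for illustration; a rank-based computation gives the same value in O(n log n) for large datasets.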

Key words: HLA-A*0201; Binding peptide prediction; Support vector machine (SVM); Data representation; Receiver operating characteristic (ROC)
