Protein Fold Recognition Using Hidden Markov Models

    • Abstract: Seventy protein folds from the SCOP classification were studied. For each fold, representative proteins with sequence identity below 25% and a sample size greater than 3 were selected as the training set. Their structures were aligned with a structure alignment tool combined with manual inspection, and the resulting sequence alignment was used to train a profile hidden Markov model (profile HMM) for recognizing that fold. In a single-model recognition test on 9,505 protein domain samples from ASTRAL 1.65, the average sensitivity and specificity were 91.93% and 99.95%, respectively, and the Matthews correlation coefficient was 0.87. At the fold level, compared with Pfam and SUPERFAMILY, whose HMMs are built from sequence alignment alone, this approach uses significantly fewer models while maintaining high recognition performance. The results indicate that, for proteins that share a fold but have very low sequence similarity, a unified HMM can be built by introducing structure alignment, enabling highly accurate fold recognition.
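The abstract reports sensitivity, specificity, and the Matthews correlation coefficient but does not say how they were computed or averaged across folds. As a minimal sketch, assuming the standard confusion-matrix definitions for a single fold model, the three measures could be obtained as follows. The function name `fold_recognition_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def fold_recognition_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) for one fold model.

    tp/fn: domains of this fold that the model accepts/rejects.
    tn/fp: domains of other folds that the model rejects/accepts.
    Standard definitions are assumed; the abstract does not give them.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Made-up counts for illustration only (not data from the paper).
sens, spec, mcc = fold_recognition_metrics(tp=90, fp=10, tn=890, fn=10)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} MCC={mcc:.4f}")
```

Because negatives (domains from the other folds) vastly outnumber positives for any single fold, specificity can be very close to 100% even when a model produces a noticeable number of false positives. The Matthews correlation coefficient accounts for all four counts, which is why it is useful to report alongside the other two.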

       
