Classification Method Based on SVM for Human Gene Sequences

    • Abstract: To determine whether a given DNA sequence fragment is a gene sequence or an intergenic sequence, features are extracted from DNA sequences using a linguistic method, and a Support Vector Machine (SVM) is trained to classify gene and intergenic sequences in the DNA of human chromosome 22. Without relying on any domain knowledge from biology, the method achieves a classification accuracy above 85%. Binary Logistic Regression (BLR) also reaches relatively high classification accuracy, but SVM is far superior to BLR in training time.
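
The abstract does not say which linguistic features are extracted. As a minimal sketch only, the example below assumes one common linguistics-inspired representation: treating overlapping k-mers as "words" and using their normalized frequencies as a fixed-length feature vector. The function name `kmer_features` and the choice `k=3` are illustrative assumptions, not the authors' method.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k=3):
    """Return normalized k-mer ("word") frequencies of a DNA sequence.

    Hypothetical feature extractor: the source abstract does not specify
    its features, so k-mer frequencies are an assumption for illustration.
    """
    seq = seq.upper()
    # Fixed vocabulary of all 4**k words over the DNA alphabet, in a stable order.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other symbols (e.g. N) are not in vocab and are ignored.
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total if total else 0.0 for w in vocab]


if __name__ == "__main__":
    vector = kmer_features("ACGTACGTTGCA", k=3)
    print(len(vector))  # 64 features for k=3
```

Vectors of this kind, one per labeled gene or intergenic fragment, could then be passed to any standard SVM or logistic regression implementation to reproduce the type of comparison the abstract describes.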

       
