Abstract:
To determine whether a given DNA sequence is an intergenic region or a gene region, training features are extracted from DNA sequences using a linguistics-based method, and the gene and intergenic regions of chromosome 22 are classified with the Support Vector Machine (SVM) technique. The prediction accuracy of the classifiers exceeds 85% without using any prior biological knowledge. By comparison, although the Binary Logistic Regression (BLR) technique also achieves relatively high classification accuracy, SVM requires much less training time than BLR.
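The abstract does not say which linguistic features are extracted. One common choice when treating DNA as a text is the frequency of overlapping k-letter "words" (k-mers). The sketch below assumes that choice. The function name `kmer_features` and the default `k=3` are hypothetical and do not come from the paper. It turns a DNA string into a fixed-length vector of normalized k-mer frequencies, which is the kind of input an SVM or BLR classifier expects.

```python
from itertools import product


def kmer_features(seq: str, k: int = 3) -> list[float]:
    """Return normalized frequencies of all 4**k DNA words of length k.

    Hypothetical feature extractor: assumes the 'linguistic' features are
    overlapping k-mer (word) frequencies. Windows containing characters
    other than A, C, G, T (e.g. N) are skipped.
    """
    # Fixed vocabulary in lexicographic order: AA..A, AA..C, ..., TT..T
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {word: i for i, word in enumerate(vocab)}
    counts = [0] * len(vocab)
    seq = seq.upper()
    total = 0
    # Slide a window of length k over the sequence, one base at a time
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    # Avoid division by zero for sequences with no valid window
    if total == 0:
        return [0.0] * len(vocab)
    return [c / total for c in counts]
```

For example, `kmer_features("ACGTAC", k=2)` counts the five overlapping 2-mers AC, CG, GT, TA and AC, so the entry for AC is 0.4 and the entries for CG, GT and TA are 0.2 each. Vectors like these could then be passed to any SVM or logistic regression implementation. That training step is not shown here.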