Cancer Informative Gene Identification Based on a Statistical Method
Abstract: To identify informative genes for cancer, tumors of 14 different tissue types are treated as a single class and their gene expression is compared with that of the corresponding normal tissue samples. The aim is to extract informative genes that reflect the class of a sample, as a reference for analyzing gene expression data in biomedical research. The correlation coefficient of each gene is first used as a criterion to remove, within a given range, "noise genes" with small correlation coefficient values. A statistical method called nearest shrunken centroids is then applied to extract informative genes that reflect the tissue type of the samples. Using the extracted informative genes, 87.9% of the samples were clustered correctly (the English version of the abstract gives 87.7%) and the test-set samples were classified with an accuracy of 81.1%. Both results are better than the clustering and classification results obtained before feature selection.
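The two-stage selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy expression matrix, the thresholds `min_abs_r` and `delta`, and all function names are hypothetical, and the standardizing factor `m_k = sqrt(1/n_k - 1/n)` follows one common formulation of nearest shrunken centroids, which may differ from the one used in the paper.

```python
import math
from statistics import mean, median


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_filter(X, y, min_abs_r):
    """Stage 1: keep genes whose |correlation with the class label| >= min_abs_r."""
    n_genes = len(X[0])
    return [g for g in range(n_genes)
            if abs(pearson([row[g] for row in X], y)) >= min_abs_r]


def shrunken_centroid_genes(X, y, delta):
    """Stage 2: keep genes whose standardized class-centroid offset exceeds delta,
    i.e. genes that stay nonzero after soft-thresholding by delta."""
    n, n_genes = len(X), len(X[0])
    classes = sorted(set(y))
    counts = {k: sum(1 for lab in y if lab == k) for k in classes}
    overall, class_means, s = [], [], []
    for g in range(n_genes):
        col = [row[g] for row in X]
        means = {k: mean(v for v, lab in zip(col, y) if lab == k) for k in classes}
        ss = sum((v - means[lab]) ** 2 for v, lab in zip(col, y))
        overall.append(mean(col))
        class_means.append(means)
        s.append(math.sqrt(ss / (n - len(classes))))  # pooled within-class std
    s0 = median(s)  # offset that damps genes with very small variance
    selected = []
    for g in range(n_genes):
        for k in classes:
            m_k = math.sqrt(1.0 / counts[k] - 1.0 / n)
            denom = m_k * (s[g] + s0)
            if denom == 0:
                continue  # no usable variance estimate for this gene
            d = (class_means[g][k] - overall[g]) / denom
            if abs(d) > delta:
                selected.append(g)
                break
    return selected


# Hypothetical toy data: 6 samples x 4 genes; label 1 = tumor, 0 = normal.
X = [
    [5.0, 2.0, 3.0, 1.0],
    [5.2, 3.0, 2.0, 1.0],
    [4.8, 1.0, 4.0, 1.0],
    [1.0, 3.0, 2.0, 1.0],
    [1.2, 1.0, 1.0, 1.0],
    [0.8, 2.0, 3.0, 1.0],
]
y = [1, 1, 1, 0, 0, 0]

kept = correlation_filter(X, y, min_abs_r=0.3)          # drop noise genes
X_kept = [[row[g] for g in kept] for row in X]
chosen = [kept[i] for i in shrunken_centroid_genes(X_kept, y, delta=2.0)]
print("after correlation filter:", kept)
print("informative genes:", chosen)
```

In this sketch the correlation filter only discards genes that carry almost no class signal, while `delta` controls how many of the remaining genes survive shrinkage. In practice both thresholds would be tuned, for example by cross-validation on the training set.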