An Algorithm for Mining Protein Functional Modules by Integrating Multiple Data Sources
Bipartite Graph-based Integrative Method to Detect Consistent Protein Functional Modules from Multiple Sources
Abstract: Protein-protein interaction (PPI) networks are incomplete and noisy. To address this, a bipartite graph-based cluster ensemble method is proposed that integrates gene ontology (GO) and gene expression data with PPI networks to detect functional modules. The method combines these views of biological information with three basic clustering methods and organizes the intermediate results of the basic clusterings in a new bipartite graph, which represents the relationships between the proteins and the meta-clusters produced by the basic clustering methods. The most functionally consistent modules are then extracted with a graph partition method based on symmetric non-negative matrix factorization (NMF); because the factorization yields overlapping results, the method can also handle proteins that participate in more than one function. Experimental results show that the proposed method overall outperforms the baseline methods. Further analysis discusses the benefits of integrating multiple biological information sources and diverse clustering methods.
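The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything below is an assumption made for illustration: the toy proteins and base clusterings, the binary membership weights on the bipartite graph, the damped multiplicative update (a commonly used rule for symmetric NMF), the relative threshold used to obtain overlapping modules, and all function names.

```python
import random

def build_bipartite(proteins, base_clusterings):
    """Return (B, clusters) with B[i][j] = 1.0 if proteins[i] is in clusters[j].
    Binary weights are an assumption; the source does not give the weighting."""
    clusters = [c for clustering in base_clusterings for c in clustering]
    B = [[1.0 if p in c else 0.0 for c in clusters] for p in proteins]
    return B, clusters

def bipartite_adjacency(B):
    """Symmetric adjacency [[0, B], [B^T, 0]] over protein and cluster nodes."""
    n, m = len(B), len(B[0])
    A = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(m):
            A[i][n + j] = A[n + j][i] = B[i][j]
    return A

def matmul(X, Y):
    """Plain matrix product for small dense matrices stored as nested sequences."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def symmetric_nmf(A, k, iters=200, beta=0.5, seed=0, eps=1e-9):
    """Approximate A by H H^T with H >= 0 via damped multiplicative updates."""
    rng = random.Random(seed)
    N = len(A)
    H = [[rng.random() + eps for _ in range(k)] for _ in range(N)]
    for _ in range(iters):
        AH = matmul(A, H)                            # N x k
        HHtH = matmul(H, matmul(list(zip(*H)), H))   # N x k
        H = [[H[i][c] * (1 - beta + beta * AH[i][c] / (HHtH[i][c] + eps))
              for c in range(k)] for i in range(N)]
    return H

def overlapping_modules(H, proteins, rel_threshold=0.5):
    """Put each protein in every module whose weight is at least
    rel_threshold times that protein's largest weight (hypothetical rule)."""
    k = len(H[0])
    modules = [set() for _ in range(k)]
    for i, p in enumerate(proteins):      # the first len(proteins) rows of H
        top = max(H[i])
        for c in range(k):
            if H[i][c] >= rel_threshold * top:
                modules[c].add(p)
    return modules

# Toy input: two hypothetical base clusterings of six proteins.
proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
base_clusterings = [
    [{"P1", "P2", "P3"}, {"P4", "P5", "P6"}],
    [{"P1", "P2"}, {"P3", "P4"}, {"P5", "P6"}],
]

B, clusters = build_bipartite(proteins, base_clusterings)
A = bipartite_adjacency(B)
H = symmetric_nmf(A, k=2)
modules = overlapping_modules(H, proteins)
print(modules)
```

Because a protein is kept in every module whose weight is close to its strongest one, a protein can appear in more than one module. This is one simple way to obtain overlapping results from a non-negative factor, and lowering `rel_threshold` increases the overlap.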