Adaptive Clustering Algorithm Based on Minimal Spanning Tree Cutting
Abstract: To analyze the structure of a dataset simply and effectively, this paper proposes MSTCA, a clustering algorithm based on the minimal spanning tree (MST). The basic idea is to partition the dataset into subclasses by cutting every edge of one of its minimal spanning trees whose length exceeds a given threshold, while merging the relatively small subclasses. The clustering produced by MSTCA is unique up to the ordering of the subclasses, and calling it recursively yields a clustering structure of the dataset at several levels of granularity. Computational experiments show that MSTCA adaptively selects a good number of clusters for datasets whose clusters have various shapes, and that with only simple parameter choices it can accurately identify the reasonable clusters and outliers present in the data.
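The cut-and-merge idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's MSTCA implementation. The abstract does not specify how the threshold is chosen or how small subclasses are merged, so the names `mst_edges`, `mst_cut_clusters`, `threshold`, and `min_size` are hypothetical. The merge rule used here is also an assumption: a subclass with fewer than `min_size` points is re-joined across its shortest cut edge.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the n-1 tree edges as (length, i, j) tuples.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n      # shortest known distance from each point to the tree
    parent = [-1] * n          # tree vertex realizing that distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_cut_clusters(points, threshold, min_size=2):
    """Cut MST edges longer than `threshold`, then merge small subclasses.

    The merge rule is an assumption made for this sketch, not the paper's rule.
    Returns one integer cluster label per point.
    """
    n = len(points)
    root = list(range(n))      # union-find forest over point indices
    size = [1] * n             # component size, valid at each root

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]   # path halving
            i = root[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            root[rj] = ri
            size[ri] += size[rj]

    cut = []
    for length, i, j in sorted(mst_edges(points)):
        if length <= threshold:
            union(i, j)                   # short edge: keep it
        else:
            cut.append((length, i, j))    # long edge: cut it

    # Assumed merge step: revisit cut edges from shortest to longest and
    # re-join any subclass that has fewer than `min_size` points.
    for length, i, j in cut:
        if size[find(i)] < min_size or size[find(j)] < min_size:
            union(i, j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Two tight groups of three points plus one stray point near the second group.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (10, 14)]
print(mst_cut_clusters(pts, threshold=2.0, min_size=2))  # stray point is merged
print(mst_cut_clusters(pts, threshold=2.0, min_size=1))  # stray point stays alone
```

With `min_size=1` nothing is merged, so an isolated point stays in its own subclass and can be read as a candidate outlier. Calling `mst_cut_clusters` again on the points of a single cluster with a smaller `threshold` would give a finer level of the hierarchy, in the spirit of the recursive use mentioned in the abstract.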