Abstract:
Traditional hierarchical agglomerative clustering algorithms (HACA) may produce non-unique binary trees as the clustering result for the same dataset. To address this problem, this paper presents the Hierarchical Subtrees Agglomerative Clustering Algorithm (HSACA), whose basic idea is to find the maximal θ-distant subtrees in a minimum spanning tree of the dataset and merge the vertex set of each such subtree. HSACA can merge many objects into a cluster in a single step, so its clustering result is usually a multiway (non-binary) tree. This paper proves theoretically that the multiway tree generated by HSACA is unique for a given dataset, up to the order of branches, and shows in computer simulations that the multiway tree describes a more reasonable clustering result than the binary tree generated by traditional HACA when the dataset contains many equidistant pairs of points.
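
As a rough illustration of the idea above, the following Python sketch builds a minimum spanning tree with Prim's algorithm and then, for each distinct edge weight θ in increasing order, merges every connected group of clusters joined by tree edges of weight θ in a single step. Reading a "maximal θ-distant subtree" as a maximal connected set of spanning-tree edges of equal weight θ is an assumption based only on this abstract, not the paper's formal definition. The names `mst_edges` and `hsaca_sketch`, the Euclidean distance, and the tolerance `tol` are likewise hypothetical choices made for the example.

```python
import math
from collections import defaultdict


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (weight, i, j) edges of one minimum spanning tree.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # (cheapest known distance to tree, tree vertex)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v][0]:
                    best[v] = (d, u)
    return edges


def hsaca_sketch(points, tol=1e-9):
    """Assumed reading of the HSACA idea: merge all equal-weight MST edges at once.

    Returns a list of (theta, merges); each merge lists the child clusters
    (as sorted lists of point indices) joined into one parent at level theta.
    """
    n = len(points)
    edges = sorted(mst_edges(points))
    parent = list(range(n))
    cluster = {i: frozenset([i]) for i in range(n)}  # root -> member indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    levels = []
    k = 0
    while k < len(edges):
        theta = edges[k][0]
        # Collect every MST edge whose weight equals theta (within tol).
        batch = []
        while k < len(edges) and math.isclose(edges[k][0], theta, abs_tol=tol):
            batch.append(edges[k])
            k += 1
        # Remember the clusters touched by this batch before merging them.
        touched = {find(u) for _, u, _ in batch} | {find(v) for _, _, v in batch}
        old = {r: cluster.pop(r) for r in touched}
        for _, u, v in batch:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
        # Each connected group of old clusters becomes one new multiway node.
        merged = defaultdict(list)
        for r, members in old.items():
            merged[find(r)].append(members)
        step = []
        for root, children in merged.items():
            cluster[root] = frozenset().union(*children)
            step.append(sorted(sorted(c) for c in children))
        levels.append((theta, sorted(step)))
    return levels


if __name__ == "__main__":
    for theta, merges in hsaca_sketch([(0, 0), (1, 0), (2, 0), (10, 0)]):
        print(theta, merges)
```

For the four collinear points in the usage example, the three points spaced one unit apart are joined in a single three-way merge at θ = 1, and the remaining point joins at θ = 8, instead of the tie being broken by an arbitrary sequence of pairwise merges.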