A Survey of Deep Clustering Algorithms

    Review of Clustering With Deep Learning

    Abstract: Cluster analysis is a key technique for uncovering the intrinsic structure of data. In the era of big data, data are typically large in scale, high-dimensional, and structurally complex, and traditional clustering algorithms often fail when applied to them directly. Deep learning, with its hierarchical nonlinear mapping capability, makes large-scale deep feature extraction possible, so clustering algorithms based on deep learning (deep clustering) have quickly become a research focus in unsupervised learning. This paper summarizes the current state of research on deep clustering. First, the basic concepts of deep clustering are introduced from three perspectives: neural network structure, clustering loss, and auxiliary network loss. Then, existing deep clustering algorithms are classified according to the structural characteristics of their networks, and the advantages and disadvantages of each class of methods are analyzed. Finally, three elements of a good deep clustering algorithm are proposed, namely scalability of the model, robustness of the loss function, and smoothness of the feature space, and possible future research directions are discussed from each of these three aspects.

       
