Abstract:
Cluster analysis is a key technique for exploring the intrinsic structure of data. In the era of big data, the data to be analyzed is typically large in scale, high-dimensional, and complex in structure, and traditional clustering algorithms often fail to process such data. Deep learning, with its powerful hierarchical nonlinear mapping capabilities, makes large-scale deep feature extraction possible. Clustering algorithms based on deep learning (deep clustering) have therefore quickly become a research focus in unsupervised learning. This paper summarizes the research status of deep clustering. First, the related concepts of deep clustering are introduced from the perspectives of neural network structure, clustering loss, and ancillary network loss. Then, existing deep clustering algorithms are classified according to the structural characteristics of their networks, and the advantages and disadvantages of each type of method are analyzed. Finally, three elements of a good deep clustering algorithm are proposed, namely scalability of the model, robustness of the loss function, and smoothness of the feature space, and possible future research opportunities in each of these three aspects are outlined.