Abstract:
Deep learning has shown excellent performance in a wide range of visual tasks, and its development has significantly advanced fine-grained image recognition in particular. Fine-grained image recognition aims to correctly recognize subordinate categories within a broader object category, such as the different subcategories of birds. Because fine-grained images usually require expert knowledge to identify and annotate correctly, labeled data are difficult to obtain. At the same time, fine-grained categories exhibit small inter-class differences and large intra-class differences, so a model must be able to capture subtle, discriminative local features. Together, these two factors make the task very challenging. This paper first reviews the key stages in the development of deep learning and the characteristics and challenges of fine-grained image recognition. It then introduces three types of deep learning based fine-grained recognition methods (methods based on localization-classification subnetworks, end-to-end feature encoding methods, and methods that use external auxiliary information) and describes representative works of each type in detail. Finally, it compares the performance of related work on commonly used datasets, summarizes the fine-grained image recognition task, and discusses future directions.