CUI Zheng, HU Yongli, SUN Yanfeng, YIN Baocai. Visual Question Answering Methods of Cross-modal Data Collaborative Analysis: a Survey[J]. Journal of Beijing University of Technology, 2022, 48(10): 1088-1099. DOI: 10.11936/bjutxb2021040030

    Visual Question Answering Methods of Cross-modal Data Collaborative Analysis: a Survey

Abstract: Collaborative analysis and processing of cross-modal data have long been a difficult and active topic in modern artificial intelligence, the main challenge being the semantic gap and heterogeneity of cross-modal data. Recently, with the rapid development of deep learning theory and technology, deep-learning-based algorithms have made great progress in image and text processing, giving rise to the research topic of visual question answering (VQA). A VQA system takes visual information and a textual question as input and produces the corresponding answer; the core of such a system is to understand and process the visual and textual information collaboratively. This paper therefore reviews VQA methods in detail. According to their underlying principles, existing VQA methods are divided into three categories: data fusion, cross-modal attention, and knowledge reasoning. The latest developments in VQA methods are comprehensively summarized and analyzed, commonly used VQA datasets are introduced, and promising directions for future research are suggested.
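To make the collaborative processing of visual and textual inputs concrete, the sketch below shows a minimal joint-embedding VQA baseline in the spirit of the data-fusion category described above: precomputed image and question features are projected into a shared space, fused by an element-wise product, and scored over a fixed answer vocabulary. The class name SimpleFusionVQA, all dimensions, and the choice of element-wise product as the fusion operator are illustrative assumptions for this sketch, not a specific method from the surveyed literature.

```python
import torch
import torch.nn as nn


class SimpleFusionVQA(nn.Module):
    """Minimal joint-embedding VQA baseline (illustrative sketch):
    project image and question features to a shared space, fuse them,
    and classify over a fixed answer vocabulary."""

    def __init__(self, img_dim=2048, q_dim=1024, hidden=512, num_answers=1000):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)  # project image features
        self.q_proj = nn.Linear(q_dim, hidden)      # project question features
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(hidden, num_answers),         # score each candidate answer
        )

    def forward(self, img_feat, q_feat):
        # Element-wise product is one common, simple fusion operator;
        # concatenation or bilinear pooling are frequent alternatives.
        fused = self.img_proj(img_feat) * self.q_proj(q_feat)
        return self.classifier(fused)


# Toy usage with random stand-ins for precomputed features,
# e.g., CNN pooled image features and an RNN question embedding.
model = SimpleFusionVQA()
img = torch.randn(4, 2048)  # batch of image feature vectors
q = torch.randn(4, 1024)    # batch of question feature vectors
logits = model(img, q)      # shape: (4, 1000) answer scores
print(logits.shape)
```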