CUI Zheng, HU Yongli, SUN Yanfeng, YIN Baocai. Visual Question Answering Methods of Cross-modal Data Collaborative Analysis: a Survey[J]. Journal of Beijing University of Technology, 2022, 48(10): 1088-1099. DOI: 10.11936/bjutxb2021040030

    Visual Question Answering Methods of Cross-modal Data Collaborative Analysis: a Survey

Abstract: Collaborative analysis and processing of cross-modal data have long been a difficult and active topic in modern artificial intelligence, the main challenge being the semantic gap and heterogeneity of cross-modal data. Recently, with the rapid development of deep learning theory and technology, deep-learning-based algorithms have made great progress in image and text processing, giving rise to the research topic of visual question answering (VQA). A VQA system takes visual information and a textual question as input and produces the corresponding answer; the core of such a system is to understand and process the visual and textual information collaboratively. This paper therefore reviews VQA methods in detail. According to their underlying principles, existing VQA methods are divided into three categories: data fusion, cross-modal attention, and knowledge reasoning. The latest developments in VQA methods are comprehensively summarized and analyzed, commonly used VQA datasets are introduced, and promising directions for future research are suggested.
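To make the collaborative processing of visual and textual inputs concrete, the sketch below shows a minimal joint-embedding VQA baseline in the spirit of the data-fusion category described above: precomputed image and question features are projected into a shared space, fused by an element-wise product, and scored over a fixed answer vocabulary. The class name SimpleFusionVQA, all dimensions, and the choice of element-wise product as the fusion operator are illustrative assumptions for this sketch, not a specific method from the surveyed literature.

```python
import torch
import torch.nn as nn


class SimpleFusionVQA(nn.Module):
    """Minimal joint-embedding VQA baseline (illustrative sketch):
    project image and question features to a shared space, fuse them,
    and classify over a fixed answer vocabulary."""

    def __init__(self, img_dim=2048, q_dim=1024, hidden=512, num_answers=1000):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)  # project image features
        self.q_proj = nn.Linear(q_dim, hidden)      # project question features
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(hidden, num_answers),         # score each candidate answer
        )

    def forward(self, img_feat, q_feat):
        # Element-wise product is one common, simple fusion operator;
        # concatenation or bilinear pooling are frequent alternatives.
        fused = self.img_proj(img_feat) * self.q_proj(q_feat)
        return self.classifier(fused)


# Toy usage with random stand-ins for precomputed features,
# e.g., CNN pooled image features and an RNN question embedding.
model = SimpleFusionVQA()
img = torch.randn(4, 2048)  # batch of image feature vectors
q = torch.randn(4, 1024)    # batch of question feature vectors
logits = model(img, q)      # shape: (4, 1000) answer scores
print(logits.shape)
```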