Visual Question Answering Methods of Cross-modal Data Collaborative Analysis: a Survey


Abstract: Collaborative analysis and processing of cross-modal data have long been a difficult and active topic in modern artificial intelligence; the main challenge lies in the semantic and heterogeneity gaps between cross-modal data. In recent years, with the rapid development of deep learning theory and techniques, deep-learning-based algorithms have made great progress in image and text processing, giving rise to the research topic of visual question answering (VQA). A VQA system takes visual information and a question in text form as input and produces the corresponding answer; its core lies in collaboratively understanding and processing visual and textual information. This paper therefore surveys VQA methods in detail. According to their underlying principles, existing VQA methods are divided into three categories: data fusion, cross-modal attention, and knowledge reasoning. The latest progress of VQA methods is comprehensively summarized and analyzed, commonly used VQA datasets are introduced, and future research directions are discussed.
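The pipeline described in the abstract (visual features and a textual question in, an answer out, with cross-modal attention and fusion in between) can be sketched minimally as follows. All names, dimensions, and the element-wise fusion scheme are illustrative assumptions for exposition, not taken from any specific method in the survey:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def vqa_forward(region_feats, question_vec, w_out):
    """Toy VQA forward pass: cross-modal attention over image regions,
    element-wise fusion, then a linear answer classifier.
    Shapes and the fusion scheme are illustrative, not from any cited model."""
    # Cross-modal attention: score each image region against the question.
    scores = region_feats @ question_vec      # (num_regions,)
    weights = softmax(scores)                 # attention distribution over regions
    attended = weights @ region_feats         # (d,) question-guided visual summary
    # Data fusion: element-wise product of visual and textual features
    # (one common simple fusion scheme).
    fused = attended * question_vec           # (d,)
    # Answer prediction over a fixed answer vocabulary.
    logits = fused @ w_out                    # (num_answers,)
    return softmax(logits), weights

# Usage with random toy features standing in for CNN/word-embedding outputs.
rng = np.random.default_rng(0)
d, num_regions, num_answers = 8, 5, 4
region_feats = rng.normal(size=(num_regions, d))
question_vec = rng.normal(size=d)
w_out = rng.normal(size=(d, num_answers))
probs, att = vqa_forward(region_feats, question_vec, w_out)
```

Real systems replace the random features with a visual backbone (e.g. a CNN or region detector) and a learned question encoder, and train `w_out` jointly; the sketch only shows how attention and fusion connect the two modalities.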
