WANG Lichun, FU Fangyu, XU Kai, XU Hongbo, YIN Baocai. Scene Graph Generation Method Based on Dual-stream Multi-head Attention[J]. Journal of Beijing University of Technology, 2024, 50(10): 1198-1205. DOI: 10.11936/bjutxb2023020008

    Scene Graph Generation Method Based on Dual-stream Multi-head Attention

    • Existing scene graph generation methods capture only limited contextual information. To address this, an effective context fusion module, the dual-stream multi-head attention module (DMA), was proposed. DMA was then applied to both object classification and relationship classification, giving the dual-stream multi-head attention-based scene graph generation network (DMA-Net). The method consists of three parts: object detection, object semantic parsing, and relationship semantic parsing. First, the object detection module located the objects in the image and extracted their features. Second, the object dual-stream multi-head attention (O-DMA) in the object semantic parsing module produced features fused with node context, which the object semantic decoder decoded into object labels. Finally, the relationship dual-stream multi-head attention (R-DMA) in the relationship semantic parsing module output features fused with edge context, which the relationship semantic decoder decoded into relationship labels. The proposed method was compared with mainstream scene graph generation methods on the publicly available Visual Genome (VG) dataset. For each method, recall with graph constraint and recall without graph constraint were computed on three subtasks: scene graph detection, scene graph classification, and predicate classification. Results show that the proposed method makes full use of the contextual information in the scene, which strengthens the representational capability of the features and improves the accuracy of scene graph generation.
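The abstract names multi-head attention as the mechanism that fuses node context into each object's features, but it does not say how the two streams of DMA are built. The sketch below therefore shows only one generic stream: standard scaled dot-product multi-head self-attention over the detected objects' feature vectors, with a residual connection. It is a minimal pure-Python illustration under that assumption, not the authors' O-DMA; the function names, the random projection weights, and the residual fusion are all hypothetical choices made for the example.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix; both are lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_context(features, num_heads, seed=0):
    """Fuse scene context into each object feature with multi-head self-attention.

    features: N object feature vectors of equal length d.
    The projection matrices are random placeholders (hypothetical); a trained
    network would learn them. Returns N vectors of length d.
    """
    n, d = len(features), len(features[0])
    if d % num_heads:
        raise ValueError("feature size must be divisible by num_heads")
    dh = d // num_heads
    rng = random.Random(seed)

    def proj():
        return [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(d)]

    wq, wk, wv, wo = proj(), proj(), proj(), proj()
    q, k, v = matmul(features, wq), matmul(features, wk), matmul(features, wv)

    heads = [[0.0] * d for _ in range(n)]  # concatenation of all head outputs
    for h in range(num_heads):
        cols = range(h * dh, (h + 1) * dh)  # feature slice owned by head h
        for i in range(n):
            # Scaled dot-product score of object i against every object j.
            scores = [sum(q[i][c] * k[j][c] for c in cols) / math.sqrt(dh) for j in range(n)]
            weights = softmax(scores)
            for c in cols:
                heads[i][c] = sum(weights[j] * v[j][c] for j in range(n))

    context = matmul(heads, wo)  # output projection mixes the heads
    # Residual connection: each object keeps its own feature plus fused context.
    return [[features[i][c] + context[i][c] for c in range(d)] for i in range(n)]


objs = [[1.0, 0.0, 0.5, 0.2], [0.3, 0.9, 0.1, 0.0], [0.0, 0.4, 0.8, 0.6]]
fused = multi_head_context(objs, num_heads=2)
print(len(fused), len(fused[0]))  # 3 4
```

Every output vector depends on all objects in the image through the attention weights, which is the sense in which a feature is "fused with node context"; an edge-level stream would apply the same idea to subject-object pair features.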
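Recall with and without graph constraint differ only in how many predicates each subject-object pair may contribute to the ranked prediction list. The following minimal sketch follows the convention commonly used on VG, assuming predicate-classification-style matching by exact (subject index, predicate id, object index) triplets. Real scene graph detection evaluation also matches boxes by IoU and combines object scores into the triplet score, and the function name and data layout here are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (subject index, predicate id, object index); indices refer to the image's objects.
Triplet = Tuple[int, int, int]


def recall_at_k(
    pair_scores: Dict[Tuple[int, int], List[float]],
    gt: Set[Triplet],
    k: int,
    graph_constraint: bool,
) -> float:
    """Recall@K for one image: fraction of ground-truth triplets in the top K predictions.

    pair_scores maps each ordered (subject, object) pair to one score per predicate class.
    """
    candidates = []
    for (s, o), scores in pair_scores.items():
        ranked = [(score, p) for p, score in enumerate(scores)]
        if graph_constraint:
            # Graph constraint: each pair may contribute only its best predicate.
            ranked = [max(ranked)]
        candidates.extend((score, (s, p, o)) for score, p in ranked)
    # Rank all candidate triplets by score and keep the top K.
    candidates.sort(key=lambda c: c[0], reverse=True)
    top_k = {triplet for _, triplet in candidates[:k]}
    return len(top_k & gt) / len(gt) if gt else 0.0


pair_scores = {(0, 1): [0.6, 0.3, 0.1], (1, 2): [0.5, 0.4, 0.1]}
gt = {(0, 0, 1), (0, 1, 1), (1, 1, 2)}
print(recall_at_k(pair_scores, gt, k=4, graph_constraint=True))   # 0.3333333333333333
print(recall_at_k(pair_scores, gt, k=4, graph_constraint=False))  # 1.0
```

Under the graph constraint, the pair (0, 1) can contribute only one predicate, so one of its two ground-truth relations is always missed; without the constraint, every predicate of every pair competes for the top K slots, which is why unconstrained recall is never lower.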