Region-sensitive Scene Graph Generation Method

Abstract: To address the relatively coarse granularity of predicate features extracted from relation bounding boxes, a region-sensitive scene graph generation (RS-SGG) method is proposed. The predicate feature extraction module divides the relation bounding box into four regions and uses a self-attention mechanism to suppress background regions that are irrelevant to relation classification. When predicting relations, the relation feature decoder considers not only the visual and semantic features of object pairs but also their position features. On the publicly available visual genome (VG) dataset, the graph-constraint and no-graph-constraint recalls of RS-SGG were computed for three subtasks, namely scene graph detection, scene graph classification, and predicate classification, and compared with those of mainstream scene graph generation methods. Experimental results show that RS-SGG outperforms the mainstream methods on both graph-constraint and no-graph-constraint recall. In addition, visualization results further demonstrate the effectiveness of the proposed method.
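The abstract's core idea, splitting the relation bounding box into four regions and letting self-attention down-weight background regions, can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the paper's implementation: the region layout (four quadrants), the pooling, and the single-head scaled dot-product attention are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def region_sensitive_predicate_feature(feat_map):
    """Sketch of region-sensitive predicate feature extraction.

    Splits a relation-box feature map of shape (C, H, W) into four
    quadrant regions, average-pools each region to a C-dim vector,
    and re-weights the four region vectors with scaled dot-product
    self-attention, so regions that correlate weakly with the others
    (e.g. background) contribute less to the pooled predicate feature.
    """
    C, H, W = feat_map.shape
    h, w = H // 2, W // 2
    regions = np.stack([
        feat_map[:, :h, :w].mean(axis=(1, 2)),   # top-left
        feat_map[:, :h, w:].mean(axis=(1, 2)),   # top-right
        feat_map[:, h:, :w].mean(axis=(1, 2)),   # bottom-left
        feat_map[:, h:, w:].mean(axis=(1, 2)),   # bottom-right
    ])                                            # shape (4, C)
    # Self-attention over the 4 region vectors (queries = keys = values).
    attn = softmax(regions @ regions.T / np.sqrt(C), axis=-1)  # (4, 4)
    attended = attn @ regions                                  # (4, C)
    return attended.mean(axis=0)                               # (C,)

rng = np.random.default_rng(0)
fmap = rng.standard_normal((256, 14, 14))  # e.g. an RoI-pooled feature map
pred_feat = region_sensitive_predicate_feature(fmap)
print(pred_feat.shape)  # (256,)
```

In a full model the attended region features would typically be fed, together with the object pair's semantic and position features, into the relation feature decoder; here only the region-attention step is shown.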

       
