Video Object Tracking Based on a Residual Dense Siamese Network
Abstract: Existing video object tracking (VOT) methods based on Siamese networks suffer from insufficient feature extraction capability and track poorly when the target undergoes severe appearance change or out-of-plane rotation. To address these problems, a VOT method based on a residual dense Siamese network is proposed. First, a residual dense network with an embedded convolutional block attention module extracts features at different levels from the template frame and the detection frame. Then, the features at each level are fed to mutually independent region proposal networks, each of which performs a cross-correlation operation. Finally, the outputs of the multiple region proposal networks are combined by an adaptive weighted sum to obtain the final tracking result. Experimental results show that the proposed method achieves good tracking performance under challenges such as severe appearance change and out-of-plane rotation.
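The last two steps of the pipeline (per-level cross-correlation followed by an adaptive weighted sum) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `xcorr`, `softmax`, `fuse`, and `track_response` are hypothetical. The region proposal heads are omitted, so each level yields a single response map instead of classification and regression outputs. The adaptive weights are assumed to be softmax-normalized scalars, one per level, which the abstract does not specify. All levels are assumed to share the same spatial size so that their response maps can be summed.

```python
import numpy as np

def xcorr(search, template):
    """Valid cross-correlation of a template over a search feature map.

    search:   (C, Hs, Ws) features from the detection frame
    template: (C, Ht, Wt) features from the template frame
    returns:  (Hs - Ht + 1, Ws - Wt + 1) response map
    """
    c, hs, ws = search.shape
    ct, ht, wt = template.shape
    assert c == ct, "channel counts must match"
    out = np.empty((hs - ht + 1, ws - wt + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Inner product between the template and the current window.
            out[i, j] = np.sum(search[:, i:i + ht, j:j + wt] * template)
    return out

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse(responses, logits):
    """Weighted sum of per-level response maps.

    logits stands in for learnable per-level scalars (an assumption);
    softmax keeps the weights positive and summing to one.
    """
    weights = softmax(np.asarray(logits, dtype=float))
    return sum(w * r for w, r in zip(weights, responses))

def track_response(search_feats, template_feats, logits):
    """Correlate each feature level independently, then fuse the results."""
    responses = [xcorr(s, t) for s, t in zip(search_feats, template_feats)]
    return fuse(responses, logits)

# Usage with random stand-in features for three levels.
rng = np.random.default_rng(0)
template_feats = [rng.standard_normal((8, 3, 3)) for _ in range(3)]
search_feats = [rng.standard_normal((8, 7, 7)) for _ in range(3)]
fused = track_response(search_feats, template_feats, logits=[0.0, 0.0, 0.0])
print(fused.shape)  # (5, 5)
```

With equal logits the fusion reduces to a plain average of the three response maps. In a trained tracker the per-level weights would be learned jointly with the network, so that levels whose features are more discriminative for the current target contribute more to the final result.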