HE Jian, LIU Yan, ZU Tianqi. Visual Gesture Recognition Based on Spatial-Temporal Features and Channel Attention[J]. Journal of Beijing University of Technology, 2021, 47(8): 824-832. DOI: 10.11936/bjutxb2020120028


    Visual Gesture Recognition Based on Spatial-Temporal Features and Channel Attention

    • Abstract: To address the two-stream fusion network's insufficient detection of dynamic-gesture key frames and hand contour features, a dynamic gesture recognition method fusing spatial-temporal gesture features with channel attention is proposed. First, efficient channel attention (ECA) is introduced into the two-stream fusion network to strengthen its focus on gesture key frames, while the spatial and temporal convolutional networks of the two streams extract the spatial and temporal features of dynamic gestures, respectively. Second, ECA selects the gesture frame receiving the highest attention in the spatial stream, and a single shot multibox detector (SSD) extracts the corresponding hand contour features. Finally, the hand contour features are fused with the body posture and temporal features extracted by the two streams to classify and recognize gestures. The method was validated on the Chalearn 2013 multi-modal sign language recognition dataset, achieving an accuracy of 66.23% and outperforming previous two-stream methods that used only the RGB information of this dataset.
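The ECA mechanism named in the abstract can be sketched as follows. This is a minimal plain-Python illustration of ECA's core idea only (per-channel global average pooling, a 1-D convolution of adaptive kernel size across neighbouring channels, then a sigmoid gate); the function name, the uniform 1/k convolution weights, and the border clamping are illustrative assumptions, not the paper's trained implementation, in which the 1-D convolution weights are learned.

```python
import math

def eca_weights(channel_means, gamma=2, b=1):
    """Sketch of efficient channel attention (ECA).

    channel_means: per-channel globally average-pooled activations.
    Returns one attention weight in (0, 1) per channel.
    """
    C = len(channel_means)
    # Adaptive kernel size: k = |log2(C)/gamma + b/gamma|, rounded to the nearest odd value.
    t = int(abs(math.log2(C) / gamma + b / gamma))
    k = t if t % 2 == 1 else t + 1
    half = k // 2
    weights = []
    for i in range(C):
        # Shared-weight 1-D convolution across k neighbouring channels
        # (uniform 1/k weights here for illustration; learned in practice).
        acc = 0.0
        for j in range(-half, half + 1):
            idx = min(max(i + j, 0), C - 1)  # clamp indices at the channel borders
            acc += channel_means[idx] / k
        weights.append(1.0 / (1.0 + math.exp(-acc)))  # sigmoid gate
    return weights
```

Each feature map would then be rescaled by its weight, which is how ECA lets the network emphasise the channels (and hence frames) that matter most.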

       
