Abstract:
To explore the intrinsic spatio-temporal representation of dynamic hand gestures in video-based hand gesture recognition, this paper proposes a 3D-2D restricted Boltzmann machine (RBM) model that captures the spatio-temporal correlation in hand gesture video data. In particular, traditional hand-crafted features are combined with the 3D-2D RBM to describe hand gestures more effectively. The proposed hybrid 3D-2D RBM model consists of three phases. First, Canny-2D HOG and optical flow 2D HOG descriptors are used to describe the spatial and temporal features, respectively. Second, a 3D-2D RBM is adopted to learn the latent high-level semantics of each channel. Finally, the discrimination results of the two channels are fused for recognition. Experimental results on the public Cambridge Hand Gesture Data set show that the proposed hybrid 3D-2D RBM outperforms state-of-the-art methods.
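
The final fusion phase can be illustrated with a minimal sketch. The abstract does not state which fusion rule is used, so the weighted average below is only an assumption chosen for illustration; the function name `fuse_channels`, the `weight` parameter, and the example scores are all hypothetical and do not come from the paper.

```python
def fuse_channels(spatial_scores, temporal_scores, weight=0.5):
    """Fuse per-class scores from the spatial and temporal channels.

    Assumption: fusion is a weighted average of the two score vectors.
    The paper's actual fusion rule is not given in the abstract.
    """
    if len(spatial_scores) != len(temporal_scores):
        raise ValueError("both channels must score the same set of classes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    # Weighted average, class by class; weight applies to the spatial channel.
    fused = [
        weight * s + (1.0 - weight) * t
        for s, t in zip(spatial_scores, temporal_scores)
    ]

    # The predicted gesture class is the index of the highest fused score.
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


# Hypothetical scores for three gesture classes from each channel.
spatial = [0.2, 0.5, 0.3]   # e.g. from the Canny-2D HOG channel
temporal = [0.1, 0.3, 0.6]  # e.g. from the optical flow 2D HOG channel
fused, label = fuse_channels(spatial, temporal)
```

With equal weights the two channels contribute equally, so a class favored strongly by one channel can override a weaker preference in the other; setting `weight` to 1.0 or 0.0 reduces the sketch to a single-channel decision.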