Citation: LI Jinghua, HUAI Huarui, KONG Dehui, WANG Lichun, SUN Yanfeng. Dynamic Hand Gesture Recognition Based on Two-channel Hybrid 3D-2D RBM[J]. Journal of Beijing University of Technology, 2019, 45(5): 428-435. DOI: 10.11936/bjutxb2017090018
To explore the intrinsic spatio-temporal representation of dynamic hand gestures in video-based hand gesture recognition, this paper proposes a 3D-2D restricted Boltzmann machine (RBM) model that captures the spatio-temporal correlation of hand gesture video data. In particular, a method that combines traditional hand-crafted features with the 3D-2D RBM is proposed to describe hand gestures better. The proposed hybrid 3D-2D RBM model consists of three phases. First, Canny-2D HOG and optical flow 2D HOG are used to describe the spatial and temporal features, respectively. A 3D-2D RBM is then adopted to learn the latent high-level semantics. Finally, the discrimination results of the two channels are fused for recognition. Experimental results on the public Cambridge Hand Gesture dataset show that the proposed hybrid 3D-2D RBM outperforms state-of-the-art methods.
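The final fusion phase can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted averaging of per-class scores used here is an assumption, and the function name `fuse_two_channels`, the weight `w_spatial`, and the sample scores are all hypothetical rather than taken from the paper.

```python
def fuse_two_channels(p_spatial, p_temporal, w_spatial=0.5):
    """Late fusion of two per-class score vectors.

    ASSUMPTION: weighted averaging is used only for illustration; the
    paper's actual fusion rule is not given in the abstract.
    """
    if len(p_spatial) != len(p_temporal):
        raise ValueError("both channels must score the same set of classes")
    if not 0.0 <= w_spatial <= 1.0:
        raise ValueError("w_spatial must lie in [0, 1]")
    # Weighted average of the spatial (Canny-HOG) and temporal
    # (optical-flow-HOG) channel scores, class by class.
    fused = [
        w_spatial * s + (1.0 - w_spatial) * t
        for s, t in zip(p_spatial, p_temporal)
    ]
    # Predicted class is the index of the largest fused score.
    label = max(range(len(fused)), key=fused.__getitem__)
    return fused, label


# Hypothetical scores for one clip over three gesture classes.
spatial_scores = [0.6, 0.3, 0.1]
temporal_scores = [0.2, 0.7, 0.1]
fused_scores, predicted = fuse_two_channels(spatial_scores, temporal_scores)
print(predicted)
```

In this made-up case the spatial channel alone favors class 0, while the fused scores favor class 1, which shows how a second channel can change the decision.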