Adaptive Local Spatiotemporal Feature Extraction Based on RGB-D Data
Abstract
Noise and global empirical motion constraints seriously hinder the extraction of accurate and sufficient spatiotemporal features for one-shot learning gesture recognition. To tackle this problem, an adaptive local spatiotemporal feature extraction approach that fuses color and depth (RGB-D) information is proposed. First, pyramids and optical flow pyramids of two successive gray frames and two successive depth frames are built as the scale space. Then, motion regions of interest (MRoIs) are adaptively extracted according to the horizontal and vertical variances of the gray and depth optical flow. Subsequently, corners are detected as interest points only within the MRoIs. An interest point is selected as a keypoint only if its optical flow meets adaptive local gray and depth motion constraints, which are determined separately in each MRoI. Finally, SIFT-like descriptors are calculated in improved gradient and motion spaces. Experimental results on the ChaLearn dataset demonstrate that the proposed approach achieves high recognition accuracy, comparable to that of published approaches.
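The MRoI extraction and keypoint selection steps can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `motion_roi` and `select_keypoints`, the use of flow magnitude, and both thresholds (the mean of the row and column variances, and the mean flow magnitude inside the MRoI) are assumptions chosen only to show the idea of a variance-driven region followed by local, per-region motion constraints. For brevity the region is derived from a single flow field, whereas the approach described above uses both gray and depth flow.

```python
from statistics import mean, pvariance


def motion_roi(flow_mag):
    """Return (top, bottom, left, right) of a motion region, or None.

    flow_mag is a 2-D list of optical flow magnitudes. Rows and columns
    whose variance exceeds the mean variance are treated as moving
    (an assumed adaptive threshold, not taken from the paper).
    """
    row_var = [pvariance(row) for row in flow_mag]
    col_var = [pvariance(col) for col in zip(*flow_mag)]
    row_thr, col_thr = mean(row_var), mean(col_var)
    rows = [i for i, v in enumerate(row_var) if v > row_thr]
    cols = [j for j, v in enumerate(col_var) if v > col_thr]
    if not rows or not cols:
        return None  # no row or column stands out, so no motion region
    return rows[0], rows[-1], cols[0], cols[-1]


def select_keypoints(corners, gray_mag, depth_mag, roi):
    """Keep corners inside roi whose gray AND depth flow exceed local means.

    The local threshold for each modality is the mean flow magnitude
    inside the region (again an assumption made for illustration).
    """
    top, bottom, left, right = roi

    def local_mean(mag):
        return mean(mag[y][x]
                    for y in range(top, bottom + 1)
                    for x in range(left, right + 1))

    gray_thr, depth_thr = local_mean(gray_mag), local_mean(depth_mag)
    return [(y, x) for (y, x) in corners
            if top <= y <= bottom and left <= x <= right
            and gray_mag[y][x] > gray_thr
            and depth_mag[y][x] > depth_thr]


# Toy 6x6 example: motion concentrated in rows 2-3, columns 2-4.
gray = [[0] * 6 for _ in range(6)]
gray[2][2:5] = [4, 5, 4]
gray[3][2:5] = [5, 6, 5]
depth = [row[:] for row in gray]
depth[2][3] = 1  # weak depth motion at this pixel

roi = motion_roi(gray)
keypoints = select_keypoints([(0, 0), (2, 2), (2, 3), (3, 3)], gray, depth, roi)
```

In this toy example the corner at (0, 0) lies outside the region, (2, 2) fails the gray constraint, and (2, 3) fails the depth constraint, so only (3, 3) survives. Requiring both modalities to agree is what lets depth information suppress keypoints caused by noise in the gray flow.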