Abstract:
To improve the precision of object tracking in videos, this work presents an RGB-D tracking method based on a spatial-temporal context (STC) model. By incorporating depth data, the STC model can clearly distinguish the target from the background within the context and effectively fuse the depth weights with the color weights. In addition, using the depth information and the target's momentum, the proposed method adjusts the target scale and handles occlusions, allowing the tracker to predict target locations precisely even under severe occlusion. Comprehensive evaluations on challenging datasets demonstrate that the proposed tracker performs favorably against several state-of-the-art counterparts, achieving more precise and reliable object tracking in videos.