Abstract:
Sparse two-stage object pose estimation methods suffer from poor occlusion robustness and real-time performance, as well as poor noise robustness and a non-differentiable perspective-n-point (PnP) computation. To address these problems, an end-to-end 6D object pose regression method based on dense vector fields is proposed. The method takes as input the dense tensor information predicted in the first stage of a voting-based method and replaces that method's pixel-wise voting and PnP computation, so that the 6D object pose is regressed end to end. First, paired vector features are extracted in the feature extraction module. Then, a clustering feature loss based on cosine similarity is introduced in the feature aggregation module. Finally, the 6D object pose is estimated with rotation and translation decoupled. Random noise is added to the input data during network training to improve the noise robustness of the method. Results on a synthetic sphere dataset and on the public Occlusion-LINEMOD dataset demonstrate the effectiveness of the method.
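As an illustration of the noise augmentation step only, the following is a minimal sketch and not the authors' implementation. It assumes that the dense vector field is stored as an array of shape (H, W, K, 2) holding one unit vector per pixel and keypoint, that the noise is zero-mean Gaussian, and that the perturbed vectors are renormalized to unit length. The function name `add_vector_field_noise` and the parameter `sigma` are hypothetical; the abstract does not specify the noise distribution or its magnitude.

```python
import numpy as np


def add_vector_field_noise(field, sigma=0.05, rng=None):
    """Perturb a dense unit-vector field with Gaussian noise (assumed model).

    field: array of shape (H, W, K, 2), one 2D unit vector per pixel and keypoint.
    sigma: standard deviation of the additive noise (hypothetical value).
    Returns a noisy copy whose vectors are renormalized to unit length.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = field + rng.normal(0.0, sigma, size=field.shape)
    # Renormalize so that each perturbed vector is again a direction.
    norm = np.linalg.norm(noisy, axis=-1, keepdims=True)
    return noisy / np.maximum(norm, 1e-8)


# Build a small synthetic field of unit vectors for demonstration.
H, W, K = 4, 5, 3
angles = np.random.default_rng(0).uniform(0.0, 2.0 * np.pi, size=(H, W, K))
field = np.stack([np.cos(angles), np.sin(angles)], axis=-1)

noisy = add_vector_field_noise(field, sigma=0.05, rng=np.random.default_rng(1))
```

In a training loop, such a function would be applied to each input field before it is passed to the regression network, so that the network sees a differently perturbed input at every iteration.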