Abstract:
Visual odometry based on supervised learning requires ground-truth pose data, and in practice few qualified samples are available. To address this problem, a pose estimation method based on a self-supervised convolutional neural network (CNN) and convolutional long short-term memory (ConvLSTM) is proposed. First, image sequences are taken as input, and motion-related features are extracted by the CNN. Then, the ConvLSTM network performs sequential modeling. Finally, the pose with 6 degrees of freedom is output. The model uses a loss function based on epipolar geometry to optimize the network parameters through self-supervised learning. The model was tested on the KITTI dataset and compared with four other algorithms. The results show that the proposed method is superior to other monocular algorithms in pose estimation accuracy and also has good generalization ability.
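The abstract does not give the exact form of the epipolar loss, so the following is only a minimal sketch of one common way such a loss can be written, not the authors' formulation. It assumes a relative pose (R, t) with the convention X2 = R X1 + t, a known intrinsic matrix K, and matched pixel coordinates in two frames. The function names `skew` and `epipolar_loss` are hypothetical and chosen for illustration. The loss is the mean absolute algebraic residual of the epipolar constraint, which needs no ground-truth pose and is zero for a pose consistent with the matches.

```python
import numpy as np


def skew(t):
    """Return the 3x3 skew-symmetric matrix [t]_x so that [t]_x @ v == cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])


def epipolar_loss(R, t, K, pts1, pts2):
    """Mean absolute epipolar residual |x2^T F x1| over N matched points.

    Assumed convention (not stated in the abstract): X2 = R @ X1 + t.
    R: (3, 3) rotation, t: (3,) translation, K: (3, 3) camera intrinsics,
    pts1, pts2: (N, 2) matched pixel coordinates in frames 1 and 2.
    """
    E = skew(t) @ R                      # essential matrix from the predicted pose
    K_inv = np.linalg.inv(K)
    F = K_inv.T @ E @ K_inv              # fundamental matrix in pixel coordinates
    ones = np.ones((pts1.shape[0], 1))
    x1 = np.hstack([pts1, ones])         # homogeneous pixel coordinates, frame 1
    x2 = np.hstack([pts2, ones])         # homogeneous pixel coordinates, frame 2
    residuals = np.einsum("ni,ij,nj->n", x2, F, x1)
    return float(np.mean(np.abs(residuals)))
```

In a training setting the same expression would be written in an automatic-differentiation framework so that the residual can be back-propagated to the pose network. The algebraic residual is used here for brevity; a geometric variant such as the Sampson distance is a frequent alternative.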