Abstract:
In recent years, estimating camera pose from visual information to localize unmanned vehicles has become an active research topic, and visual odometry is a key component of such systems. Traditional visual odometry requires a complex pipeline of feature extraction, feature matching, and post-processing, and this pipeline is difficult to optimize as a whole. This paper therefore proposes a visual odometry method that combines an attention mechanism with long short-term memory (LSTM). The attention mechanism enhances the convolutional network, which extracts motion features from changes between consecutive frames; an LSTM network then performs temporal modeling. The model takes a sequence of RGB images as input and outputs poses end to end. Experiments were conducted on the public KITTI autonomous-driving dataset, with comparisons against other algorithms. Results show that the proposed method achieves lower pose-estimation error than other monocular algorithms, and qualitative analysis indicates good generalization ability.
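To make the described pipeline concrete, the following is a minimal PyTorch sketch of an attention-enhanced CNN feeding an LSTM for end-to-end pose regression. The layer sizes, the squeeze-and-excitation form of the attention, and the 6-DoF pose parameterization are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed form)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (B, C, H, W) -> per-channel weights from global average pooling
        w = self.fc(x.mean(dim=(2, 3)))
        return x * w.unsqueeze(-1).unsqueeze(-1)

class AttnLSTMVO(nn.Module):
    """Hypothetical attention-CNN + LSTM visual odometry model."""
    def __init__(self, hidden: int = 512):
        super().__init__()
        # 6 input channels: two stacked RGB frames capture inter-frame change
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 64, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            ChannelAttention(256),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(256, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 6)  # 3 translation + 3 rotation params

    def forward(self, frames):
        # frames: (B, T, 3, H, W) RGB sequence
        b, t = frames.shape[:2]
        # Stack each consecutive frame pair along the channel axis
        pairs = torch.cat([frames[:, :-1], frames[:, 1:]], dim=2)   # (B, T-1, 6, H, W)
        feats = self.encoder(pairs.flatten(0, 1)).flatten(1)        # (B*(T-1), 256)
        out, _ = self.lstm(feats.view(b, t - 1, -1))                # temporal modeling
        return self.head(out)                                       # (B, T-1, 6) relative poses

# Usage: one batch of 5 frames yields 4 relative pose estimates
poses = AttnLSTMVO()(torch.randn(1, 5, 3, 64, 192))
```

Stacking consecutive frames lets the CNN see inter-frame change directly, while the LSTM accumulates context across the sequence; this mirrors the end-to-end formulation described above, in which no explicit feature matching or post-processing stage is required.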