Abstract:
To reduce the interference of dynamic environments with pose estimation in visual simultaneous localization and mapping (SLAM), a method that combines an object detection network with the ORB-SLAM2 system is proposed. In the inter-frame motion estimation stage, an object detection network extracts semantic information from the current frame and yields the bounding boxes of potentially movable objects. Using the depth image, the foreground inside each bounding box is segmented with the maximum between-class variance algorithm (Otsu's method), the dynamic feature points in the foreground are deleted, and the remaining feature points are used to estimate the pose. In the loop closure detection stage, the bounding boxes are used to construct image-level semantic features, and similar keyframes are retrieved by comparing these features with those of historical frames. Compared with the bag-of-visual-words approach, the method offers faster queries and lower memory consumption. The method was evaluated on the TUM dataset, and the results show that it effectively improves the performance of ORB-SLAM2 in highly dynamic scenes.
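
The foreground segmentation and feature-point culling step described above can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes that depth is given as a 2-D array with zeros marking invalid pixels, that the foreground is the nearer of the two depth classes, that every feature point falling on the foreground is treated as dynamic, and that all keypoints lie inside the image. The function names `otsu_threshold` and `static_keypoint_mask`, the box layout `(x0, y0, x1, y1)`, and the bin count are hypothetical choices made for illustration.

```python
import numpy as np


def otsu_threshold(values, bins=64):
    """Return the depth threshold that maximizes between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)                 # samples in the near class, per split
    w1 = hist.sum() - w0                 # samples in the far class, per split
    s0 = np.cumsum(hist * centers)       # running sum of depths (near class)
    s1 = s0[-1] - s0                     # remaining sum of depths (far class)
    valid = (w0 > 0) & (w1 > 0)          # both classes must be non-empty
    mu0 = s0 / np.maximum(w0, 1)
    mu1 = s1 / np.maximum(w1, 1)
    between = np.where(valid, w0 * w1 * (mu0 - mu1) ** 2, -1.0)
    best = int(np.argmax(between))
    return edges[best + 1]               # right edge of the last near-class bin


def static_keypoint_mask(keypoints, depth, box, bins=64):
    """Return a boolean mask that is False for keypoints on the box foreground.

    keypoints: (N, 2) array of (x, y) pixel coordinates (assumed in-bounds).
    depth:     (H, W) depth image, 0 meaning "no measurement" (assumption).
    box:       (x0, y0, x1, y1) detection box, x1 and y1 exclusive (assumption).
    """
    x0, y0, x1, y1 = box
    keep = np.ones(len(keypoints), dtype=bool)
    patch = depth[y0:y1, x0:x1]
    valid = patch[patch > 0]
    # Without two distinct depth values there is nothing to split.
    if valid.size < 2 or valid.min() == valid.max():
        return keep
    threshold = otsu_threshold(valid, bins)
    xs = keypoints[:, 0].astype(int)
    ys = keypoints[:, 1].astype(int)
    in_box = (xs >= x0) & (xs < x1) & (ys >= y0) & (ys < y1)
    d = depth[ys, xs]
    foreground = in_box & (d > 0) & (d < threshold)  # nearer class = foreground
    return ~foreground
```

Keypoints for which the mask is `True` would then be passed on to pose estimation. Thresholding depth rather than intensity is what allows background points that merely fall inside the bounding box to be kept.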