Details
Learning-based monocular visual odometry (VO) has recently drawn significant attention for its robustness to camera parameters and environmental variations. Unlike most self-supervised learning-based methods, our approach exploits both adjacent and interval co-visibility correspondences to improve pose estimation. To handle varying pixel displacements, we apply a Multi-scale Feature Fusion component to fully explore latent motion features. In addition, an Interval Feature Guided Refinement component adaptively exploits the continuity of camera motion and steers the network to retain pose consistency in the time domain. Extensive experiments on the KITTI and Malaga datasets demonstrate the promising performance of our approach: it produces competitive results against classic algorithms and outperforms state-of-the-art methods by up to 23.9% and 15.4% in average translational and rotational error, respectively.
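To illustrate the idea behind multi-scale feature fusion, here is a minimal NumPy sketch. It is not the paper's actual component (the function names, scales, and fusion-by-concatenation choice are assumptions for illustration): features are pooled at several scales, upsampled back to the original resolution, and concatenated, so that both small and large pixel displacements are represented in the fused map.

```python
import numpy as np

def avg_pool2d(x, k):
    """Average-pool an (H, W, C) feature map by factor k."""
    h, w, c = x.shape
    return x[: h - h % k, : w - w % k].reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def upsample2d(x, k):
    """Nearest-neighbour upsample an (H, W, C) map by factor k."""
    return x.repeat(k, axis=0).repeat(k, axis=1)

def multi_scale_fuse(feat, scales=(1, 2, 4)):
    """Hypothetical multi-scale fusion: pool at each scale, upsample
    back, and concatenate along the channel axis."""
    pyramid = []
    for s in scales:
        pooled = feat if s == 1 else upsample2d(avg_pool2d(feat, s), s)
        pyramid.append(pooled)
    return np.concatenate(pyramid, axis=-1)

feat = np.random.rand(8, 8, 16).astype(np.float32)
fused = multi_scale_fuse(feat)
print(fused.shape)  # (8, 8, 48): three scales, channels concatenated
```

In a real network the pooling and upsampling would be learned convolutional layers, but the shape bookkeeping is the same: each scale contributes a full-resolution feature map, and the fusion step combines them channel-wise.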