    Author(s)
    Rujun Song (University of Electronic Science and Technology of China)
    Jiaqi Liu (University of Electronic Science and Technology of China)
    Kaisheng Liao (Science and Technology on Electronic Information Control Laboratory)
    Zhuoling Xiao (University of Electronic Science and Technology of China)
    Bo Yan (University of Electronic Science and Technology of China)
    Abstract

    Learning-based monocular visual odometry (VO) has recently drawn significant attention for its robustness to camera parameters and environmental variations. Unlike most self-supervised learning-based methods, our approach exploits both adjacent and interval co-visibility correspondences to improve pose estimation. To handle pixel displacements of varying magnitude, we apply a Multi-scale Feature Fusion component that fully explores latent motion features. In addition, an Interval Feature Guided Refinement component adaptively exploits the continuity of camera motion and steers the network toward temporally consistent pose estimates. Extensive experiments on the KITTI and Malaga datasets demonstrate the promising performance of our approach: it produces competitive results against classic algorithms and outperforms state-of-the-art methods by up to 23.9% and 15.4% in average translational and rotational error, respectively.
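
    To make the two components concrete, below is a minimal PyTorch sketch of (a) a multi-scale feature fusion module and (b) an interval pose-consistency term. The class and function names, channel widths, upsample-and-concatenate fusion, and the squared-difference consistency loss are illustrative assumptions, not the authors' published architecture or training objective.

    # Illustrative sketch only; names, channel sizes, and losses are
    # assumptions, not the paper's actual design.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiScaleFeatureFusion(nn.Module):
        """Fuses feature maps from several resolutions so that both small
        and large pixel displacements contribute to pose regression."""

        def __init__(self, channels=(64, 128, 256), out_channels=256):
            super().__init__()
            # 1x1 convolutions project each scale to a common channel width.
            self.proj = nn.ModuleList(
                nn.Conv2d(c, out_channels, kernel_size=1) for c in channels
            )
            self.fuse = nn.Conv2d(out_channels * len(channels), out_channels,
                                  kernel_size=3, padding=1)

        def forward(self, feats):
            # feats: list of tensors [B, C_i, H_i, W_i], finest scale first.
            target_size = feats[0].shape[-2:]  # fuse at the finest resolution
            aligned = [
                F.interpolate(p(f), size=target_size, mode="bilinear",
                              align_corners=False)
                for p, f in zip(self.proj, feats)
            ]
            return self.fuse(torch.cat(aligned, dim=1))

    def interval_consistency_loss(T_0_1, T_1_2, T_0_2):
        """Penalizes disagreement between composed adjacent poses and the
        directly estimated interval pose (4x4 homogeneous matrices).
        Assumes T_a_b maps points from frame b into frame a; a squared
        Frobenius difference stands in for a proper SE(3) geodesic."""
        T_composed = T_0_1 @ T_1_2
        return torch.mean((T_composed - T_0_2) ** 2)

    Projecting every scale to a common channel width before concatenation keeps the fused tensor's size independent of the number of scales, and the consistency term reflects the intuition behind interval guidance: composing the two adjacent relative poses should agree with the pose estimated directly across the interval.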