Video s3
    Author(s)
    Huimin Yu, University of Electronic Science and Technology of China
    Ruoqi Li, University of Electronic Science and Technology of China
    Zhuoling Xiao, University of Electronic Science and Technology of China
    Bo Yan, University of Electronic Science and Technology of China
    Abstract

    Monocular depth estimation is a fundamental task in computer vision, with wide applications in Simultaneous Localization and Mapping (SLAM) and navigation. However, current unsupervised approaches have limited ability to perceive global information, especially for distant objects and object boundaries. To overcome this weakness, we propose a global-aware attention model for depth estimation, called GlobalDepth, which includes two essential modules: Global Feature Extraction (GFE) and Selective Feature Fusion (SFF). GFE models the correlation among multiple channels and refines the encoder features by extending the receptive field of the network. Furthermore, we restructure the skip connection by employing SFF to combine the low-level and high-level features element-wise, rather than by simple concatenation or addition at the feature level. Our model extracts the key information and enhances global perception to predict fine details of the scene. Extensive experimental results demonstrate that our method reduces the absolute relative error by 10.32% compared with other state-of-the-art models on the KITTI dataset.
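
    The element-wise selective fusion described above can be sketched as a learned gate that mixes the low-level (skip) feature with the high-level (decoder) feature per element, instead of concatenating them. The sketch below is an illustration of that idea in NumPy, not the paper's exact SFF layer: the gate logits `w` stand in for what the real model would produce with a small learned convolution over the two feature maps.

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def selective_feature_fusion(low, high, w):
        """Element-wise selective fusion of two feature maps.

        A sketch of the SFF idea: each output element is a convex
        combination of the low-level and high-level features, weighted
        by a gate in (0, 1). `low`, `high`, `w` all have shape (C, H, W);
        in the actual model the logits `w` would be learned (this is an
        assumption for illustration).
        """
        a = sigmoid(w)                     # per-element gate in (0, 1)
        return a * low + (1.0 - a) * high  # element-wise mix, not concat

    # Toy usage with random feature maps.
    rng = np.random.default_rng(0)
    low = rng.standard_normal((4, 8, 8))   # low-level encoder feature
    high = rng.standard_normal((4, 8, 8))  # high-level decoder feature
    w = rng.standard_normal((4, 8, 8))     # gate logits (assumed learned)
    fused = selective_feature_fusion(low, high, w)
    ```

    Because the gate lies in (0, 1), every fused element stays between the two input values, so the output keeps the same shape and scale as the features it merges, unlike concatenation, which doubles the channel count.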