Cross-modal retrieval methods retrieve semantically relevant data across different modalities. Supervised cross-modal retrieval methods usually achieve higher accuracy than unsupervised ones because they can exploit the semantic information provided by clean labels; however, noisy labels in the training data degrade their performance. In this work, we present Neighborhood Learning for Cross-Modal Retrieval (NLCMR), a novel framework that is robust against noisy labels because it exploits the information contained in the neighborhood. NLCMR has two main components: Clustering with Neighborhood Alignment, which reduces the impact of noisy labels and improves clustering robustness, and Neighborhood Contrastive Learning, which learns from noisy data by exploring pairwise and neighborhood information. Extensive experiments on three multi-modal datasets demonstrate the effectiveness of NLCMR.
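To make the neighborhood idea concrete, the sketch below shows one plausible form of a neighborhood contrastive loss: a cross-modal InfoNCE objective in which each image's positive set contains not only its paired text but also the texts paired with its k nearest neighbors in image-embedding space. This is a hypothetical illustration, not the authors' exact formulation; the function name, the choice of k, and the temperature `tau` are all assumptions.

```python
import numpy as np

def neighborhood_contrastive_loss(img_emb, txt_emb, k=2, tau=0.1):
    """Hypothetical sketch of a neighborhood-aware contrastive loss.

    Positives for image i: its own paired text plus the texts paired
    with i's k nearest neighbors in image-embedding space. This is an
    illustrative stand-in for NLCMR's actual objective.
    """
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    n = img.shape[0]

    # cross-modal similarity matrix, scaled by temperature
    sim = img @ txt.T / tau

    # neighborhood in image space: k nearest *other* images
    img_sim = img @ img.T
    np.fill_diagonal(img_sim, -np.inf)   # exclude self from neighbors
    nbrs = np.argsort(-img_sim, axis=1)[:, :k]

    # positive mask: own pair plus neighbors' pairs
    pos_mask = np.eye(n, dtype=bool)
    for i in range(n):
        pos_mask[i, nbrs[i]] = True

    # numerically stable softmax over all texts for each image;
    # loss is -log of the total probability mass on the positives
    logits = sim - sim.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    pos_prob = (probs * pos_mask).sum(axis=1)
    return float(-np.log(pos_prob).mean())
```

Because positives include neighbors' pairs, a single mislabeled or mismatched pair pulls the loss less strongly than in plain pairwise InfoNCE, which is the intuition behind learning from the neighborhood under label noise.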