
    Details
    Author(s)
    Yazid Attabi, McGill University
    Benoit Champagne, McGill University
    Wei-Ping Zhu, Concordia University
    Abstract

    In this work, we propose a new speech enhancement model, referred to as the auditory scene-attention model (ASAM), that can adapt dynamically to changes in the auditory scene components, such as speaker gender, input SNR level, and background noise properties. To this end, a representative set of so-called Universal Scene Models (USMs), each associated with a different auditory scene component, is first created, where each model attempts to predict a corresponding ideal ratio mask (IRM). The dynamic adaptation to changes in the auditory scene is then carried out by computing the outputs of the USMs and forming a weighted combination of the most relevant scene models. This adaptation process is implemented via a frame-based attention mechanism, which realizes a soft selection of USMs and takes advantage of both scene-dependent and scene-independent models. The evaluation of the proposed ASAM model under different noise conditions and input SNR levels shows substantial improvements in terms of standard speech quality and intelligibility measures.
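
    The sketch below illustrates, in NumPy, the kind of frame-based soft selection the abstract describes: each USM contributes an IRM estimate, attention weights over the USMs are computed per frame, and the weighted combination of masks is applied to the noisy magnitude spectrogram. The dimensions, the random stand-ins for the USM outputs, and the softmax scoring are illustrative assumptions, not the authors' implementation.

# Minimal sketch of attention-weighted combination of per-USM ideal ratio masks.
# All sizes and the placeholder scores are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

n_frames, n_freq, n_usm = 100, 257, 4   # assumed: time frames, frequency bins, scene models

# Noisy magnitude spectrogram (placeholder input).
noisy_mag = np.abs(rng.standard_normal((n_frames, n_freq)))

# Stand-ins for USM outputs: each model predicts an IRM in [0, 1] per frame.
usm_irms = rng.uniform(0.0, 1.0, size=(n_usm, n_frames, n_freq))

# Frame-based attention: one relevance score per USM per frame (random here;
# in the paper these would come from a learned attention mechanism).
scores = rng.standard_normal((n_frames, n_usm))
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax over USMs

# Soft selection: weighted combination of the per-USM masks, frame by frame.
combined_irm = np.einsum('tk,ktf->tf', weights, usm_irms)

# Apply the combined mask to the noisy magnitude to obtain the enhanced magnitude.
enhanced_mag = combined_irm * noisy_mag
print(enhanced_mag.shape)  # (100, 257)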