Triplet Confidence for Robust Out-of-Vocabulary Keyword Spotting

Keyword spotting (KWS) is the task of detecting pre-defined keywords in an audio stream. Several problems, including over-reliance on labeled data, severely imbalanced datasets, and the short duration of spoken keywords, reduce the robustness of KWS on out-of-vocabulary (OOV) samples. We therefore propose a model that maintains robustness on OOV samples by learning confidence estimates for its own predictions. The confidence estimate is produced by a self-attentional confidence branch, which can focus on a single keyword in context. We also propose a loss function for learning the confidence estimate, which improves the reliability of the model without relying on manually labeled data.
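
The abstract does not give the form of the proposed loss, so the following is only a rough illustration of how a confidence branch can be trained without extra confidence labels. It sketches one generic formulation, which may differ from the triplet confidence loss proposed here: the predicted class probabilities are blended toward the true label in proportion to (1 - confidence), and a penalty discourages the model from always reporting low confidence. All names (`confidence_loss`, `penalty_weight`, `is_out_of_vocabulary`, the 0.5 threshold) are hypothetical and chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def confidence_loss(class_logits, confidence_logit, target_index, penalty_weight=0.1):
    """Generic confidence-estimation loss (an assumption, not the paper's loss).

    Returns (loss, confidence), where confidence lies in (0, 1).
    """
    probs = softmax(class_logits)
    c = sigmoid(confidence_logit)

    # Blend the prediction toward the one-hot target by (1 - c):
    # a low confidence "asks for a hint" and lowers the task loss.
    blended = [
        c * p + (1.0 - c) * (1.0 if i == target_index else 0.0)
        for i, p in enumerate(probs)
    ]

    eps = 1e-12
    task_loss = -math.log(blended[target_index] + eps)
    # Asking for hints is penalized, so confidence stays high when the
    # prediction is already correct.
    confidence_penalty = -math.log(c + eps)
    return task_loss + penalty_weight * confidence_penalty, c


def is_out_of_vocabulary(confidence, threshold=0.5):
    """Flag a sample as OOV when confidence falls below a chosen threshold."""
    return confidence < threshold
```

Under this formulation the only supervision is the keyword label itself: the confidence output is shaped indirectly by the trade-off between the task loss and the penalty. At inference time, a low confidence can be used to reject OOV inputs instead of forcing them into a known keyword class.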
