This paper presents a novel approach to predicting the intrusiveness of background noise in speech signals as perceived by human listeners. The problem is of particular interest in telephony, where the recent widening of the transmitted audio bandwidth has made appropriate background-noise reduction strategies more important. Current approaches predict the average noise intrusiveness score that would be obtained in a subjective listening test. They do so by combining signal features related to physical properties of the noise (e.g., signal energy, spectral distribution) or psychoacoustic estimates (e.g., loudness). Combining or implementing such features requires expert knowledge or the availability of training data. Our approach is instead based on a model of efficient sound coding that uses a sparse spike-coding representation of the noise. We show that the sparsity of these representations implicitly models several factors in the perception of noise. It also yields noise intrusiveness predictions that are comparable to or better than those obtained with traditional features, without requiring any training data. Our evaluation datasets and performance metrics follow standardized methods for evaluating quality prediction models.
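The abstract does not say how the sparse spike code is computed or how its sparsity is quantified. The following is a minimal sketch of the general idea, not the paper's method. It assumes two things. First, the spike code is built by matching pursuit over a small dictionary of gammatone kernels, a common way to obtain spike-like codes of sound. Second, sparsity is measured as the number of spikes per second needed to reach a fixed reconstruction SNR. The helper names (`gammatone_kernel`, `matching_pursuit`, `spike_rate`) are hypothetical, and so are the four centre frequencies, the 64-sample kernel length and the 10 dB target. How a spike rate would be mapped to an intrusiveness score is not modelled.

```python
import math
import random

# Hypothetical sketch: kernels, parameters and the sparsity measure are
# illustrative assumptions, not the settings used in the paper.


def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def gammatone_kernel(fc, fs, length, order=4):
    """Unit-norm gammatone kernel centred at fc Hz (hypothetical helper)."""
    b = 1.019 * erb(fc)
    kernel = [
        (i / fs) ** (order - 1)
        * math.exp(-2.0 * math.pi * b * i / fs)
        * math.cos(2.0 * math.pi * fc * i / fs)
        for i in range(length)
    ]
    norm = math.sqrt(sum(v * v for v in kernel))
    return [v / norm for v in kernel]


def matching_pursuit(signal, kernels, target_snr_db=10.0, max_spikes=500):
    """Greedy spike coding of `signal`; returns (spikes, residual).

    Each spike is a tuple (kernel_index, offset, amplitude). Encoding stops
    once the reconstruction SNR reaches target_snr_db or max_spikes is hit.
    """
    residual = list(signal)
    energy = sum(x * x for x in signal)
    stop_energy = energy / 10.0 ** (target_snr_db / 10.0)
    res_energy = energy
    spikes = []
    while res_energy > stop_energy and len(spikes) < max_spikes:
        best_c, best_k, best_t = 0.0, 0, 0
        # Exhaustive search over every kernel and every valid time offset.
        for k_idx, kernel in enumerate(kernels):
            n = len(kernel)
            for t in range(len(residual) - n + 1):
                c = sum(residual[t + j] * kernel[j] for j in range(n))
                if abs(c) > abs(best_c):
                    best_c, best_k, best_t = c, k_idx, t
        if best_c == 0.0:
            break  # no kernel correlates with what is left
        # Kernels have unit norm, so this removes best_c**2 of residual energy.
        for j, v in enumerate(kernels[best_k]):
            residual[best_t + j] -= best_c * v
        res_energy = sum(x * x for x in residual)
        spikes.append((best_k, best_t, best_c))
    return spikes, residual


def spike_rate(spikes, n_samples, fs):
    """Assumed sparsity measure: spikes per second of signal (lower = sparser)."""
    return len(spikes) / (n_samples / fs)


if __name__ == "__main__":
    fs = 8000  # narrowband telephony sampling rate
    kernels = [gammatone_kernel(fc, fs, 64) for fc in (500, 1000, 2000, 3000)]
    rng = random.Random(0)
    white = [rng.gauss(0.0, 1.0) for _ in range(256)]
    spikes, _ = matching_pursuit(white, kernels, max_spikes=100)
    print("white noise:", spike_rate(spikes, len(white), fs), "spikes/s")
```

A noise whose structure matches the dictionary can be captured by a few spikes. Broadband, unstructured noise needs many more spikes to reach the same SNR. The exhaustive search costs O(spikes × kernels × offsets × kernel length) and is kept only for readability.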