Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
Detection and localization of multiple speakers in a noisy and reverberant environment is a fundamental and difficult task. In the literature, steered response power (SRP) based techniques are typically used to accomplish this task which can be computationally intensive. Nonetheless, the localization of multiple speakers remains a challenging in practice. In this paper, we present a novel approach based on a probabilistic interpretation of the SRP. The proposed method replaces the discrete search techniques by proposing an approximate analytical form of the SRP, which can adequately detect and localize multiple speakers. In addition to reliable detection and localization, the potential advantage of this approach is that it provides a probability density function (pdf) of the individual speaker positions rather than point estimates. Experiments on the AV16.3 corpus show the efficacy of the proposed approach.
Fabio Nobile, Yoshihito Kazashi