Computational Methods for Underdetermined Convolutive Speech Localization and Separation via Model-based Sparse Component Analysis

Hervé Bourlard, Volkan Cevher, Afsaneh Asaei, Mohammadjavad Taghizadeh
2016
Article

Abstract

This paper studies speech source localization and separation from recordings of convolutive underdetermined mixtures. The problem is cast as recovering the spatio-spectral speech information embedded in a microphone array's compressed measurements of the acoustic field. A model-based sparse component analysis framework is formulated for sparse reconstruction of the speech spectra in a reverberant acoustic environment, resulting in joint localization and separation of the individual sources. We compare and contrast computational approaches to model-based sparse recovery that exploit spatial sparsity as well as the spectral structures underlying spectrographic representations of speech signals. In this context, we explore identification of the sparsity structures in the auditory and acoustic representation spaces. The auditory structures are formulated on the principles of structural grouping based on proximity, autoregressive correlation and harmonicity of the spectral coefficients, and they are incorporated into sparse reconstruction. The acoustic structures are formulated on the image model of multipath propagation, and they are exploited to characterize the compressive measurement matrix associated with microphone array recordings. Three approaches to sparse recovery, relying on combinatorial optimization, convex relaxation and Bayesian methods, are studied and evaluated through thorough experiments. The sparse Bayesian learning method is shown to yield better perceptual quality, while the combinatorial approach also achieves interference suppression at the lowest computational cost. Furthermore, it is demonstrated that an average autoregressive model can be learned for speech localization, and that exploiting the proximity structure in the form of block-sparse coefficients enables accurate localization. Throughout the extensive empirical evaluation, we confirm that a large and random placement of the microphones yields significant improvement in source localization and separation performance.
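
To make the block-sparse localization idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy model y = Phi x in which each candidate source location owns a contiguous block of coefficients, and it substitutes a random Gaussian matrix for the image-model measurement matrix described in the abstract. The recovery routine is a generic greedy block orthogonal matching pursuit, used here as a stand-in for the combinatorial family of methods. The names block_omp and make_problem and all parameter values are hypothetical.

```python
import numpy as np


def block_omp(Phi, y, block_size, n_active):
    """Greedy block-sparse recovery: pick one block per iteration, then refit."""
    n_blocks = Phi.shape[1] // block_size
    residual = y.copy()
    chosen = []
    cols = np.zeros(0, dtype=int)
    coef = np.zeros(0)
    for _ in range(n_active):
        # Energy of the correlation between the residual and each block.
        corr = (Phi.T @ residual).reshape(n_blocks, block_size)
        energy = np.sum(corr ** 2, axis=1)
        if chosen:
            energy[chosen] = -np.inf  # never pick the same block twice
        chosen.append(int(np.argmax(energy)))
        # Least-squares refit restricted to the columns of the chosen blocks.
        cols = np.concatenate(
            [np.arange(b * block_size, (b + 1) * block_size) for b in chosen]
        )
        coef, *_ = np.linalg.lstsq(Phi[:, cols], y, rcond=None)
        residual = y - Phi[:, cols] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[cols] = coef
    return x_hat, sorted(chosen)


def make_problem(n_blocks=32, block_size=4, n_meas=64, active=(5, 20), seed=0):
    """Toy underdetermined problem (assumed setup): 64 measurements, 128 unknowns."""
    rng = np.random.default_rng(seed)
    n = n_blocks * block_size
    Phi = rng.standard_normal((n_meas, n))
    Phi /= np.linalg.norm(Phi, axis=0)  # unit-norm columns
    x = np.zeros(n)
    for b in active:
        # Each active block stands for one source location with nonzero spectrum.
        amp = rng.uniform(1.0, 2.0, block_size)
        sign = rng.choice([-1.0, 1.0], block_size)
        x[b * block_size:(b + 1) * block_size] = amp * sign
    return Phi, x, Phi @ x


if __name__ == "__main__":
    Phi, x_true, y = make_problem()
    x_hat, support = block_omp(Phi, y, block_size=4, n_active=2)
    print("estimated active blocks (locations):", support)
    print("max reconstruction error:", np.max(np.abs(x_hat - x_true)))
```

In this sketch the indices of the selected blocks play the role of the estimated source locations, and the refitted coefficients play the role of the separated spectra, which is the sense in which block-sparse recovery performs localization and separation jointly.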

Official source

https://infoscience.epfl.ch/record/210625?ln=fr
