Sparse and Parametric Modeling with Applications to Acoustics and Audio

Helena Peic Tukuljac
2020
EPFL thesis

Abstract

Recent advances in signal processing, machine learning and deep learning that exploit the intrinsic sparse structure of data have paved the way for solving inverse problems in acoustics and audio. The main goal of this thesis is to bridge the gap between these powerful mathematical tools and challenging problems in acoustics and audio. The thesis consists of two main parts.

The first part of the thesis addresses acoustic simulations that comply with "real world" constraints and the acquisition of acoustic data inside closed spaces. The simulated and measured data are used to solve various types of inverse problems with underlying sparsity. Using compressed sensing, we estimate the room modes, localize sound sources in a room and estimate the room's geometry. The Finite Rate of Innovation technique is coupled with non-convex optimization for blind deconvolution in the context of echo retrieval. We also introduce a new statistical measure of echo density for detecting the type of acoustic environment from its acoustic impulse response, even beyond fully closed spaces. These types of solutions can be applied to sound compression and rendering in the fast-growing domain of virtual, augmented and mixed reality.
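As a generic illustration of the sparse recovery idea behind compressed sensing, the following minimal sketch recovers a sparse vector from a small number of random linear measurements using orthogonal matching pursuit. It is not the method of the thesis: the function name `omp`, the random Gaussian sensing matrix, and the problem sizes are all assumptions chosen for illustration, and the abstract does not state which solver or measurement model was used.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedy sparse solution of y ~ A x
    using at most k columns of A (illustrative helper, not from the thesis)."""
    residual = y.copy()
    support = []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x

# Hypothetical toy problem: 40 random measurements of a 3-sparse vector of length 100.
rng = np.random.default_rng(0)
n, m, k = 100, 40, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)   # assumed random sensing matrix
x_true = np.zeros(n)
x_true[[5, 37, 80]] = [1.5, -2.0, 1.0]
y = A @ x_true                                 # noiseless measurements
x_hat = omp(A, y, k)
print("estimated support:", np.flatnonzero(x_hat))
```

In a room-acoustics setting the columns of `A` would instead come from a physical model (for example, candidate modes or source positions), which is where the sparsity assumption enters.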

The second part of the thesis focuses on recent trends in machine learning that are centered around deep learning. Large-scale acquisition of acoustic impulse responses is still a challenging and very expensive task. In addition, the existing databases tend to be too heterogeneous to be merged, because the acquisition procedure is not standardized and the available metadata tends to be incomplete. To keep up with recent trends and avoid the difficulties that come from the lack of large-scale acoustical data, the last part of the research in this thesis diverges from the rest and is devoted to deep learning applied to classification problems in audio, with a focus on speech and environmental sounds. The learning procedure is parametrized, which results in an off-grid learning procedure for audio classification. The learned trends align with perceptual trends, which helps in interpreting the results.

Official source

https://infoscience.epfl.ch/record/273930?ln=fr

Robust machine learning for neuroscientific inference

Generalization and Personalization of Machine Learning for Multimodal Mobile Sensing in Everyday Life

Seeking the new, learning from the unexpected: Computational models of surprise and novelty in the brain
