Inspired by the human ability to localize sounds, even with only one ear, as well as to recognize objects using active echolocation, we investigate the role of sound scattering and prior knowledge in regularizing ill-posed inverse problems in acoustics. In particular, we study direction of arrival estimation with one microphone, acoustic imaging with a small number of microphones, and microphone array localization. These problems are not only ill-posed but also non-convex in the variables of interest when formulated as optimization problems. To restore well-posedness, we use sound scattering, which we construe as a physical form of regularization. We additionally use standard regularization in the form of appropriate priors on the variables. The non-convexity is then handled with tools such as linearization or semidefinite relaxation.
We begin with direction of arrival estimation. While conventional approaches require at least two microphones, we show how to estimate the direction of one or more sound sources using only one. This is made possible by regularization through sound scattering, which we achieve with compact structures made from LEGO that scatter the sound in a direction-dependent manner. We also impose a prior on the source spectra, assuming that they can be sparsely represented in a learned dictionary. Using algorithms based on non-negative matrix factorization, we show how to use the LEGO devices and a speaker-independent dictionary to successfully localize one or two simultaneous speakers.
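As a toy illustration of this idea, and not the algorithm used in this work, the following sketch assumes a simplified magnitude model in which the recorded spectrogram is the source spectrogram filtered by a direction-dependent response. The dictionary `W`, the responses `H`, and all sizes are synthetic and hypothetical. For each candidate direction, non-negative activations are fitted with standard multiplicative updates, and the direction with the smallest residual is selected.

```python
import numpy as np

def fit_activations(A, Y, n_iter=500, eps=1e-12):
    """Multiplicative updates for min ||Y - A X||_F^2 subject to X >= 0."""
    X = np.ones((A.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        X *= (A.T @ Y) / (A.T @ A @ X + eps)
    return X

def estimate_direction(Y, H, W):
    """Return the index of the candidate direction with the smallest fit error."""
    errors = []
    for h in H:
        A = h[:, None] * W              # dictionary filtered by the direction response
        X = fit_activations(A, Y)
        errors.append(np.linalg.norm(Y - A @ X))
    return int(np.argmin(errors)), errors

# Synthetic, noise-free example (all quantities are assumptions for illustration).
rng = np.random.default_rng(0)
F, K, N, D = 40, 4, 30, 6               # frequencies, atoms, frames, directions
W = rng.random((F, K))                  # stand-in for a learned dictionary
H = rng.random((D, F)) + 0.1            # stand-in for direction-dependent responses
X_true = rng.random((K, N)) * (rng.random((K, N)) < 0.5)   # sparse activations
true_dir = 2
Y = H[true_dir][:, None] * (W @ X_true) # observed magnitude spectrogram
est_dir, errors = estimate_direction(Y, H, W)
```

The sketch only works because the responses differ across directions: if every row of `H` were identical, all candidates would fit equally well, which mirrors the role scattering plays as a regularizer.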
Next, we study acoustic imaging of 2D shapes using a small number of microphones. Unlike in echolocation, where the source is known, we show how to image an unknown object using an unknown source. In this case, we enforce a prior on the object using a total variation norm penalty but impose no prior on the source. We also show how to use microphones embedded in the ears of a dummy head to benefit from the diversity encoded in the head-related transfer function. We then propose an algorithm to jointly reconstruct the shape of the object and the sound source spectrum. We demonstrate the effectiveness of our approach using numerical and real experiments with speech and noise sources.
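To give a sense of what a total variation penalty rewards, the sketch below computes the anisotropic total variation of a small 2D image. The discretization and the test image are assumptions chosen for illustration; the exact form of the penalty used in this work may differ.

```python
import numpy as np

def total_variation(image):
    """Anisotropic TV: sum of absolute differences between neighboring pixels."""
    vertical = np.abs(np.diff(image, axis=0)).sum()
    horizontal = np.abs(np.diff(image, axis=1)).sum()
    return vertical + horizontal

# A piecewise-constant shape: a 4x4 square inside an 8x8 image.
shape = np.zeros((8, 8))
shape[2:6, 2:6] = 1.0

# The same shape with additive noise (hypothetical noise level).
rng = np.random.default_rng(0)
noisy = shape + 0.2 * rng.standard_normal(shape.shape)

tv_clean = total_variation(shape)   # equals the perimeter of the square, 16.0
tv_noisy = total_variation(noisy)
```

For a binary shape, this quantity equals the length of its boundary, so penalizing it favors objects with sharp edges and flat interiors over noisy reconstructions.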
Finally, the need to know the microphone positions in acoustic imaging and a number of other applications led us to study microphone localization. We assume that the positions of the loudspeakers are also unknown and that the devices are not synchronized. In this case, the times of arrival from the loudspeakers to the microphones are shifted by unknown source emission times and unknown sensor capture times. We thus propose a timing-invariant objective, which allows us to localize the setup without first having to estimate the unknown timing information. We also propose an approach to handle missing data and show how to include side information such as knowledge of some of the distances between the devices. We derive a semidefinite relaxation of the objective, which provides a good initialization for a subsequent refinement using the Levenberg-Marquardt algorithm. Using numerical and real experiments, we show that we can localize unsynchronized devices even in near-minimal configurations.
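One simple way to see how timing invariance can arise, offered as a sketch and not necessarily the objective derived in this work, is to model each measured time as a propagation time plus a per-microphone offset and a per-loudspeaker offset. Subtracting row and column means (double centering) then cancels both sets of offsets. The geometry, offsets, and function names below are hypothetical.

```python
import numpy as np

def centering(n):
    """Centering matrix I - (1/n) 1 1^T, which removes the mean of a vector."""
    return np.eye(n) - np.ones((n, n)) / n

def remove_offsets(T):
    """Double-center T so that additive row and column offsets vanish."""
    M, K = T.shape
    return centering(M) @ T @ centering(K)

# Synthetic setup (assumed for illustration): 6 microphones, 5 loudspeakers in 3D.
rng = np.random.default_rng(0)
mics = rng.uniform(0, 5, size=(6, 3))
speakers = rng.uniform(0, 5, size=(5, 3))
c = 343.0                                        # speed of sound in m/s

# Propagation times between every microphone and every loudspeaker.
D = np.linalg.norm(mics[:, None, :] - speakers[None, :, :], axis=2) / c

capture = rng.uniform(-1, 1, size=(6, 1))        # unknown sensor capture offsets
emission = rng.uniform(-1, 1, size=(1, 5))       # unknown source emission times
T = D + capture + emission                       # measured, unsynchronized times

invariant_measured = remove_offsets(T)
invariant_true = remove_offsets(D)
```

Here `invariant_measured` and `invariant_true` agree up to floating-point error, so a cost built on the centered matrix does not depend on the unknown offsets.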