A key challenge across many disciplines is to extract meaningful information from data that are often obscured by noise. Such datasets are typically represented as large matrices. As datasets keep growing in volume and complexity, it is necessary to develop matrix inference methodologies that can handle high-dimensional matrices. This thesis presents a theoretical exploration of high-dimensional matrix inference problems. Because the matrices are high-dimensional, statistical methods developed for the high-dimensional limit can be applied to them. We primarily investigate spectral estimators, which are based on the spectral properties of matrices and are constructed from their singular vectors or eigenvectors. The methodologies we use are rooted in random matrix theory and statistical physics, together with results on the high-dimensional limits of spherical integrals. This approach provides a comprehensive theoretical framework for understanding matrix inference for large-scale data.
We begin by studying low-rank estimation problems in the mismatched setting, where the priors of the signal and the noise are not perfectly known. In this setting, we derive exact analytic expressions for the asymptotic mean squared error (MSE) in the large-system limit, for Gaussian priors and additive noise, with both symmetric and non-symmetric signals. Our formulas show that effective estimation remains achievable in the mismatched case. They also show that the minimum MSE (MMSE) is attained not only with the matched parameters but also with a non-trivial set of other parameter choices. We further compare the performance of spectral algorithms and Approximate Message Passing (AMP) in the mismatched setting.
In the latter part of the thesis, we study extensive-rank matrix inference problems using the framework of rotationally invariant estimators (RIEs).
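For the low-rank estimation problem described above, the following sketch shows what a spectral estimator looks like in the simplest symmetric case. It is a toy, matched rank-one spiked Wigner model and not the thesis's mismatched derivation. We assume Y = theta u u^T + W/sqrt(n), where u is a unit-norm Gaussian spike and W is GOE noise. The estimator is c v v^T, built from the top eigenvector v of Y. The predictions used (outlier at theta + 1/theta, squared overlap 1 - 1/theta^2 for theta > 1) are standard random-matrix results for this model. The function names and the scale parameter c are hypothetical, chosen for illustration. The sketch shows that the error of a spectral estimator depends on how its scale is chosen.

```python
import numpy as np


def spiked_wigner(n, theta, rng):
    """Sample Y = theta * u u^T + W / sqrt(n): unit-norm Gaussian spike u,
    GOE noise W (off-diagonal variance 1), so the noise bulk is [-2, 2]."""
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    a = rng.standard_normal((n, n))
    w = (a + a.T) / np.sqrt(2.0)
    return theta * np.outer(u, u) + w / np.sqrt(n), u


def top_eigenpair(y):
    """Largest eigenvalue of the symmetric matrix y and its unit eigenvector."""
    eigvals, eigvecs = np.linalg.eigh(y)  # eigenvalues in ascending order
    return eigvals[-1], eigvecs[:, -1]


def spectral_error(theta, u, v, c):
    """Squared Frobenius error ||theta u u^T - c v v^T||_F^2 for unit vectors u, v."""
    return theta**2 + c**2 - 2.0 * theta * c * float(np.dot(u, v)) ** 2


rng = np.random.default_rng(0)
theta, n = 2.0, 1500
y, u = spiked_wigner(n, theta, rng)
top_eig, v = top_eigenpair(y)
overlap_sq = float(np.dot(u, v)) ** 2

# Asymptotic predictions above the transition theta > 1.
pred_eig = theta + 1.0 / theta
pred_overlap_sq = 1.0 - 1.0 / theta**2

# c v v^T has error theta^2 + c^2 - 2 theta c <u, v>^2, minimised at
# c = theta <u, v>^2; using the raw top eigenvalue as the scale is worse.
c_tuned = theta * pred_overlap_sq
err_tuned = spectral_error(theta, u, v, c_tuned)
err_naive = spectral_error(theta, u, v, top_eig)
print(f"top eigenvalue {top_eig:.3f} (predicted {pred_eig:.3f})")
print(f"squared overlap {overlap_sq:.3f} (predicted {pred_overlap_sq:.3f})")
print(f"error with tuned scale {err_tuned:.3f}, with naive scale {err_naive:.3f}")
```

For theta = 2, the tuned error should be close to its asymptotic value theta^2 (1 - (1 - 1/theta^2)^2) = 1.75. The naive scale should give roughly 2.75.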
In the symmetric case, we study the asymptotic mutual information and the MMSE of the denoising problem under Gaussian noise. We then extend RIEs to rectangular matrices with general rotationally invariant noise and derive the asymptotic MMSE in this setting. Finally, we investigate a statistical model for matrix factorization and derive analytical formulas for the optimal RIE that reconstructs the two matrix factors from a noisy observation of their product.
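To make the idea of an RIE concrete, the sketch below covers symmetric denoising under Gaussian noise. It assumes Y = S + sigma W, where W is GOE. An RIE keeps the eigenvectors of Y and replaces each eigenvalue lam by a shrunk value xi(lam). For this noise model, we assume the shrinkage formula xi(lam) = lam - 2 sigma^2 h_Y(lam) from the RIE literature, where h_Y is the Hilbert transform of the spectral density of Y. Here h_Y is estimated from the empirical Stieltjes transform with a heuristic smoothing width eta = n^(-1/2). The sketch also builds an "oracle" RIE, xi_j = v_j^T S v_j. The oracle uses the true S and is the best possible estimator with Y's eigenvectors. The function names are hypothetical, and this is not the thesis's derivation. As a sanity check, we use a Wigner signal with variance 1 and sigma = 1. In that case the formula should reduce to xi(lam) = lam/2 with MSE near 1/2.

```python
import numpy as np


def goe(n, rng):
    """Symmetric Gaussian matrix scaled so its spectrum tends to the semicircle on [-2, 2]."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2.0 * n)


def gaussian_noise_rie(y, sigma, eta=None):
    """RIE for Y = S + sigma * W with W as in goe(): keep Y's eigenvectors and map
    each eigenvalue lam to xi(lam) = lam - 2 sigma^2 h_Y(lam), where h_Y is the
    Hilbert transform of Y's spectral density, estimated as Re g(lam - i*eta)."""
    n = y.shape[0]
    lam, vecs = np.linalg.eigh(y)
    if eta is None:
        eta = n ** -0.5  # heuristic smoothing width
    diff = lam[:, None] - lam[None, :]
    h = np.mean(diff / (diff**2 + eta**2), axis=1)
    xi = lam - 2.0 * sigma**2 * h
    return (vecs * xi) @ vecs.T, vecs


def oracle_rie(s, vecs):
    """Best estimator with the given eigenvectors: xi_j = v_j^T S v_j (needs the true S)."""
    xi = np.sum(vecs * (s @ vecs), axis=0)
    return (vecs * xi) @ vecs.T


def mse(a, b):
    """Normalised squared error ||a - b||_F^2 / n."""
    return float(np.linalg.norm(a - b, "fro") ** 2) / a.shape[0]


rng = np.random.default_rng(1)
n, sigma = 1000, 1.0
s = goe(n, rng)  # Wigner signal: its optimal RIE should approach xi(lam) = lam / 2
y = s + sigma * goe(n, rng)
s_rie, vecs = gaussian_noise_rie(y, sigma)
s_oracle = oracle_rie(s, vecs)

mse_obs, mse_rie, mse_oracle = mse(s, y), mse(s, s_rie), mse(s, s_oracle)
print(f"MSE raw {mse_obs:.3f}, RIE {mse_rie:.3f}, oracle {mse_oracle:.3f}")
```

The raw observation should have an MSE close to 1, and the formula-based RIE should come close to 1/2. The oracle can only do better, since it optimises over the same eigenvectors. For non-Gaussian signal spectra, the same function applies unchanged, but the shrinkage is no longer linear.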
Jean-Paul Richard Kneib, Huanyuan Shan