**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.

Publication# Unlabeled Principal Component Analysis and Matrix Completion

Abstract

We introduce robust principal component analysis from a data matrix in which the entries of its columns have been corrupted by permutations, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that UPCA is a well-defined algebraic problem since we prove that the only matrices of minimal rank that agree with the given data are row-permutations of the ground-truth matrix, arising as the unique solutions of a polynomial system of equations. Further, we propose an efficient two-stage algorithmic pipeline for UPCA suitable for the practically relevant case where only a fraction of the data have been permuted. Stage-I employs outlier-robust PCA methods to estimate the ground-truth column-space. Equipped with the column-space, Stage-II applies recent methods for unlabeled sensing to restore the permuted data. Allowing for missing entries on top of permutations in UPCA leads to the problem of unlabeled matrix completion, for which we derive theory and algorithms of similar flavor. Experiments on synthetic data, face images, educational and medical records reveal the potential of our algorithms for applications such as data privatization and record linkage.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related MOOCs (18)

Related publications (33)

Related concepts (39)

Algebra (part 1)

Un MOOC francophone d'algèbre linéaire accessible à tous, enseigné de manière rigoureuse et ne nécessitant aucun prérequis.

Algebra (part 1)

Un MOOC francophone d'algèbre linéaire accessible à tous, enseigné de manière rigoureuse et ne nécessitant aucun prérequis.

Algebra (part 2)

Un MOOC francophone d'algèbre linéaire accessible à tous, enseigné de manière rigoureuse et ne nécessitant aucun prérequis.

Ontological neighbourhood

Algebraic geometry

Algebraic geometry is a branch of mathematics which classically studies zeros of multivariate polynomials. Modern algebraic geometry is based on the use of abstract algebraic techniques, mainly from commutative algebra, for solving geometrical problems about these sets of zeros. The fundamental objects of study in algebraic geometry are algebraic varieties, which are geometric manifestations of solutions of systems of polynomial equations.

Pattern recognition

Pattern recognition is the automated recognition of patterns and regularities in data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess (PR) capabilities but their primary function is to distinguish and create emergent pattern. PR has applications in statistical data analysis, signal processing, , information retrieval, bioinformatics, data compression, computer graphics and machine learning.

K-nearest neighbors algorithm

In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression. In both cases, the input consists of the k closest training examples in a data set. The output depends on whether k-NN is used for classification or regression: In k-NN classification, the output is a class membership.

Modern neuroscience research is generating increasingly large datasets, from recording thousands of neurons over long timescales to behavioral recordings of animals spanning weeks, months, or even years. Despite a great variety in recording setups and expe ...

There is a growing recognition that electronic band structure is a local property of materials and devices, and there is steep growth in capabilities to collect the relevant data. New photon sources, from small-laboratory-based lasers to free electron lase ...

Machine learning (ML) enables artificial intelligent (AI) agents to learn autonomously from data obtained from their environment to perform tasks. Modern ML systems have proven to be extremely effective, reaching or even exceeding human intelligence.Althou ...