Mining useful clusters from high-dimensional data has received significant attention from the signal processing and machine learning community in recent years. Linear and non-linear dimensionality reduction has played an important role in overcoming the curse of dimensionality. However, such methods often suffer from problems such as high computational complexity (usually associated with nuclear norm minimization), non-convexity (for matrix factorization methods), or susceptibility to gross corruptions in the data. In this paper we propose a convex, robust, scalable and efficient Principal Component Analysis (PCA) based method to approximate the low-rank representation of high-dimensional datasets via a two-way graph regularization scheme. Compared to exact recovery methods, our method is approximate, in that it enforces a piecewise constant assumption on the samples using a graph total variation and a piecewise smoothness assumption on the features using a graph Tikhonov regularization. Furthermore, it retrieves the low-rank representation in a time that is linear in the number of data samples. Clustering experiments on 3 benchmark datasets with different types of corruptions show that our proposed model outperforms 7 state-of-the-art dimensionality reduction models.
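To make the two regularizers named in the abstract more concrete, the following is a minimal sketch of how a graph total variation over samples and a graph Tikhonov term over features could be evaluated for a candidate low-rank matrix. It is an illustration under stated assumptions, not the paper's formulation: the features-by-samples layout, the l1 data-fidelity term, the combinatorial Laplacian, and the names `gamma_tv` and `gamma_tik` are all hypothetical choices made here.

```python
import numpy as np


def graph_laplacian(W):
    """Combinatorial Laplacian L = D - W of a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W


def graph_total_variation(U, W_samples):
    """Sum of W[i, j] * ||U[:, i] - U[:, j]||_1 over sample pairs i < j.

    Small values mean connected samples (columns) are nearly identical,
    i.e. the columns are piecewise constant over the sample graph.
    """
    n = U.shape[1]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if W_samples[i, j] > 0:
                total += W_samples[i, j] * np.abs(U[:, i] - U[:, j]).sum()
    return total


def graph_tikhonov(U, W_features):
    """tr(U^T L U), with L the Laplacian of the feature (row) graph.

    Small values mean connected features (rows) vary smoothly.
    """
    return float(np.trace(U.T @ graph_laplacian(W_features) @ U))


def objective(U, X, W_samples, W_features, gamma_tv=1.0, gamma_tik=1.0):
    """Assumed objective: l1 fidelity plus the two graph regularizers."""
    fidelity = np.abs(X - U).sum()
    return (fidelity
            + gamma_tv * graph_total_variation(U, W_samples)
            + gamma_tik * graph_tikhonov(U, W_features))


# Toy data: 2 features x 4 samples, two clusters of identical samples.
U_clean = np.array([[1.0, 1.0, 5.0, 5.0],
                    [2.0, 2.0, 6.0, 6.0]])

# Sample graph: edges only inside each cluster (0-1 and 2-3).
W_samples = np.array([[0.0, 1.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 1.0],
                      [0.0, 0.0, 1.0, 0.0]])

# Feature graph: the two features are connected.
W_features = np.array([[0.0, 1.0],
                       [1.0, 0.0]])

# Perturb one entry of sample 1 so it no longer matches its neighbour.
U_noisy = U_clean.copy()
U_noisy[1, 1] = 3.0

tv_clean = graph_total_variation(U_clean, W_samples)
tv_noisy = graph_total_variation(U_noisy, W_samples)
tik_clean = graph_tikhonov(U_clean, W_features)
```

In this toy setting the total variation is zero when samples within a cluster are identical and grows once one sample is perturbed, which is the piecewise constant behaviour the abstract attributes to the sample-side regularizer. Actually minimizing such an objective would need a suitable convex solver, which this sketch does not attempt.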