Clustering algorithms have evolved to handle increasingly complex structures. However, measures for assessing the quality of the resulting partitions are rare and specific to certain algorithms. In this work, we propose a new cluster validity measure (CVM) that handles solutions with arbitrary shapes and various levels of outlier rejection, based on the notions of cluster cores and outliers. Moreover, we propose a suitable cluster merging system (CMS) to group cluster cores that share some of their outliers. Such outliers may be a mixture of the nearby cores. Extensions of Support Vector Clustering and Gaussian Process Clustering that yield true hierarchical solutions are presented and applied, using the proposed CVM and CMS, in synthetic and real experiments that show the benefit for hyperparameter selection.
Florent Gérard Krzakala, Lenka Zdeborová, Luca Pesce, Bruno Loureiro