One shortcoming of existing clustering methods is their difficulty in dealing with clusters of different shapes and sizes. Most of these methods are designed for specific cluster types, or perform well only on clusters of a particular size and shape. The main problem in this context is how to define a dissimilarity criterion that makes an algorithm capable of clustering general data, which include clusters of different shapes and sizes. Another important objective is keeping the computational complexity of any new algorithm low. In this paper, a new approach to fuzzy clustering is proposed in which a model for each cluster is estimated during learning. In addition, a dissimilarity metric for each cluster is gradually defined, updated, and used in the next step. In our approach, instead of associating a single cluster type with each cluster, we assume a set of possible cluster types for each cluster, with different grades of possibility. Then a truncation step, which can be viewed as an attention mechanism, focuses on the most probable cluster types for each cluster. This selection step substantially reduces the computational load and speeds up the clustering. The proposed clustering method, which can handle partially labeled data, is applied to two families of data: first in the presence of partially labeled data, then with fully unlabeled data. Comparing the experimental results of this method with those of several important existing algorithms demonstrates the superior performance of the proposed method. The merit of this method is its ability to deal with clusters of different shapes and sizes while computing, for each cluster, a fuzzy membership value for the different shapes.
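The abstract gives no formulas, so the following is only a rough sketch of the two ideas it names, not the authors' method. It assumes a Gustafson-Kessel style Mahalanobis distance as a stand-in for the per-cluster dissimilarity metric, and a simple top-k selection as a stand-in for the truncation over cluster types. The function names `fuzzy_memberships` and `truncate_top_k`, the fuzzifier `m`, and the parameter `k` are all hypothetical.

```python
import numpy as np

def fuzzy_memberships(X, centers, covs, m=2.0, eps=1e-12):
    """Fuzzy memberships using one Mahalanobis-type metric per cluster.

    Assumption: a Gustafson-Kessel norm matrix stands in for the
    per-cluster dissimilarity metric mentioned in the abstract.
    """
    n, d = X.shape
    c = centers.shape[0]
    dist2 = np.empty((n, c))
    for j in range(c):
        # Volume-normalised inverse covariance of cluster j.
        A = np.linalg.det(covs[j]) ** (1.0 / d) * np.linalg.inv(covs[j])
        diff = X - centers[j]
        # Squared distance diff^T A diff for every point.
        dist2[:, j] = np.einsum("ni,ij,nj->n", diff, A, diff)
    dist2 = np.maximum(dist2, eps)  # avoid division by zero
    inv = dist2 ** (-1.0 / (m - 1.0))
    # Standard fuzzy c-means normalisation: each row sums to 1.
    return inv / inv.sum(axis=1, keepdims=True)

def truncate_top_k(possibilities, k):
    """Keep the k largest entries in each row, zero the rest, renormalise.

    Assumption: rows are clusters, columns are candidate cluster types.
    """
    p = np.asarray(possibilities, dtype=float)
    idx = np.argsort(p, axis=1)[:, -k:]
    mask = np.zeros_like(p, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    kept = np.where(mask, p, 0.0)
    return kept / kept.sum(axis=1, keepdims=True)
```

In this sketch, truncation means later updates only need to evaluate the `k` retained cluster types per cluster, which is where a saving in computation would come from.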
Vinitra Swamy, Paola Mejia Domenzain, Julian Thomas Blackwell, Isadora Alves de Salles