Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
The R package bclust is useful for clustering high-dimensional continuous data. The package uses a parametric spike-and-slab Bayesian model to downweight the effect of noise variables and to quantify the importance of each variable in agglomerative clustering. We take advantage of the existence of closed-form marginal distributions to estimate the model hyper-parameters using empirical Bayes, thereby yielding a fully automatic method. We discuss computational problems arising in implementation of the procedure and illustrate the usefulness of the package through examples.
Vinitra Swamy, Paola Mejia Domenzain, Julian Thomas Blackwell, Isadora Alves de Salles