Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
We present a study on galaxy detection and shape classification using topometric clustering algorithms. We first use the DBSCAN algorithm to extract, from CCD frames, groups of adjacent pixels with significant fluxes and we then apply the DENCLUE algorithm to separate the contributions of overlapping sources. The DENCLUE separation is based on the localization of pattern of local maxima, through an iterative algorithm, which associates each pixel to the closest local maximum. Our main classification goal is to take apart elliptical from spiral galaxies. We introduce new sets of features derived from the computation of geometrical invariant moments of the pixel group shape and from the statistics of the spatial distribution of the DENCLUE local maxima patterns. Ellipticals are characterized by a single group of local maxima, related to the galaxy core, while spiral galaxies have additional groups related to segments of spiral arms. We use two different supervised ensemble classification algorithms: Random Forest and Gradient Boosting. Using a sample of similar or equal to 24 000 galaxies taken from the Galaxy Zoo 2 main sample with spectroscopic redshifts, and we test our classification against the Galaxy Zoo 2 catalogue. We find that features extracted from our pipeline give, on average, an accuracy of similar or equal to 93 per cent, when testing on a test set with a size of 20 per cent of our full data set, with features deriving from the angular distribution of density attractor ranking at the top of the discrimination power.