The number of materials or molecules that can be created by combining different chemical elements in various proportions and spatial arrangements is enormous. Computational chemistry can be used to generate databases containing billions of potential structures (Ruddigkeit, Deursen, Blum, & Reymond, 2012), and to predict some of the associated properties (Montavon et al., 2013; Ramakrishnan, Dral, Rupp, & Lilienfeld, 2014). Unfortunately, the very large number of structures makes exploring such databases (to understand structure-property relations or to find the best structure for a given application) a daunting task. In recent years, multiple molecular representations (Bartók, Kondor, & Csányi, 2013; Behler & Parrinello, 2007; Willatt, Musil, & Ceriotti, 2019) have been developed to compute structural similarities between materials or molecules, incorporating physically relevant information and symmetries. The features associated with these representations can be used for unsupervised machine learning applications, such as clustering or classification of the different structures, and for high-throughput screening of databases for specific properties (De, Musil, Ingram, Baldauf, & Ceriotti, 2017; Hautier, 2019; Maier, Stöwe, & Sieg, 2007). However, the dimensionality of these features (as well as of most other descriptors used in chemical and materials informatics) is very high, which makes the resulting classification, clustering, or mapping very hard to visualize. Dimensionality reduction algorithms (Ceriotti, Tribello, & Parrinello, 2011; McInnes, Healy, & Melville, 2018; Schölkopf, Smola, & Müller, 1998) can reduce the number of relevant dimensions to a handful, creating 2D or 3D maps of the full database.
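As a rough illustration of the last step, the sketch below projects a high-dimensional feature matrix onto two dimensions. It uses plain linear PCA as a simple stand-in; it is not one of the specific algorithms cited above. The feature matrix is synthetic random data standing in for real structural descriptors, and the function name `pca_map` is hypothetical.

```python
import numpy as np


def pca_map(features, n_components=2):
    """Project the rows of `features` onto their leading principal components.

    Illustrative helper (hypothetical name): linear PCA computed via SVD.
    """
    X = np.asarray(features, dtype=float)
    # Center each feature so the projection captures variance, not the mean
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


# Synthetic stand-in for real descriptors: 200 "structures" with 50 features each
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 50))
features[:100, 0] += 5.0  # shift half of the rows to create two artificial groups

coords = pca_map(features)  # one (x, y) point per structure
print(coords.shape)
```

Each row of `coords` can then be drawn as a point on a 2D map, optionally colored by a property of interest. Nonlinear methods such as those cited above may separate structures that a linear projection merges.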