# Félix Benedito Clément Musil

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related research domains (10)

Machine learning

Machine learning (ML) is an umbrella term for solving problems for which development of algorithms by human programmers would be cost-prohibitive, and instead the problems are solved by helping machines…

Simulation

A simulation is the imitation of the operation of a real-world process or system over time. Simulations require the use of models; the model represents the key characteristics or behaviors of the selected…

Structure

A structure is an arrangement and organization of interrelated elements in a material object or system, or the object or system so organized. Material structures include man-made objects such as buildings…

Courses taught by this person

No results

People doing similar research (117)

Related units (2)

Related publications (19)

Over the last two decades, many technological and scientific discoveries, ranging from the development of materials for energy conversion and storage to the design of new drugs, have been accelerated by the use of preliminary in silico experiments to steer and inform synthesis and characterization. This new computational paradigm has been particularly significant for simulations at the atomic scale, which provide a predictive framework to determine the properties of condensed phases and molecular systems from first principles. Thanks to the steady improvement in the accuracy and efficiency of ab initio methods, as well as the increase in performance (and reduction in cost) of computational resources, once-prohibitive quantum mechanical calculations of atomic-scale properties have become affordable and ubiquitous. The rise of ab initio and high-throughput materials design and discovery, however, brings both challenges and opportunities.
Large repositories of atomistic data require complicated, time-consuming analyses to rationalize the relationship between structure and properties, and to determine the most promising candidates for a given application. Often (for instance, in molecular dynamics simulations that sample the finite-temperature fluctuations of materials under realistic thermodynamic conditions), first-principles calculations involve large amounts of redundant data, for which a direct ab initio treatment is still prohibitively expensive. The availability of large amounts of data, together with the fact that many applications require repeatedly sampling configurations that share considerable similarities, provides the ideal scenario for leveraging statistical learning techniques.
This thesis presents several methodological advances in the representation of condensed-phase matter at the atomic scale for the development of data-driven atomistic models. We present an atom density framework to build n-body representations that encode the chemical structure along with the fundamental symmetries of such systems, and we draw links between several popular representations. Building on this framework, we explore large databases of small peptides and molecular crystals with unsupervised learning techniques, namely clustering and dimensionality reduction, through maps of their structural correlations. These simple overviews of entire datasets allowed us to highlight structure-property relations and to check the consistency and reliability of the data. Thanks to the generality of this representation, we also applied supervised learning to construct surrogate models of several quantum properties, such as the chemical shifts in molecular materials and the stability of molecular materials, small molecules, and perovskites. We further improve the quality of these models by introducing property- and system-specific knowledge into the representation to increase its correlation with the target properties. Such optimization of the representation helps reduce the error of model predictions, but being able to estimate the accuracy of these predictions is just as useful. To simplify the computation of uncertainty estimates for the predicted properties, we provide simple schemes to calibrate them and assess their accuracy, thus increasing the reliability of data-driven models of materials.
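The calibration of uncertainty estimates mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each prediction comes with an uncertainty estimate σ_i (for example, the spread of a committee of models) and that errors are roughly Gaussian. Under those assumptions, one simple scheme rescales every σ_i by a single factor α fitted on held-out data by maximum likelihood, which gives α² = mean(r_i²/σ_i²) for residuals r_i. The function names `calibrate_scale` and `gaussian_nll` are hypothetical, and this is not necessarily the exact procedure used in the thesis.

```python
import numpy as np


def calibrate_scale(y_true, y_pred, sigma):
    """Return alpha such that alpha * sigma is a maximum-likelihood calibrated uncertainty.

    Assumes residuals r_i ~ N(0, (alpha * sigma_i)^2); setting the derivative of
    the negative log-likelihood with respect to alpha to zero gives
    alpha^2 = mean(r_i^2 / sigma_i^2).
    """
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    s = np.asarray(sigma, dtype=float)
    return float(np.sqrt(np.mean(r**2 / s**2)))


def gaussian_nll(y_true, y_pred, sigma):
    """Mean negative log-likelihood of the residuals under N(0, sigma_i^2)."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    s2 = np.asarray(sigma, dtype=float) ** 2
    return float(np.mean(0.5 * np.log(2 * np.pi * s2) + 0.5 * r**2 / s2))


# Synthetic example: the model reports sigma, but the true errors are twice as large.
rng = np.random.default_rng(0)
sigma = rng.uniform(0.5, 1.5, size=5000)
y_pred = np.zeros(5000)
y_true = y_pred + 2.0 * sigma * rng.standard_normal(5000)

alpha = calibrate_scale(y_true, y_pred, sigma)
nll_raw = gaussian_nll(y_true, y_pred, sigma)
nll_calibrated = gaussian_nll(y_true, y_pred, alpha * sigma)
```

Because α maximizes the Gaussian likelihood over all rescalings of σ, the calibrated negative log-likelihood cannot exceed the uncalibrated one on the data used for the fit; in practice α should be fitted on a validation set kept separate from the test set.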

Michele Ceriotti, Guillaume André Jean Fraux, Alexander Jan Goscinski, Till Junge, Félix Benedito Clément Musil, Max David Veit, Michael John Willatt

Physically motivated and mathematically robust atom-centered representations of molecular structures are key to the success of modern atomistic machine learning. They lie at the foundation of a wide range of methods to predict the properties of both materials and molecules and to explore and visualize their chemical structures and compositions. Recently, it has become clear that many of the most effective representations share a fundamental formal connection. They can all be expressed as a discretization of n-body correlation functions of the local atom density, suggesting the opportunity of standardizing and, more importantly, optimizing their evaluation. We present an implementation, named librascal, whose modular design lends itself both to developing refinements to the density-based formalism and to rapid prototyping for new developments of rotationally equivariant atomistic representations. As an example, we discuss smooth overlap of atomic positions (SOAP) features, perhaps the most widely used member of this family of representations, to show how the expansion of the local density can be optimized for any choice of radial basis set. We discuss the representation in the context of a kernel ridge regression model, commonly used with SOAP features, and analyze how the computational effort scales for each of the individual steps of the calculation. By applying data reduction techniques in feature space, we show how to reduce the total computational cost by a factor of up to 4 without affecting the model’s symmetry properties and without significantly impacting its accuracy.
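The abstract describes SOAP features used in a kernel ridge regression model, with data reduction in feature space to lower the cost. The sketch below illustrates that pipeline under stated assumptions: random vectors stand in for SOAP features (computing real SOAP features is what a library such as librascal does, and its API is not reproduced here), greedy farthest point sampling over feature columns is used as one possible feature-selection scheme, and the kernel is the normalized polynomial kernel k(a, b) = (a·b / |a||b|)^ζ that is commonly paired with SOAP. All function and class names are hypothetical.

```python
import numpy as np


def farthest_point_sampling(X, n_select, first=0):
    """Greedy farthest point sampling over the rows of X; returns selected row indices."""
    selected = [first]
    # Distance of every row to its closest already-selected row.
    min_dist = np.linalg.norm(X - X[first], axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(selected)


def normalized_kernel(A, B, zeta=2):
    """Polynomial kernel of normalized feature vectors, k(a, b) = (a.b / |a||b|)^zeta."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return (A @ B.T) ** zeta


class KernelRidge:
    """Minimal kernel ridge regression: solve (K + lambda * I) w = y."""

    def __init__(self, zeta=2, regularizer=1e-6):
        self.zeta = zeta
        self.regularizer = regularizer

    def fit(self, X, y):
        self.X_train = X
        K = normalized_kernel(X, X, self.zeta)
        self.weights = np.linalg.solve(K + self.regularizer * np.eye(len(X)), y)
        return self

    def predict(self, X):
        return normalized_kernel(X, self.X_train, self.zeta) @ self.weights


# Stand-in data: 100 environments with 50 features each (not real SOAP vectors).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=100)

# Feature-space reduction: keep 20 of the 50 columns, chosen by FPS on the columns.
cols = farthest_point_sampling(X.T, 20)
model = KernelRidge(zeta=2, regularizer=1e-3).fit(X[:, cols], y)
predictions = model.predict(X[:, cols])
```

Selecting columns before building the kernel shrinks the cost of every dot product, and because the selection acts on features that are already rotationally invariant, the reduced features stay invariant too. How many columns can be dropped without hurting accuracy is data-dependent and should be checked against a validation error.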

Michele Ceriotti, Alexander Jan Goscinski, Félix Benedito Clément Musil, Jigyasa Nigam, Sergey Pozdnyakov (2021)

The input of almost every machine learning algorithm targeting the properties of matter at the atomic scale involves a transformation of the list of Cartesian atomic coordinates into a more symmetric representation. Many of the most popular representations can be seen as an expansion of the symmetrized correlations of the atom density and differ mainly by the choice of basis. Considerable effort has been dedicated to the optimization of the basis set, typically driven by heuristic considerations on the behavior of the regression target. Here, we take a different, unsupervised viewpoint, aiming to determine the basis that encodes, as compactly as possible, the structural information that is relevant for the dataset at hand. For each training dataset and number of basis functions, one can build a unique basis that is optimal in this sense and that, by approximating it with splines, can be computed at no additional cost with respect to the primitive basis. We demonstrate that this construction yields representations that are accurate and computationally efficient, particularly for representations that correspond to high body-order correlations. We present examples that involve both molecular and condensed-phase machine-learning models.
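The unsupervised construction described in this abstract can be sketched as a principal component analysis over the radial index of the density expansion coefficients: for one angular channel l, collect the coefficients c_{n,lm} over all environments and all m, form their (uncentered) covariance across n, and keep the leading eigenvectors as the contraction from the primitive to the optimized basis. This is a minimal sketch under those assumptions, with synthetic coefficients in place of real ones; the spline approximation of the contracted radial functions, which is what makes the optimized basis as cheap to evaluate as the primitive one, is omitted, and the function name `optimal_radial_contraction` is hypothetical.

```python
import numpy as np


def optimal_radial_contraction(coeffs, n_keep):
    """Contract the radial index of density expansion coefficients by PCA.

    coeffs: array of shape (n_samples, n_max, 2l + 1) with the coefficients
        c[i, n, m] of one angular channel l on a primitive radial basis.
    Returns (U, variances): U has shape (n_max, n_keep) and holds the leading
    eigenvectors of the uncentered covariance over n, ordered by decreasing
    variance.
    """
    n_max = coeffs.shape[1]
    # Every (sample, m) pair is one observation of an n_max-dimensional vector;
    # summing over m makes the covariance invariant to rotations of each sample.
    flat = coeffs.transpose(0, 2, 1).reshape(-1, n_max)
    cov = flat.T @ flat / flat.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_keep]
    return eigvecs[:, order], eigvals[order]


# Synthetic coefficients for l = 2 (5 values of m) that span only 2 of 8 radial directions.
rng = np.random.default_rng(0)
radial_directions = rng.normal(size=(8, 2))
latent = rng.normal(size=(200, 2, 5))
coeffs = np.einsum("nk,ikm->inm", radial_directions, latent)

U, variances = optimal_radial_contraction(coeffs, n_keep=2)
reduced = np.einsum("inm,nk->ikm", coeffs, U)         # shape (200, 2, 5)
reconstructed = np.einsum("ikm,nk->inm", reduced, U)  # back to shape (200, 8, 5)
```

The contracted coefficients `reduced` carry n_keep instead of n_max radial channels; when the dataset's coefficients actually lie in a low-dimensional subspace, as in this synthetic example, the contraction loses no information.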