Theoretical and computational approaches to the study of materials and molecules have, over the last few decades, progressed at an exponential rate. Yet the possibility of producing numerical predictions that are on par with experimental measurements is still hindered by a major computational barrier. In this context, machine-learning methods have emerged as an effective strategy to overcome this barrier by means of statistical approximations that rely only on the knowledge of the atomic coordinates of the system. The quality of these approximations strongly depends on the adoption of mathematical representations of the atomic structure that mirror the physical behaviour of the learning target. In this thesis, we make use of this general principle to tackle some particularly tricky aspects of the data-driven prediction of materials properties.
The first part addresses the problem of interpolating physical tensors, that is, any quantity that follows a set of prescribed transformation rules under a three-dimensional rotation of the system. We derive mathematical representations of the atomic structures that satisfy the symmetry of spherical harmonics. This family of atomistic features can be used to efficiently regress the irreducible spherical decomposition of any Cartesian tensor. We benchmark the method on the optical series of water oligomers, the dielectric response of liquid water, as well as high-end polarizabilities of heterogeneous molecular datasets. Taking the crystal polymorphs of paracetamol as an example, we finally discuss the possibility of computing the Raman spectrum on top of predicted values of polarizabilities.
The second part of the thesis makes use of the symmetry-adapted representations previously introduced to address the challenging problem of learning and predicting scalar fields, such as the electronic charge density of a system. The main difficulty is associated with the decomposition of the field on a multi-centered non-orthogonal basis, which requires the derivation of a specifically designed regression algorithm. By making the electron density decomposition compatible with the auxiliary basis sets commonly used in quantum-chemistry codes, we show that the method can perform highly transferable predictions that scale linearly with the system size for arbitrarily complex molecules.
The last part of the thesis addresses the problem of incorporating a long-range description within state-of-the-art local machine-learning schemes. This is done by deriving a family of representations where a smooth Coulomb-like potential associated with the distribution of atoms is evaluated at the local scale. In particular, a suitable combination of long-range and local features makes it possible to design a learning framework whose asymptotic behaviour allows us to capture repulsion, electrostatic, polarization and dispersion phenomena on an equal footing. The method's performance is tested on the binding energy of organic dimers, the mutual polarization between a water molecule and a metallic surface of lithium, and the dielectric response of peptidic chains.
By and large, this research study shows how a wise interplay between a totally agnostic learning method and a physically grounded approximation allows us to predict arbitrarily complex atomistic properties, paving the way to the accurate simulation of materials over time and length scales that are not accessible by first-principles
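As an illustration of the irreducible spherical decomposition mentioned in the first part, the following is a minimal sketch for a rank-2 Cartesian tensor such as a polarizability. It is not taken from the thesis: the function names and the numerical values are hypothetical, and the components are kept in Cartesian form rather than expanded on spherical harmonics. It only shows the textbook fact that a 3x3 tensor splits into an isotropic part (l = 0), an antisymmetric part (l = 1) and a symmetric traceless part (l = 2), each of which transforms independently under rotation.

```python
import numpy as np


def spherical_decomposition(tensor):
    """Split a 3x3 Cartesian tensor into its l=0, l=1 and l=2 parts.

    Hypothetical helper for illustration; the parts are returned as
    3x3 Cartesian matrices, not as spherical-harmonic coefficients.
    """
    tensor = np.asarray(tensor, dtype=float)
    # l = 0: isotropic part, one independent component (the trace).
    isotropic = np.trace(tensor) / 3.0 * np.eye(3)
    # l = 1: antisymmetric part, three independent components.
    antisymmetric = 0.5 * (tensor - tensor.T)
    # l = 2: symmetric traceless part, five independent components.
    symmetric_traceless = 0.5 * (tensor + tensor.T) - isotropic
    return isotropic, antisymmetric, symmetric_traceless


def rotation_z(angle):
    """Rotation matrix for a rotation by `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Made-up symmetric tensor standing in for a molecular polarizability.
alpha = np.array([[1.5, 0.2, 0.0],
                  [0.2, 1.0, 0.1],
                  [0.0, 0.1, 2.0]])

iso, anti, sym = spherical_decomposition(alpha)

# Rotate the tensor and decompose again.
R = rotation_z(0.7)
alpha_rot = R @ alpha @ R.T
iso_rot, anti_rot, sym_rot = spherical_decomposition(alpha_rot)

# The l=0 part is unchanged by the rotation; the l=2 part rotates
# on its own, without mixing with the other components.
print(np.allclose(iso_rot, iso))
print(np.allclose(sym_rot, R @ sym @ R.T))
```

Because each part transforms only within itself, a regression model can target each component separately with features of the matching symmetry, which is the general idea the abstract describes.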
Nikolaos Stergiopulos, Sokratis Anagnostopoulos