Encoding quantum-chemical knowledge into machine-learning models of complex molecular properties

Ksenia Briling
2024
EPFL thesis

Abstract

Statistical (machine-learning, ML) models are increasingly used in computational chemistry as a substitute for more expensive ab initio and parametrizable methods. While ML algorithms are capable of learning physical laws implicitly from data, adding some prior physical knowledge improves the results and accelerates the training. This thesis covers several aspects of enhancing ML models with quantum-chemical information: representation design, preprocessing of the input data, and loss function choice.

The first part focuses on extending the symmetry-adapted Gaussian process regression model of the electron density. First, we study how the choice of density-fitting and training-loss-function metrics impacts the quality of the predictions. In addition, we show that densities predicted by the original model do not integrate to the exact number of electrons, which compromises the extrapolative capabilities, and propose a modified, constrained model along with an a posteriori correction. Then, the framework is applied to the on-top pair density. Using a specialized fitting basis set, we train a model to predict CASSCF-quality on-top pair density and compute the on-top pair ratio to visualize static electron correlation effects.

The second part introduces the spectrum of approximated Hamiltonian matrices (SPAHM), a family of physics-based molecular representations. Eigenvalue SPAHM is a global representation built from occupied-orbital eigenvalues of an initial-guess Hamiltonian. SPAHM(a,b) are local representations based on initial-guess-level electron densities attributed to atoms and bonds. These representations distinguish not only different molecules and conformations, but also different spin, charge, and potentially electronic states. The advantages of SPAHM are demonstrated on datasets featuring a wide variation of charge and spin.

The last part is devoted to the application of equivariant neural networks to chemical reaction properties. EquiReact, the proposed model, predicts reaction barriers from 3D structures of reactants and products. Its high interpolative and extrapolative capabilities, particularly in the absence of atom-mapping information, are demonstrated on several datasets. Overall, the work presented in this thesis contributes to the global effort to develop, improve, and advance ML-based methods used in computational chemistry.
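The electron-count issue mentioned in the first part of the abstract can be illustrated with a small sketch. It assumes the predicted density is expanded in a fitting basis with coefficients c_i, and that each basis function integrates to a known value q_i, so the electron count is the sum of c_i q_i. The correction shown is the smallest Euclidean shift of the coefficients that restores the target count. This is an illustrative assumption, not necessarily the a posteriori correction used in the thesis; the function names and the toy numbers are hypothetical.

```python
def electron_count(coeffs, basis_integrals):
    """Electrons implied by a fitted density: N = sum_i c_i * q_i."""
    return sum(c * q for c, q in zip(coeffs, basis_integrals))


def correct_electron_count(coeffs, basis_integrals, n_electrons):
    """Shift the coefficients by the smallest Euclidean amount that restores N.

    Assumption: a plain Euclidean norm is minimized. A real implementation
    might weight the shift with the overlap or Coulomb metric instead.
    """
    qq = sum(q * q for q in basis_integrals)
    if qq == 0.0:
        raise ValueError("all basis functions integrate to zero")
    # Lagrange-multiplier solution of: min |c' - c|^2  subject to  q . c' = N
    shift = (n_electrons - electron_count(coeffs, basis_integrals)) / qq
    return [c + shift * q for c, q in zip(coeffs, basis_integrals)]


# Toy numbers, made up for illustration only.
predicted = [1.9, 0.3, 4.1, 0.0]
integrals = [1.0, 0.5, 2.0, 0.0]  # the last function integrates to zero
corrected = correct_electron_count(predicted, integrals, n_electrons=10.0)

print(electron_count(predicted, integrals))
print(electron_count(corrected, integrals))
```

In this sketch, coefficients of basis functions that integrate to zero are left unchanged, since they cannot affect the electron count.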

Official source

https://infoscience.epfl.ch/record/309349?ln=en



Machine learning-aided generative molecular design

Efficient and insightful descriptors for representing molecular and material space

Thermal conductivity of Li3PS4 solid electrolytes with ab initio accuracy
