Kernel Methods and Similarity Learning Applied in Computational Chemistry

Raimon Fabregat I De Aguilar-Amat
2022
EPFL thesis

Abstract

Over the last two decades, data-powered machine learning (ML) tools have profoundly transformed numerous scientific fields. In computational chemistry, machine learning applications have permitted faster predictions of chemical properties and provided powerful analytical tools, facilitating the exploration of chemical space. The original work presented in this thesis leverages the paradigm-shifting influence of ML and focuses on bridging the divide between unsupervised and supervised learning, with the overarching objective of improving the predictive power of similarity-based machine learning algorithms such as kernel regression.

Despite their widespread use in chemistry, current implementations of kernel regression suffer from biased definitions of similarity between chemical environments. This problem originates from the rigidity of current numerical approaches for encoding molecular information, which are based on expert-crafted representations. Moreover, it is amplified by the incorrect (yet widespread) assumption that increasing the amount of information encoded in molecular representations necessarily improves the evaluation of molecular similarity. As a result, the performance of kernel models can be sub-optimal, which limits their broad applicability. To overcome these limitations, we introduce a series of statistical tools and methodologies, based on supervised dimensionality reduction and metric learning, that are capable of filtering and adapting the features of common molecular representations. This allows the notion of "molecular similarity" to be tailored in order to optimize the prediction of specific chemical targets.

Using examples such as the exploration of the free-energy landscape of oligopeptides or the prediction of subtle properties associated with the outcome of chemical reactions (for example, enantiomeric excess), we demonstrate how the methods proposed in this thesis unlock the optimal performance of kernel regression and, more generally, of any similarity-based algorithm. Overall, this work is part of a larger, more comprehensive effort aimed at extending the capabilities of computational modeling to increasingly complex chemical situations by exploiting the latest advances in statistical learning.
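The claim that encoding more information does not necessarily improve similarity can be illustrated with a toy sketch. It is not the thesis's method: the data are synthetic, the function names (`gaussian_kernel`, `krr_fit_predict`) are hypothetical, and the per-feature weights are set by hand where a metric-learning procedure would learn them from data. The example fits kernel ridge regression twice on the same features, once treating all features equally and once with a metric that keeps only the single informative feature.

```python
import numpy as np

def gaussian_kernel(A, B, weights, sigma=1.0):
    """Gaussian kernel computed on features rescaled by per-feature weights."""
    Aw, Bw = A * weights, B * weights
    sq_dists = (
        np.sum(Aw**2, axis=1)[:, None]
        + np.sum(Bw**2, axis=1)[None, :]
        - 2.0 * Aw @ Bw.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def krr_fit_predict(X_train, y_train, X_test, weights, reg=1e-3):
    """Kernel ridge regression: solve (K + reg*I) alpha = y, then predict."""
    K = gaussian_kernel(X_train, X_train, weights)
    alpha = np.linalg.solve(K + reg * np.eye(len(X_train)), y_train)
    return gaussian_kernel(X_test, X_train, weights) @ alpha

# Synthetic data (an assumption of this sketch): the target depends only on
# feature 0, while the remaining 10 features carry no information about it.
rng = np.random.default_rng(0)
n_features = 11
X = rng.uniform(-3.0, 3.0, size=(300, n_features))
y = np.sin(X[:, 0])
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

# Uniform metric: every feature contributes equally to the similarity.
uniform_weights = np.ones(n_features)
# Adapted metric: hand-set stand-in for a learned metric that filters
# out the uninformative features.
adapted_weights = np.zeros(n_features)
adapted_weights[0] = 1.0

pred_uniform = krr_fit_predict(X_train, y_train, X_test, uniform_weights)
pred_adapted = krr_fit_predict(X_train, y_train, X_test, adapted_weights)

mse_uniform = float(np.mean((pred_uniform - y_test) ** 2))
mse_adapted = float(np.mean((pred_adapted - y_test) ** 2))
print(f"MSE, uniform metric: {mse_uniform:.4f}")
print(f"MSE, adapted metric: {mse_adapted:.4f}")
```

In this setup the uninformative features dominate the distances under the uniform metric, so almost no training point looks similar to a test point. The adapted metric restores a meaningful notion of similarity for this particular target.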

Official source

https://infoscience.epfl.ch/record/291833?ln=fr

Topics in statistical physics of high-dimensional machine learning

Few-shot Learning for Efficient and Effective Machine Learning Model Adaptation

Robust machine learning for neuroscientific inference