The calculation of the electronic structure of chemical systems requires computationally expensive approximations to the time-independent electronic Schrödinger equation in order to yield static properties in good agreement with experimental results. These methods can also be coupled with molecular dynamics to provide a first-principles description of thermodynamic properties, dynamics and chemical reactions. Evidently, the cost of the underlying electronic structure method limits the time frame over which a system can be studied, and hence certain chemical processes may be out of reach with a particular method. Furthermore, when one is interested in designing new molecules with interesting properties, a systematic enumeration of an inordinately large chemical space is typically required. The combination of expensive electronic structure calculations and large chemical spaces results in an insurmountable barrier in computational cost.
The application of artificial intelligence (AI) in computational chemistry has, over the past 20 years, seen an explosion in interest and scope with respect to these two issues. Intelligent algorithms capable of efficiently sampling chemical spaces, coupled with machine learning (ML) techniques that reduce the cost of electronic structure evaluations, enable both high-throughput searches for new molecules with particular properties and, in the case of ML, an increase in the timescales that can be simulated via molecular dynamics.
In this thesis, computer programs have been developed that enable the application of AI algorithms to chemical and biological problems. In particular, a versatile evolutionary algorithm toolbox called EVOLVE has been developed. As a first test case, genetic algorithms were used to efficiently sample the vast chemical sequence space of an isolated α-helical peptide, from which insights are gained to rationalise the stability of particular genetically optimised peptides in a variety of implicit solvent environments. Genetic algorithms were then applied to the compositional optimisation of training sets used in machine learning models of molecular properties. The resulting optimal training sets are shown to significantly reduce out-of-sample errors on all thermodynamic and electronic properties considered. Furthermore, they reveal systematic trends in the distribution of these optimally representative molecules. Inspired by the success of machine learning, an ML-enhanced multiple time step approach for performing accurate ab initio molecular dynamics was developed. Two schemes representing different force partitionings were investigated. In the first scheme, the ML method provides an estimate of the slow (high level) force components acting on a system, while in the second, the ML forces are added to the fast (low level) components and a high level ab initio method is used to correct the error induced by the ML model. In both schemes, significant overall speedups are obtained with respect to standard velocity Verlet integration, all the while maintaining the accuracy of the high level ab initio method.
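The multiple time step idea can be illustrated with a minimal sketch of the standard reversible splitting (r-RESPA) built on velocity Verlet. This is not the implementation developed in the thesis: the two harmonic forces below are toy stand-ins for the fast (low level) and slow (high level or ML-estimated) components, and all names (`mts_step`, `fast_force`, `slow_force`, `K_FAST`, `K_SLOW`) are hypothetical. The slow force is applied as a half kick around an inner loop of ordinary velocity Verlet steps driven by the fast force alone.

```python
# Toy force constants (assumed values): a stiff "fast" spring and a soft "slow" one.
K_FAST = 100.0
K_SLOW = 1.0


def fast_force(x):
    """Cheap, rapidly varying force (stand-in for the low level component)."""
    return -K_FAST * x


def slow_force(x):
    """Expensive, slowly varying force (stand-in for the high level or ML part)."""
    return -K_SLOW * x


def mts_step(x, v, dt_outer, n_inner, mass=1.0):
    """Advance one outer time step with the r-RESPA splitting."""
    dt_inner = dt_outer / n_inner

    # Half kick from the slow force using the large outer time step.
    # (A production code would cache this evaluation between steps.)
    v += 0.5 * dt_outer * slow_force(x) / mass

    # Inner loop: plain velocity Verlet with the fast force only.
    for _ in range(n_inner):
        v += 0.5 * dt_inner * fast_force(x) / mass
        x += dt_inner * v
        v += 0.5 * dt_inner * fast_force(x) / mass

    # Closing half kick from the slow force at the new position.
    v += 0.5 * dt_outer * slow_force(x) / mass
    return x, v


def total_energy(x, v, mass=1.0):
    """Kinetic plus potential energy of the combined toy system."""
    return 0.5 * mass * v * v + 0.5 * (K_FAST + K_SLOW) * x * x


def run(n_steps=1000, dt_outer=0.05, n_inner=10):
    """Integrate from x=1, v=0 and return the energy after every outer step."""
    x, v = 1.0, 0.0
    energies = [total_energy(x, v)]
    for _ in range(n_steps):
        x, v = mts_step(x, v, dt_outer, n_inner)
        energies.append(total_energy(x, v))
    return energies
```

In this sketch the slow force is evaluated once per new position of each outer step (twice as written, since nothing is cached), while the fast force is evaluated `n_inner` times as often. That ratio is where a speedup would come from when the slow component is the expensive one. Calling `run()` and inspecting the returned energies gives a quick check that the splitting conserves energy for the chosen step sizes.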