The creation and maintenance of crystallographic data repositories is one of the greatest data-related achievements in chemistry. Platforms such as the Cambridge Structural Database host what is likely the most diverse collection of synthesizable molecules. If properly mined, they could be the basis for the large-scale exploration of new regions of chemical space using quantum chemistry (QC). Yet it is currently challenging to retrieve all the information that QC codes require from the available structural data alone, especially for transition metal complexes. To overcome this limitation, we present cell2mol, a software package that interprets crystallographic data and retrieves the connectivity and total charge of molecules, including the oxidation state (OS) of metal atoms. We demonstrate that cell2mol outperforms other popular methods at assigning the metal OS, while offering a comprehensive interpretation of the unit cell. The code is made available, as well as reliable QC-ready databases totaling 31k transition metal complexes and 13k ligands that contain incomparable chemical diversity.
Rosario Scopelliti, Shiori Fujimori
Qian Wang, Jieping Zhu, Dan Forster, Weisi Guo