The creation and maintenance of crystallographic data repositories is one of the greatest data-related achievements in chemistry. Platforms such as the Cambridge Structural Database host what is likely the most diverse collection of synthesizable molecules. If properly mined, they could serve as the basis for large-scale exploration of new regions of chemical space using quantum chemistry (QC). Yet it is currently challenging to retrieve, from the available structural data alone, all the information that QC codes require, especially for transition metal complexes. To overcome this limitation, we present cell2mol, a software tool that interprets crystallographic data and retrieves the connectivity and total charge of molecules, including the oxidation state (OS) of metal atoms. We demonstrate that cell2mol outperforms other popular methods at assigning the metal OS, while offering a comprehensive interpretation of the unit cell. The code is available, together with reliable QC-ready databases comprising 31k transition metal complexes and 13k ligands of unmatched chemical diversity.
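The abstract does not describe how cell2mol works internally. The following is only a minimal sketch of two ideas it names: inferring connectivity from atomic coordinates, and obtaining a metal OS by charge balance. It rests on simplifying assumptions. Bonds come from a covalent-radius cutoff with a hypothetical tolerance factor of 1.1, and the radii are approximate illustrative values. Ligand charges are assumed known rather than determined, and a single mononuclear complex is assumed. The helper names `connectivity`, `ligand_fragments` and `metal_oxidation_state` are hypothetical, not part of cell2mol.

```python
import math
from itertools import combinations

# Approximate covalent radii in angstroms (illustrative values, an assumption).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "Cl": 1.02, "Fe": 1.32}


def connectivity(symbols, coords, tolerance=1.1):
    """Hypothetical helper: bond two atoms if their distance is within the scaled sum of radii."""
    bonds = []
    for i, j in combinations(range(len(symbols)), 2):
        cutoff = tolerance * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
        if math.dist(coords[i], coords[j]) <= cutoff:
            bonds.append((i, j))
    return bonds


def ligand_fragments(n_atoms, bonds, metal_indices):
    """Hypothetical helper: ligands are connected components of the non-metal atoms."""
    adjacency = {i: set() for i in range(n_atoms) if i not in metal_indices}
    for i, j in bonds:
        # Keep only ligand-ligand bonds; metal-ligand bonds are ignored here.
        if i in adjacency and j in adjacency:
            adjacency[i].add(j)
            adjacency[j].add(i)
    fragments, seen = [], set()
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        # Depth-first search collects one ligand.
        while stack:
            atom = stack.pop()
            component.append(atom)
            for neighbor in adjacency[atom]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        fragments.append(sorted(component))
    return fragments


def metal_oxidation_state(total_charge, ligand_charges):
    """Charge balance for one metal: OS = total charge - sum of ligand charges."""
    return total_charge - sum(ligand_charges)


# Idealized tetrahedral [FeCl4]- with Fe-Cl = 2.19 A (illustrative geometry).
d = 2.19 / math.sqrt(3)
symbols = ["Fe", "Cl", "Cl", "Cl", "Cl"]
coords = [(0.0, 0.0, 0.0), (d, d, d), (d, -d, -d), (-d, d, -d), (-d, -d, d)]

bonds = connectivity(symbols, coords)
ligands = ligand_fragments(len(symbols), bonds, metal_indices={0})
# Assumption: each chloride ligand is known to carry a charge of -1.
ligand_charges = [-1 for _ in ligands]
fe_os = metal_oxidation_state(-1, ligand_charges)
print(bonds, ligands, fe_os)
```

Here each Cl is bonded only to Fe, so the four chlorides form separate ligands. Charge balance on the −1 anion then gives Fe an OS of +3. The hard part in practice is assigning ligand charges and handling the total charge of the unit cell, and this sketch simply assumes both away.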
Rosario Scopelliti, Shiori Fujimori
Qian Wang, Jieping Zhu, Dan Forster, Weisi Guo