The pair distribution function (PDF) is a powerful tool for the structural investigation of materials that extends the analysis of total scattering data beyond conventional crystallography. While the atomic arrangement is directly encoded in the PDF, an initial structural model is often required to gain detailed insight into the material structure. One area that leverages the capabilities of the PDF is the structural investigation of nanoparticles and polynuclear complexes. If the crystal structures of the latter are well understood, they can serve as a convenient starting point for studies of a wider range of compounds. Utilizing the deep learning capabilities of multi-layer convolutional neural networks (CNNs), we developed a model that predicts the number of heavy atoms (nuclearity) in lanthanide coordination compounds from their PDFs. Aiming for explainability, we probe our approach by training classical decision tree algorithms on several thousand model binary cerium-oxygen clusters, using intuitive structure-related PDF descriptors such as prominent peak areas and peak ratios. As the complexity of the atomic structure gradually increases, we observe an increasing demand on the model's generalization ability to provide reliable predictions. This leads us to construct a multi-layer CNN architecture that uses whole PDFs of 645 CSD-deposited crystal structures as input vectors for training and achieves a mean prediction accuracy of 86%. Further, we apply our network to experimental PDFs of lanthanide polynuclear complexes and coordination polymers, successfully identifying their nuclearity and highlighting the network's generalization ability across the simulation-experiment domain gap. The main purpose of this work is to demonstrate that structural information can be inferred from the PDF by machine learning algorithms trained on calculated PDF data, and to provide a transparent methodology for such inference.
Our study widens the general perspective on extracting structural information from the PDF in the absence of a structural model and opens prospects for using machine learning tools for a wider scope of tasks in chemistry and materials science.
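To make the idea of intuitive PDF descriptors concrete, the following is a minimal sketch of how peak areas and a peak ratio might be computed from a PDF sampled on a regular grid. It is not the authors' implementation: the synthetic two-peak signal, the peak positions (roughly typical Ce-O and Ce-Ce distances), the integration windows, and the helper names `gaussian` and `peak_area` are all illustrative assumptions. Real PDFs also carry a baseline and termination ripples that this sketch ignores.

```python
import math


def gaussian(r, center, width, area):
    """Gaussian peak at `center` with standard deviation `width` and integrated area `area`."""
    norm = area / (width * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((r - center) / width) ** 2)


def peak_area(r, g, r_min, r_max):
    """Trapezoidal integral of g(r) over grid intervals lying inside [r_min, r_max]."""
    total = 0.0
    for i in range(len(r) - 1):
        if r[i] >= r_min and r[i + 1] <= r_max:
            total += 0.5 * (g[i] + g[i + 1]) * (r[i + 1] - r[i])
    return total


# Hypothetical synthetic signal: two well-separated peaks on a 0-10 angstrom grid.
# Positions, widths and areas are assumed values chosen only for illustration.
r = [i * 0.01 for i in range(1001)]
g = [gaussian(x, 2.34, 0.08, 1.0) + gaussian(x, 3.83, 0.10, 2.0) for x in r]

# Assumed integration windows around each peak.
area_first = peak_area(r, g, 1.9, 2.8)
area_second = peak_area(r, g, 3.3, 4.4)

# Descriptor vector of the kind that could be passed to a decision tree.
descriptors = {
    "area_first": area_first,
    "area_second": area_second,
    "ratio": area_second / area_first,
}
print(descriptors)
```

A feature vector of this kind is low-dimensional and directly interpretable, which is what makes tree-based models on such descriptors easy to inspect; a CNN operating on the whole PDF trades that transparency for the ability to use features that are hard to define by hand.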