Introduction
Crystal structure prediction is an area of active research in chemistry and physics. A crystal structure is simply the arrangement of atoms within a crystalline solid, such as table salt.
In general, there is no deterministic way to predict the atomic structure for a given elemental composition at particular thermodynamic conditions. Programs that can reliably predict crystal structures with given properties have the potential to save researchers millions of lab hours (not to mention dollars!).
Of course, a competent structural chemist has a number of heuristic rules for guessing crystal structures. Most of these are inspired by trends in the periodic table. Could a (relatively) simple computer program with examples of real crystal structures and knowledge of the periodic table come up with some similar heuristics on its own?
The Materials Project is an open database of real and computed solid-state structures. I built a program to download the structural details from their database (tens of thousands of entries) and predict structural characteristics of unseen structures using random forests. The approach is very simple, but it works surprisingly well for some properties. The code is available on GitHub, along with a more in-depth explanation (in PDF form).
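To make "knowledge of the periodic table" concrete, here is a minimal sketch of how a composition might be turned into numbers a random forest can use. The property table, the choice of features, and the `featurize` helper are all assumptions for illustration; the post does not say which features the actual program used.

```python
# Hypothetical featurization sketch. The element table and the choice of
# features are assumptions for illustration, not the original program's.

# (atomic number, periodic-table group, period, Pauling electronegativity)
ELEMENT_PROPERTIES = {
    "Na": (11, 1, 3, 0.93),
    "Cl": (17, 17, 3, 3.16),
    "Fe": (26, 8, 4, 1.83),
    "O": (8, 16, 2, 3.44),
}


def featurize(composition):
    """Return the composition-weighted mean of each elemental property.

    `composition` maps element symbols to atom counts, e.g. {"Fe": 2, "O": 3}.
    """
    total = sum(composition.values())
    n_props = len(next(iter(ELEMENT_PROPERTIES.values())))
    features = [0.0] * n_props
    for symbol, count in composition.items():
        fraction = count / total
        for i, value in enumerate(ELEMENT_PROPERTIES[symbol]):
            features[i] += fraction * value
    return features


nacl = featurize({"Na": 1, "Cl": 1})
fe2o3 = featurize({"Fe": 2, "O": 3})
print(nacl)
print(fe2o3)
```

Each compound becomes a fixed-length vector regardless of how many elements it contains, which is what a tree-based model needs as input.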
The Data
The compounds in the dataset are largely cubic, though all 32 crystallographic point groups are represented.
Results
I didn’t try to predict the exact crystal structures. Instead, I chose a few parameters that could be combined to generate a reasonable guess: point group, coordination (number of neighbors for each atom), irregularity of axes (ratio of c axis to a axis), and volume per site. I used random forest classification and regression for all tasks.
Point group prediction had an accuracy of around 45%. If the crystal system was already guessed correctly, the point group prediction was much better (over 80%). This is a bit technical, but if you’re interested, you can read more about it (and look at the confusion matrices) in the PDF.
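One way to read the two point group figures is as an overall accuracy and an accuracy restricted to structures whose crystal system was predicted correctly. The sketch below computes both on made-up labels. The function names and the toy data are hypothetical, and the original evaluation may have been set up differently (for example, by giving the true crystal system to the model as an input).

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    pairs = list(zip(true_labels, predicted_labels))
    return sum(t == p for t, p in pairs) / len(pairs)


def conditional_accuracy(true_pg, pred_pg, true_system, pred_system):
    """Point group accuracy over samples whose crystal system was correct."""
    kept = [
        (t, p)
        for t, p, ts, ps in zip(true_pg, pred_pg, true_system, pred_system)
        if ts == ps
    ]
    if not kept:
        return float("nan")
    return sum(t == p for t, p in kept) / len(kept)


# Toy labels, invented purely to exercise the functions.
true_pg = ["m-3m", "m-3m", "4/mmm", "6/mmm", "mmm", "-43m"]
pred_pg = ["m-3m", "-43m", "m-3m", "6/mmm", "mmm", "m-3m"]
true_system = ["cubic", "cubic", "tetragonal", "hexagonal", "orthorhombic", "cubic"]
pred_system = ["cubic", "cubic", "cubic", "hexagonal", "orthorhombic", "cubic"]

overall = accuracy(true_pg, pred_pg)
given_system = conditional_accuracy(true_pg, pred_pg, true_system, pred_system)
print(overall, given_system)
```

The conditional figure can only be at least as informative about the "last step" of the prediction: it removes cases where the point group was wrong simply because the crystal system was.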
The program was very good at predicting volume per site. Note, however, that many of the structures were metal alloys: transition metals tend to retain their atomic properties when mixed with other metals, so volume per site may already be easy to guess in many cases. All the following plots are on the test set.
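The remark about alloys suggests a simple baseline to compare against: guess the volume per site as the composition-weighted average of each element's volume per atom in its pure form. The sketch below is an assumption-laden illustration, not part of the original program; the lattice constants are approximate, commonly quoted values for three fcc metals, and the helper names are hypothetical.

```python
# Approximate fcc lattice constants in angstroms (illustrative values only).
FCC_LATTICE_CONSTANT = {"Cu": 3.615, "Ni": 3.524, "Au": 4.078}


def elemental_volume_per_atom(symbol):
    """Volume per atom of the pure fcc metal, in cubic angstroms."""
    a = FCC_LATTICE_CONSTANT[symbol]
    return a ** 3 / 4  # the conventional fcc cell contains 4 atoms


def baseline_volume_per_site(composition):
    """Composition-weighted mean of the elemental volumes per atom.

    `composition` maps element symbols to atom counts, e.g. {"Cu": 3, "Au": 1}.
    """
    total = sum(composition.values())
    return sum(
        count / total * elemental_volume_per_atom(symbol)
        for symbol, count in composition.items()
    )


cu3au_guess = baseline_volume_per_site({"Cu": 3, "Au": 1})
print(round(cu3au_guess, 2))
```

If a learned model cannot clearly beat a baseline like this on the alloy-heavy part of the dataset, the good volume-per-site results say more about the data than about the model.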
The c/a ratio did not fare as well, though there may be some signal here:
Finally, the coordination numbers. A coordination number is the number of neighboring atoms that a given atom is bonded to. I don’t know how these numbers were decided (coordination number is a somewhat subjective quantity), but the program can make some decent predictions at any rate. Coordination numbers commonly range from 1 to 12, and are occasionally higher. Notice that the mean absolute error for many elements is on the order of 0.5.
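For reference, a per-element mean absolute error can be computed by grouping the prediction errors by element, as in this minimal sketch. The record layout and the numbers are invented for illustration and are not taken from the actual results.

```python
from collections import defaultdict


def mae_by_element(records):
    """Mean absolute error of predicted coordination number, per element.

    `records` is an iterable of (element, true_cn, predicted_cn) tuples.
    """
    errors = defaultdict(list)
    for element, true_cn, predicted_cn in records:
        errors[element].append(abs(true_cn - predicted_cn))
    return {element: sum(errs) / len(errs) for element, errs in errors.items()}


# Made-up predictions, purely to show the calculation.
records = [
    ("Na", 6, 6.5),
    ("Na", 6, 5.5),
    ("Cl", 6, 6.0),
    ("Ti", 6, 7.0),
    ("Ti", 12, 11.0),
]
per_element = mae_by_element(records)
print(per_element)
```

An error of 0.5 means the prediction is, on average, half a neighbor away from the tabulated value.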