cm6b02905_si_002.xlsx (28.93 kB)
Classifying Crystal Structures of Binary Compounds AB through Cluster Resolution Feature Selection and Support Vector Machine Analysis
dataset
posted on 2016-08-22, 00:00 authored by Anton O. Oliynyk, Lawrence A. Adutwum, James J. Harynuk, Arthur Mar
Partial least-squares discriminant analysis (PLS-DA) and support
vector machine (SVM) techniques were applied to develop a crystal
structure predictor for binary AB compounds. Models were trained and
validated on the basis of the classification of 706 AB compounds adopting
the seven most common structure types (CsCl, NaCl, ZnS, CuAu, TlI,
β-FeB, and NiAs), through data extracted from Pearson’s
Crystal Data and ASM Alloy Phase Diagram Database. Out of 56 initial
variables (descriptors based on elemental properties only), 31 were
selected in as unbiased a manner as possible through a procedure of
forward selection and backward elimination, with the quality of the
model evaluated by measuring the cluster resolution at each step.
PLS-DA gave sensitivity of 96.5%, specificity of 66.0%, and accuracy
of 77.1% for the validation set data, whereas SVM gave sensitivity
of 94.2%, specificity of 92.7%, and accuracy of 93.2%, a significant
improvement. Radii, electronegativity, and valence electrons, previously
chosen intuitively in structure maps, were confirmed as important
variables. PLS-DA and SVM could also make quantitative predictions
of hypothetical compounds, unlike semiclassical approaches. The new
compound RhCd was predicted to have the CsCl-type structure by PLS-DA
(0.669 probability) and, at an even stronger confidence level, by
SVM (0.918 probability). RhCd was synthesized by reaction of the elements
at 800 °C and confirmed by X-ray diffraction to adopt the CsCl-type
structure. SVM is thus a superior classification method in crystallography
that is fast and makes correct, quantitative predictions; it may be
more broadly applicable to help identify the structure of unknown
compounds with any arbitrary composition.
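
For readers unfamiliar with the three figures of merit quoted above, the following is a minimal sketch of how sensitivity, specificity, and accuracy can be computed for one structure type treated against all others. The description does not say how the metrics were aggregated over the seven classes, so the one-vs-rest treatment here is an assumption; the function name `one_vs_rest_metrics` and the label lists are hypothetical, made up for illustration, and are not taken from the dataset.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Return (sensitivity, specificity, accuracy) for one class vs. the rest.

    Assumption: standard confusion-matrix definitions, with `positive`
    as the structure type of interest and all other types as negatives.
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # correctly assigned to the target structure type
        elif truth == positive:
            fn += 1  # target structure type missed
        elif pred == positive:
            fp += 1  # another type wrongly assigned to the target
        else:
            tn += 1  # another type correctly kept out of the target
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy


# Hypothetical labels, for illustration only (not from the dataset).
y_true = ["CsCl", "CsCl", "NaCl", "NiAs", "CsCl", "NaCl", "ZnS", "CsCl"]
y_pred = ["CsCl", "NaCl", "NaCl", "CsCl", "CsCl", "NaCl", "ZnS", "CsCl"]

sens, spec, acc = one_vs_rest_metrics(y_true, y_pred, positive="CsCl")
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

A classifier with high sensitivity but low specificity, as reported for PLS-DA, rarely misses members of the target structure type but often assigns other types to it.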