Posted on 2023-12-29, 16:35. Authored by Baicheng Zhang, Hengyu Xiao, Guilin Ye, Zhaokun Song, Tiantian Han, Edward Sharman, Man Luo, Aoyuan Cheng, Qing Zhu, Haitao Zhao, Guoqing Zhang, Song Wang, Jun Jiang
Label-free data mining can efficiently feed large amounts of data
from the vast scientific literature into artificial intelligence (AI)
processing systems. Here, we demonstrate an unsupervised syntactic
distance analysis (SDA) approach that is capable of mining chemical
substances, functions, properties, and operations without annotation.
Evaluated in several research areas across the physical sciences,
the SDA approach achieved information-mining performance comparable
to that of supervised learning, with scores of 0.62–0.72 in precision,
0.60–0.82 in recall, and 0.86–0.95 in accuracy. We also showcase
how our approach can assist robotic chemists programmed to perform
research focused on double-perovskite colloidal nanocrystals, gold
colloidal nanocrystals, oxygen evolution reaction catalysts, and enzyme-like
catalysts by designing materials, formulations, and synthesis parameters
based on data mined from 1.1 million literature references.
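
For readers unfamiliar with the three reported metrics, the sketch below computes precision, recall, and accuracy from confusion-matrix counts using their standard definitions. It is not the authors' evaluation code: the function name `extraction_metrics` is hypothetical, and the counts are invented for illustration and do not come from the study.

```python
def extraction_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts.

    tp: correctly extracted items      fp: wrongly extracted items
    fn: items that were missed         tn: correctly rejected items
    """
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical counts, for illustration only (not taken from the study).
precision, recall, accuracy = extraction_metrics(tp=70, fp=30, fn=30, tn=370)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

Accuracy can exceed precision and recall when correctly rejected items (true negatives) dominate the evaluation set, which is one possible reading of why the accuracy range above is higher than the other two.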