We developed Material Graph Digitizer (MatGD), a tool
for digitizing data lines from scientific graphs. The algorithm behind
the tool consists of four steps: (1) identifying graphs within subfigures,
(2) separating axes and data sections, (3) discerning the data lines
by eliminating irrelevant graph objects and matching with the legend,
and (4) extracting and saving the data. From 62,534 papers in the
areas of batteries, catalysis, and metal–organic frameworks
(MOFs), 501,045 figures were mined. Our tool achieved
over 99% accuracy in legend marker and text detection.
Moreover, its data line separation capability reached 66%, which
is much higher than that of other existing figure-mining tools.
We believe that this tool will be integral to collecting both past
and future data from publications, and these data can be used to train
various machine learning models that can enhance material predictions
and new materials discovery.
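
As an illustration of the data extraction step (4), the following is a minimal sketch of how pixel positions along a separated data line could be converted to data values. It is not the MatGD implementation: it assumes linear axes and two known calibration ticks per axis, and the names `make_axis_mapper` and `pixels_to_data` and all numbers are hypothetical.

```python
from typing import Callable, List, Tuple

Calibration = Tuple[float, float, float, float]  # (pixel_a, value_a, pixel_b, value_b)


def make_axis_mapper(pixel_a: float, value_a: float,
                     pixel_b: float, value_b: float) -> Callable[[float], float]:
    """Return a function mapping a pixel coordinate to a data value.

    Assumes a linear axis calibrated by two ticks (hypothetical helper).
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration pixels must differ")
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def mapper(pixel: float) -> float:
        return value_a + (pixel - pixel_a) * scale

    return mapper


def pixels_to_data(points: List[Tuple[float, float]],
                   x_cal: Calibration,
                   y_cal: Calibration) -> List[Tuple[float, float]]:
    """Convert (column, row) pixel points of one data line to (x, y) data values."""
    to_x = make_axis_mapper(*x_cal)
    to_y = make_axis_mapper(*y_cal)
    return [(to_x(px), to_y(py)) for px, py in points]


# Hypothetical calibration: x = 0 at pixel column 100 and x = 10 at column 600;
# y = 0 at pixel row 500 and y = 1.0 at row 100 (image rows grow downward).
line_pixels = [(100, 500), (350, 300), (600, 100)]
data = pixels_to_data(line_pixels, (100, 0.0, 600, 10.0), (500, 0.0, 100, 1.0))
```

Because image rows increase downward, the y calibration naturally yields a negative scale, so no separate axis flip is needed. A logarithmic axis would require applying the same linear map to the logarithm of the tick values.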