Posted on 2022-05-09, 16:05. Authored by Shu Huang and Jacqueline M. Cole.
A great number of
scientific papers are published every year in
the field of battery research, forming a huge textual data source.
However, it is difficult to explore and retrieve useful information
efficiently from these large unstructured sets of text. The Bidirectional
Encoder Representations from Transformers (BERT) model, trained on
a large data set in an unsupervised way, provides a route to process
the scientific text automatically with minimal human effort. To this
end, we realized six battery-related BERT models, namely, BatteryBERT,
BatteryOnlyBERT, and BatterySciBERT, each available in
cased and uncased versions. They have been trained specifically on a
corpus of battery research papers. The pretrained BatteryBERT models
were then fine-tuned on downstream tasks, including battery paper
classification and extractive question-answering for battery device
component classification that distinguishes anode, cathode, and electrolyte
materials. Our BatteryBERT models were found to outperform the original
BERT models on the specific battery tasks. The fine-tuned BatteryBERT
was then used to perform battery database enhancement. We also provide
a website application for its interactive use and visualization.
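As an illustration of how such a fine-tuned model might be queried for device component extraction, the sketch below uses the Hugging Face transformers question-answering pipeline. The model identifier shown is an assumption for illustration only and should be replaced with the actual released BatteryBERT checkpoint name.

```python
from transformers import pipeline

# Assumed checkpoint name; substitute the real fine-tuned BatteryBERT QA model.
MODEL_NAME = "batterydata/batterybert-cased-squad-v1"

# Extractive question answering over a sentence from a battery paper.
qa = pipeline("question-answering", model=MODEL_NAME, tokenizer=MODEL_NAME)

context = (
    "The cathode of the cell is LiFePO4, the anode is graphite, "
    "and the electrolyte is 1 M LiPF6 in EC/DMC."
)

# Ask for one device component; the answer span is extracted from the context.
result = qa(question="What is the cathode material?", context=context)
print(result["answer"])  # expected span: "LiFePO4"
```

In this setup, distinguishing anode, cathode, and electrolyte materials reduces to posing one question per component against each sentence and keeping answers above a confidence threshold.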