American Chemical Society

LISTER: Semiautomatic Metadata Extraction from Annotated Experiment Documentation in eLabFTW

journal contribution
posted on 2023-09-29, 15:37 authored by Fathoni A. Musyaffa, Kirsten Rapp, Holger Gohlke
The availability of scientific methods, code, and data is key for reproducing an experiment. Research data should be made available following the FAIR principles (findable, accessible, interoperable, and reusable). For that, the annotation of research data with metadata is central. However, existing research data management workflows often require that metadata be created by the corresponding researchers, which takes effort and time. Here, we developed LISTER as a methodological and algorithmic solution to create and extract metadata from annotated, template-based experimental documentation with minimal effort. We focused on tailoring the integration between existing platforms by using eLabFTW as the electronic lab notebook and adopting the ISA (investigation, study, assay) model as the abstract data model framework. LISTER consists of four components: (1) an annotation language to support metadata extraction; (2) customized eLabFTW entries using specific hierarchies, templates, and tags to structure reusable scientific documentation; (3) a “container” concept in eLabFTW, making the metadata of a particular container's content extractable along with its underlying, related experiments via a single click; and (4) a Python-based app to enable easy-to-use, semiautomated metadata extraction from eLabFTW entries. LISTER outputs metadata in machine-readable .json and human-readable .xlsx formats, and Material and Methods (MM) descriptions in .docx format that can be used in a thesis or manuscript. The metadata can be used as a basis to create or extend ontologies, which, when applied to the published research data, will significantly enhance its value. DSpace is used as a data cataloging platform for hosting the extracted metadata and research data. We applied LISTER to computational biophysical chemistry, protein biochemistry, and molecular biology, and our concept should be extendable to other life science areas.
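To make the idea of extracting metadata from annotated documentation concrete, the following is a minimal Python sketch. It is not LISTER's implementation: the `{value|key}` annotation syntax, the function name `extract_metadata`, and the example sentence are all assumptions made for illustration, since the abstract does not specify the annotation language. The sketch only shows how inline annotations could yield both machine-readable key-value pairs and clean prose for a methods description.

```python
import json
import re

# Hypothetical annotation syntax, assumed for illustration only: {value|key}
ANNOTATION = re.compile(r"\{([^{}|]+)\|([^{}|]+)\}")


def extract_metadata(text):
    """Return (key-value pairs, clean prose) for one annotated text passage."""
    pairs = [
        {"key": key.strip(), "value": value.strip()}
        for value, key in ANNOTATION.findall(text)
    ]
    # Keep only the value in the running text, as a methods description would.
    prose = ANNOTATION.sub(lambda m: m.group(1).strip(), text)
    return pairs, prose


# Hypothetical annotated sentence from an experiment entry.
entry = "Equilibrate the system at {310 K|temperature} for {5 ns|duration}."
metadata, methods_text = extract_metadata(entry)

print(json.dumps(metadata, indent=2))  # machine-readable metadata
print(methods_text)                    # human-readable methods sentence
```

In this sketch the same annotated source feeds two outputs, mirroring the split between machine-readable metadata and a readable methods text described above; writing .xlsx or .docx files would require additional libraries and is left out.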