posted on 2025-01-12, 19:03, authored by Nadezhda Yu. Biziukova, Anastasia V. Rudik, Alexander V. Dmitriev, Olga A. Tarasova, Dmitry A. Filimonov, Vladimir V. Poroikov
Understanding the biotransformation of xenobiotics in the human
body is critical for a comprehensive assessment of drug effects, since
pharmacologically active drug metabolites may exhibit a range of biological
effects that often differ from those of the original pharmaceutical
agent. Studies of the biotransformation mechanisms of xenobiotics
have resulted in numerous publications. Extracting information about
the parent compounds (substrates) and their metabolites from the texts
allows retrieval of information on their biological activities, molecular
mechanisms of action, and toxicity. Manual curation of the names of
xenobiotics, their metabolites, and biotransformation reactions in
the text is a challenging task due to the large number of publications
on the metabolism of pharmaceutical agents. Our aim is
to create an annotated corpus of texts that can be used for automated
extraction of the names of xenobiotics, including pharmaceutical agents
that undergo biotransformation and their metabolites. Prior to manual
annotation of the corpus, semiautomatic annotation was carried out
using a previously developed rule-based method for extracting parent compounds
and their metabolites. To create XenoMet, we automatically
extracted relevant texts from PubMed using a query based on MeSH terms.
The names of biotransformation reactions were recognized by using
an in-house-developed dictionary. Then, we manually verified the extracted
data by correcting errors in the named entity annotation and identified
the associations between substrates and metabolites. We tested the
applicability of XenoMet for the reconstruction of a metabolic tree
and for the automated extraction of the chemical names of substrates,
metabolites, and reactions of biotransformation. Classification of
the named entities of metabolites, substrates, and biotransformation
reactions by a conditional random fields approach using XenoMet as
the training set yields an F1-score of 0.79.
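
As a rough illustration of how annotated substrate-metabolite associations could be assembled into a metabolic tree, the sketch below builds a tree from (substrate, reaction, metabolite) triples. It is a minimal sketch under stated assumptions, not the authors' implementation: the triple layout, the function names `build_tree`, `find_roots`, and `render`, and the codeine example records are all hypothetical and are not taken from XenoMet.

```python
from collections import defaultdict

# Illustrative (substrate, reaction, metabolite) triples.
# Assumption: these records are hand-written examples, not XenoMet data.
TRIPLES = [
    ("codeine", "O-demethylation", "morphine"),
    ("codeine", "N-demethylation", "norcodeine"),
    ("morphine", "glucuronidation", "morphine-6-glucuronide"),
]


def build_tree(triples):
    """Map each substrate to its list of (reaction, metabolite) edges."""
    children = defaultdict(list)
    for substrate, reaction, metabolite in triples:
        children[substrate].append((reaction, metabolite))
    return dict(children)


def find_roots(triples):
    """Return compounds that act as substrates but never as metabolites."""
    substrates = {s for s, _, _ in triples}
    metabolites = {m for _, _, m in triples}
    return sorted(substrates - metabolites)


def render(tree, node, depth=0, path=()):
    """Return an indented text view of the tree rooted at `node`."""
    lines = [node] if depth == 0 else []
    ancestors = path + (node,)
    for reaction, metabolite in tree.get(node, []):
        lines.append("  " * (depth + 1) + f"--{reaction}--> {metabolite}")
        # Do not descend into a compound already on the current path,
        # so cyclic annotations cannot cause infinite recursion.
        if metabolite not in ancestors:
            lines.extend(render(tree, metabolite, depth + 1, ancestors))
    return lines


if __name__ == "__main__":
    tree = build_tree(TRIPLES)
    for root in find_roots(TRIPLES):
        print("\n".join(render(tree, root)))
```

Storing the reaction name on each edge keeps the three annotated entity types (substrate, metabolite, biotransformation reaction) together in one structure; a real corpus would also need name normalization so that synonyms of one compound map to a single node.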