Posted on 2018-08-10, 00:00. Authored by Kun-Hsing Yu, Tsung-Lu Michael Lee, Yu-Ju Chen, Christopher Ré, Samuel C. Kou, Jung-Hsien Chiang, Michael Snyder, Isaac S. Kohane
Targeted metabolomics and biochemical
studies complement the ongoing
investigations led by the Human Proteome Organization (HUPO) Biology/Disease-Driven
Human Proteome Project (B/D-HPP). However, it is challenging to identify
and prioritize metabolite and chemical targets. Literature-mining-based
approaches have been proposed for targeted proteomics studies, but text
mining methods for metabolite and chemical prioritization are hindered
by the large number of synonyms and nonstandardized names for each entity.
In this study, we developed a cloud-based literature mining and summarization
platform that maps metabolites and chemicals in the literature to
unique identifiers and summarizes the copublication trends of metabolites/chemicals
and B/D-HPP topics using Protein Universal Reference Publication-Originated
Search Engine (PURPOSE) scores. We successfully prioritized metabolites
and chemicals associated with the B/D-HPP targeted fields and validated
the results against expert-curated associations and with enrichment
analyses. Compared with existing algorithms, our system achieved better
precision and recall in retrieving chemicals related to B/D-HPP focused
areas. Our cloud-based platform enables queries on all biological
terms in multiple species, which will contribute to B/D-HPP and targeted
metabolomics/chemical studies.
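
The two ideas named above, mapping synonymous chemical names to one unique identifier and counting how often each identifier is copublished with a topic, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synonym table, the placeholder identifiers, the function names, and the simple substring matching are hypothetical, and the raw counts it returns are not the PURPOSE score, whose formula is not given here.

```python
from collections import Counter

# Hypothetical synonym table: lower-cased surface form -> unique identifier.
# The identifiers are placeholders, not real database accessions.
SYNONYM_TO_ID = {
    "glucose": "CHEM:0001",
    "d-glucose": "CHEM:0001",
    "dextrose": "CHEM:0001",
    "cholesterol": "CHEM:0002",
}


def normalize(name):
    """Return the unique identifier for a chemical name, or None if unknown."""
    return SYNONYM_TO_ID.get(name.strip().lower())


def copublication_counts(documents, topic_term):
    """Count, per identifier, the documents that mention both the chemical
    (under any synonym) and the topic term. Naive substring matching is
    used here purely for illustration."""
    counts = Counter()
    topic = topic_term.lower()
    for text in documents:
        lowered = text.lower()
        if topic not in lowered:
            continue
        # A set ensures each identifier is counted once per document,
        # even when several of its synonyms appear.
        ids_in_doc = {cid for syn, cid in SYNONYM_TO_ID.items() if syn in lowered}
        counts.update(ids_in_doc)
    return counts


if __name__ == "__main__":
    docs = [
        "Dextrose and D-glucose levels in diabetes patients.",
        "Cholesterol in heart disease.",
        "Glucose metabolism in diabetes.",
        "Cholesterol and diabetes risk.",
    ]
    print(copublication_counts(docs, "diabetes"))
```

Collapsing synonyms before counting is what keeps "dextrose" and "D-glucose" from splitting one chemical's evidence across several entries; a real system would replace the toy table with a curated chemical vocabulary and the substring test with proper entity recognition.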