Posted on 2019-08-20, 20:29. Authored by Jamie R. Nuñez, Sean M. Colby, Dennis G. Thomas, Malak M. Tfaily, Nikola Tolic, Elin M. Ulrich, Jon R. Sobus, Thomas O. Metz, Justin G. Teeguarden, Ryan S. Renslow
The current gold standard for unambiguous molecular identification
in metabolomics analysis is comparing two or more orthogonal properties
from the analysis of authentic reference materials (standards) to
experimental data acquired in the same laboratory with the same analytical
methods. This requirement significantly limits comprehensive
chemical identification of small molecules in complex samples: the
process is time-consuming and costly, and the majority of molecules
are not yet represented by standards. Thus, there is a need to assemble
evidence for the presence of small molecules in complex samples through
the use of libraries containing calculated chemical properties. To
address this need, we developed a Multi-Attribute Matching Engine
(MAME) and a library derived in part from our in silico chemical library engine (ISiCLE). Here, we describe an initial evaluation
of these methods in a blinded analysis of synthetic chemical mixtures
as part of the U.S. Environmental Protection Agency’s (EPA)
Non-Targeted Analysis Collaborative Trial (ENTACT, Phase 1). For molecules
in all mixtures, the initial blinded false negative rate (FNR), false
discovery rate (FDR), and accuracy were 57%, 77%, and 91%, respectively.
For high evidence scores, the FDR was 35%. After unblinding of the
sample compositions, we optimized the scoring parameters to better
exploit the available evidence and increased the accuracy for molecules
suspected as present. The final FNR, FDR, and accuracy were 67%, 53%,
and 96%, respectively. For high evidence scores, the FDR was 10%.
This study demonstrates that multiattribute matching methods in conjunction
with in silico libraries may one day enable reduced
reliance on experimentally derived libraries for building evidence
for the presence of molecules in complex samples.
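
The three reported metrics can be related through a confusion matrix. The following is a minimal sketch, assuming the standard definitions (FNR = FN / (FN + TP), FDR = FP / (FP + TP), accuracy = (TP + TN) / total), which this abstract does not spell out. The function name `identification_metrics` and all counts are hypothetical and are not taken from the ENTACT results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    fnr: float       # false negative rate: FN / (FN + TP)
    fdr: float       # false discovery rate: FP / (FP + TP)
    accuracy: float  # (TP + TN) / all decisions


def identification_metrics(tp: int, fp: int, tn: int, fn: int) -> Metrics:
    """Compute FNR, FDR, and accuracy from confusion-matrix counts.

    Assumes the standard definitions; the study itself may define
    these quantities differently.
    """
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one decision is required")

    present = tp + fn    # molecules truly in the mixture
    reported = tp + fp   # molecules reported as present

    # Guard against empty denominators (no true positives possible,
    # or nothing reported) by returning 0.0 for that rate.
    fnr = fn / present if present else 0.0
    fdr = fp / reported if reported else 0.0
    accuracy = (tp + tn) / total
    return Metrics(fnr=fnr, fdr=fdr, accuracy=accuracy)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic:
    # a library much larger than the set of molecules actually present.
    m = identification_metrics(tp=40, fp=120, tn=1780, fn=60)
    print(f"FNR={m.fnr:.0%}  FDR={m.fdr:.0%}  accuracy={m.accuracy:.0%}")
```

With these made-up counts, accuracy is high even though most reported identifications are false, because true negatives dominate the total. This is one reason the three metrics are best read together rather than individually.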