Posted on 2012-11-06, 00:00. Authored by Lochana C. Menikarachchi, Shannon Cawley, Dennis W. Hill, L. Mark Hall, Lowell Hall, Steven Lai, Janine Wilder, David F. Grant
In this paper, we present MolFind, a highly multithreaded, pipeline-type
software package that aids in identifying chemical structures
in complex biofluids and mixtures. MolFind is specifically designed
for high-performance liquid chromatography/mass spectrometry (HPLC/MS)
data inputs typical of metabolomics studies where structure identification
is the ultimate goal. MolFind enables compound identification by matching
HPLC/MS-based experimental data obtained for an unknown compound with
computationally derived HPLC/MS values for candidate compounds downloaded
from chemical databases such as PubChem. The downloaded “bins”
consist of all compounds matching the monoisotopic molecular weight
of the unknown. The computational HPLC/MS values predicted include
retention index (RI), ECOM50 (the energy required to fragment
50% of a selected precursor ion), drift time, and collision-induced
dissociation (CID) spectrum. RI, ECOM50, and drift-time
models are used for filtering compounds downloaded from PubChem. The
remaining candidates are then ranked based on CID spectra matching.
Current RI and ECOM50 models allow for the removal of about
28% of compounds from PubChem bins. Our estimates suggest that this
could be improved to as much as 87% with additional chemical structures
included in the computational models. Quantitative structure-property
relationship (QSPR)-based modeling of drift times showed a better correlation
with experimentally determined drift times than did Mobcal cross-sectional
areas. In 23 of 35 example cases, filtering PubChem bins with RI and
ECOM50 predictive models resulted in improved ranking of
the unknown compounds compared to previous studies using CID spectra
matching alone. In 19 of 35 examples, the correct candidate was ranked
within the top 20 compounds in bins containing an average of 1635
compounds.
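
The filter-then-rank workflow described above can be illustrated with a minimal Python sketch. This is not MolFind's actual implementation: the `Candidate` class, the `filter_and_rank` function, the tolerance-window parameters, and the single numeric `cid_score` are all hypothetical names introduced here for illustration, and the sketch assumes that filtering amounts to keeping candidates whose predicted RI and ECOM50 fall within a fixed window of the experimental values.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one compound from a mass-matched bin."""
    name: str
    predicted_ri: float      # model-predicted retention index
    predicted_ecom50: float  # model-predicted ECOM50
    cid_score: float         # assumed CID spectrum match score, higher is better


def filter_and_rank(candidates, observed_ri, observed_ecom50,
                    ri_window, ecom50_window):
    """Drop candidates outside the RI and ECOM50 windows, then rank by CID score.

    The window widths are assumptions for this sketch; the abstract does not
    state how the filter thresholds are chosen.
    """
    kept = [
        c for c in candidates
        if abs(c.predicted_ri - observed_ri) <= ri_window
        and abs(c.predicted_ecom50 - observed_ecom50) <= ecom50_window
    ]
    # Best CID match first.
    return sorted(kept, key=lambda c: c.cid_score, reverse=True)
```

In this sketch the filters run before ranking, so a candidate with a strong CID match but an incompatible predicted RI or ECOM50 never reaches the ranked list, which is how filtering could improve the rank of the correct compound.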