js9b00035_si_003.xlsx (14.67 kB)
Robust Accurate Identification and Biomass Estimates of Microorganisms via Tandem Mass Spectrometry
Dataset posted on 2020-09-10, 15:07, authored by Gelio Alves, Yi-Kuo Yu
Rapid and accurate identification of microorganisms and estimation of their biomasses are of extreme importance to public health. Mass spectrometry has become an important technique for these purposes. Previously we published a workflow named Microorganism Classification and Identification (MiCId v.12.26.2017) that was shown to perform no worse than other workflows. This manuscript presents MiCId v.12.13.2018, which, compared with the earlier version v.12.26.2017, allows for biomass estimates, provides more accurate microorganism identifications (with better control of the number of false positives), and is robust against increases in database size. This significant advance is made possible by three new ingredients: first, we apply a modified expectation-maximization method to compute a prior probability for each taxon considered, which can be used for biomass estimation; second, we introduce a new concept called ownership, through which the participation ratio is computed and used as the number of taxa to keep within a cluster of closely related taxa; third, based on confidently identified peptides, we calculate each taxon's degree of independence from the other taxa considered to determine whether to split that taxon off from the cluster. Using 270 data files, each containing a large number of MS/MS spectra, we show that version v.12.13.2018 yields superior retrieval results compared with v.12.26.2017. We also show that MiCId v.12.13.2018 can estimate species biomass reasonably well. The new MiCId v.12.13.2018, designed to run in a Linux environment, is freely available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
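To make the first two ingredients concrete, the following is a minimal illustrative sketch and not MiCId's implementation. The abstract does not state how the expectation-maximization method is modified or how ownership is defined, so the sketch assumes the textbook forms: plain EM for mixture weights over a fixed likelihood table, and the standard participation ratio (Σw)² / Σw². The function names `em_priors` and `participation_ratio` and the input layout are hypothetical.

```python
def em_priors(likelihoods, n_iter=200, tol=1e-10):
    """Estimate per-taxon prior probabilities with textbook EM (an assumption;
    the abstract describes a *modified* EM whose details are not given here).

    likelihoods[i][k] is a hypothetical score for how well piece of evidence i
    (for example, an identified peptide) is explained by taxon k.
    """
    n_taxa = len(likelihoods[0])
    priors = [1.0 / n_taxa] * n_taxa  # start from a uniform prior
    for _ in range(n_iter):
        totals = [0.0] * n_taxa
        for row in likelihoods:
            # E-step: responsibility of each taxon for this piece of evidence
            weighted = [p * l for p, l in zip(priors, row)]
            norm = sum(weighted)
            if norm == 0.0:
                continue  # evidence explained by no taxon is skipped
            for k, w in enumerate(weighted):
                totals[k] += w / norm
        total = sum(totals)
        if total == 0.0:
            return priors  # nothing to update from
        # M-step: new priors are the normalized summed responsibilities
        new_priors = [t / total for t in totals]
        change = max(abs(a - b) for a, b in zip(priors, new_priors))
        priors = new_priors
        if change < tol:
            break
    return priors


def participation_ratio(weights):
    """Standard participation ratio: the effective number of entries
    that carry appreciable weight."""
    s = sum(weights)
    s2 = sum(w * w for w in weights)
    return (s * s) / s2


if __name__ == "__main__":
    # Toy table: two pieces of evidence per taxon, with no shared evidence.
    table = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    priors = em_priors(table)
    print(priors, participation_ratio(priors))
```

Under these assumptions, the estimated priors sum to one and could serve as relative abundance estimates, while the participation ratio of the weights within a cluster gives an effective count of taxa that carry appreciable weight.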