Version 2 2020-09-10, 15:07
Version 1 2019-11-14, 17:36
dataset
posted on 2020-09-10, 15:07, authored by Gelio Alves, Yi-Kuo Yu
Rapid and accurate
identification of microorganisms and estimation
of their biomasses are of extreme importance to public health. Mass
spectrometry has become an important technique for these purposes.
Previously we published a workflow named Microorganism Classification and Identification (MiCId
v.12.26.2017) that was shown to perform no worse than other workflows.
This manuscript presents MiCId v.12.13.2018 that, in comparison with
the earlier version v.12.26.2017, allows for biomass estimates, provides
more accurate microorganism identifications (better controls the number
of false positives), and is robust against database size increase.
This significant advance is made possible by three new ingredients:
first, we apply a modified expectation-maximization method to compute,
for each taxon considered, a prior probability that can be used for
biomass estimation; second, we introduce a new concept called ownership,
through which we compute the participation ratio and use it as the
number of taxa to be kept within a cluster of closely related taxa;
third, based on confidently identified peptides, we calculate for each
taxon its degree of independence from the rest of the taxa considered
to determine whether or not to split this taxon off the cluster. Using
270 data files, each containing a large number of MS/MS spectra, we
show that, in comparison with v.12.26.2017, version v.12.13.2018 yields
superior retrieval results. We also show that MiCId v.12.13.2018 can
estimate species biomass reasonably well. The new MiCId v.12.13.2018,
designed to run in a Linux environment, is freely available for download
at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
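
To make two of the ideas named in the abstract more concrete, here is a minimal sketch in Python. It is not the MiCId implementation: the abstract describes a modified expectation-maximization method and an ownership-based participation ratio without giving their definitions, so the code below substitutes the textbook EM update for mixture proportions and the standard participation ratio, (sum of w)^2 / (sum of w^2). The function names and the `likelihood` input (one row per piece of evidence, one column per taxon) are hypothetical.

```python
def em_priors(likelihood, n_iter=200, tol=1e-12):
    """Textbook EM for mixture proportions (an assumption, not MiCId's modified EM).

    likelihood[i][k] is a hypothetical likelihood of evidence item i under
    taxon k. Every row is assumed to contain at least one positive entry.
    """
    n_items = len(likelihood)
    n_taxa = len(likelihood[0])
    priors = [1.0 / n_taxa] * n_taxa  # start from a uniform prior
    for _ in range(n_iter):
        totals = [0.0] * n_taxa
        for row in likelihood:
            # E-step: responsibility of each taxon for this item
            weighted = [p * l for p, l in zip(priors, row)]
            norm = sum(weighted)
            for k, w in enumerate(weighted):
                totals[k] += w / norm
        # M-step: new prior is the average responsibility per taxon
        new_priors = [t / n_items for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_priors, priors)) < tol
        priors = new_priors
        if converged:
            break
    return priors


def participation_ratio(weights):
    """Standard participation ratio: the effective number of contributors.

    Assumed definition; the ownership-based weights used by MiCId are not
    specified in the abstract.
    """
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


if __name__ == "__main__":
    # Three items point only to taxon 0, one item only to taxon 1.
    print(em_priors([[1, 0], [1, 0], [1, 0], [0, 1]]))
    # Four equally weighted taxa count as four effective taxa.
    print(participation_ratio([1, 1, 1, 1]))
```

Under these assumptions, a cluster whose weight is spread evenly over four taxa has a participation ratio of 4, while a cluster dominated by a single taxon has a ratio of 1, which is the sense in which the ratio can serve as a count of taxa to keep.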