Automated Glycan Sequencing from Tandem Mass Spectra of N‑Linked Glycopeptides
Dataset posted on 23.05.2016, 19:19 by Chuan-Yih Yu, Anoop Mayampurath, Rui Zhu, Lauren Zacharias, Ehwang Song, Lei Wang, Yehia Mechref, Haixu Tang
Mass spectrometry has become a routine experimental tool for proteomic biomarker analysis of human blood samples, partly due to the wide availability of informatics tools. Protein glycosylation, one of the most common protein post-translational modifications (PTMs) in mammals, has been observed to be altered in multiple human diseases, so glycosylation changes may serve as candidate markers of disease progression. Although mass spectrometry instrumentation has advanced in capability, discovering glycosylation-related markers with existing software is currently not straightforward. Complete characterization of protein glycosylation requires identifying intact glycopeptides in samples, including both the modification site and the structure of the attached glycan. In this paper, we present GlycoSeq, an open-source software tool that implements a heuristic iterated glycan sequencing algorithm, coupled with prior knowledge, for automated elucidation of the glycan structure within a glycopeptide from its collision-induced dissociation tandem mass spectrum. GlycoSeq employs rules of glycosidic linkage, as defined by glycan synthetic pathways, to eliminate improbable glycan structures and build reasonable glycan trees. We tested the tool on two sets of tandem mass spectra of N-linked glycopeptides from cell lines derived from breast cancer patients. After enzymatic specificity within the N-linked glycan synthetic pathway was taken into account, the sequencing results of GlycoSeq were highly consistent with the manually curated glycan structures. Hence, GlycoSeq is ready to be used for characterizing glycan structures in glycopeptides from MS/MS analysis. GlycoSeq is released as open-source software at https://github.com/chpaul/GlycoSeq/.
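To make the idea of iterated sequencing under linkage rules more concrete, the following Python sketch grows candidate glycan trees one monosaccharide at a time, starting from a HexNAc attached to the peptide. It scores each candidate by how many of its intermediate Y-ion masses (peptide plus partial glycan) match observed peaks. This is not GlycoSeq's implementation. The beam search, the ALLOWED_CHILDREN table (a drastically simplified, hypothetical stand-in for pathway-derived linkage rules), the max_children limit, the 0.02 Da tolerance, and the assumption that peaks are already deconvoluted to neutral masses are all illustrative choices.

```python
from collections import Counter

# Monoisotopic residue masses (Da) of common N-glycan monosaccharides.
RESIDUE_MASS = {
    "HexNAc": 203.07937,
    "Hex": 162.05282,
    "dHex": 146.05791,
    "NeuAc": 291.09542,
}

# HYPOTHETICAL, heavily simplified linkage rules: residue types allowed as a
# child of each parent type. A stand-in for pathway-derived rules, not GlycoSeq's.
ALLOWED_CHILDREN = {
    "HexNAc": ("HexNAc", "Hex", "dHex"),
    "Hex": ("Hex", "HexNAc", "NeuAc"),
    "dHex": (),
    "NeuAc": (),
}

# A glycan tree is a tuple of (residue, parent_index) nodes; the root (index 0) has parent -1.
def glycan_mass(nodes):
    return sum(RESIDUE_MASS[res] for res, _ in nodes)

def canonical(nodes, i=0):
    """Order-independent string for the subtree rooted at node i (used to de-duplicate trees)."""
    kids = sorted(canonical(nodes, j) for j, (_, p) in enumerate(nodes) if p == i)
    return nodes[i][0] + ("(" + ",".join(kids) + ")" if kids else "")

def sequence_glycan(peptide_mass, precursor_mass, peaks, tol=0.02, beam=5, max_children=2):
    """Grow glycan trees one residue at a time, keeping the `beam` trees whose
    intermediate Y-ion masses (peptide + partial glycan) match the most peaks.
    Assumes `peaks` are neutral (charge-deconvoluted) masses."""
    def hit(mass):
        return 1 if any(abs(mass - p) <= tol for p in peaks) else 0

    root = (("HexNAc", -1),)  # N-glycans attach to Asn through a HexNAc
    frontier = [(hit(peptide_mass + glycan_mass(root)), root)]
    finished = []
    while frontier:
        candidates = {}
        for score, nodes in frontier:
            mass = peptide_mass + glycan_mass(nodes)
            if abs(mass - precursor_mass) <= tol:
                finished.append((score, nodes))  # tree explains the precursor mass
                continue
            for i, (parent, _) in enumerate(nodes):
                if sum(1 for _, p in nodes if p == i) >= max_children:
                    continue  # this node already has the maximum number of branches
                for child in ALLOWED_CHILDREN[parent]:
                    new_mass = mass + RESIDUE_MASS[child]
                    if new_mass > precursor_mass + tol:
                        continue  # overshoots the precursor; prune
                    new_nodes = nodes + ((child, i),)
                    key = canonical(new_nodes)
                    new_score = score + hit(new_mass)
                    if key not in candidates or candidates[key][0] < new_score:
                        candidates[key] = (new_score, new_nodes)
        # Keep only the best-scoring partial trees for the next iteration.
        frontier = sorted(candidates.values(), key=lambda c: -c[0])[:beam]
    return sorted(finished, key=lambda c: -c[0])
```

Because many tree topologies produce the same series of Y-ion masses, several candidates can tie at the top score. In a sketch like this, only the linkage rules constrain which shapes are admissible, which loosely mirrors the role the abstract assigns to synthetic-pathway rules in pruning improbable structures.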