Amino Acid Sequence Determination of Protein Biomarkers of Campylobacter upsaliensis and C. helveticus by “Composite” Sequence Proteomic Analysis

Clifton K. Fagerquist
We have identified the protein biomarkers observed in the matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) spectra of cell lysates of five strains of Campylobacter upsaliensis and one strain of C. helveticus by “bottom-up” proteomic techniques. Only one C. upsaliensis strain had previously been genomically sequenced. The significant findings are as follows: (1) The protein biomarkers identified were: 10 kDa chaperonin, protein of unknown function (DUF465), phnA protein, probable periplasmic protein, D-methionine-binding lipoprotein MetQ, cytochrome c family protein, DNA-binding protein HU, thioredoxin, asparaginase family protein, helix-turn-helix domain protein, as well as several ribosomal and conserved hypothetical proteins. (2) Amino acid substitutions in protein biomarkers across species and strains account for variations in biomarker ion mass-to-charge (m/z). (3) The most common post-translational modifications (PTMs) identified were cleavage of N-terminal methionine and N-terminal signal peptides. The rule that predicts N-terminal methionine cleavage, based on the penultimate residue, does not appear to apply to C. upsaliensis proteins when the penultimate residue is threonine. (4) Some protein biomarker genes of the genomically sequenced C. upsaliensis strain were found to have nucleotide sequences with GTG or TTG “start” codons that, based on proteomic analysis, were not the actual start codon (ATG) of the protein. (5) Proteomic identification of the protein biomarkers of the non-genomically sequenced C. upsaliensis and C. helveticus strains involved identifying protein amino acid sequences homologous to those of the sequenced strain. Interestingly, some protein sequence regions that were not completely homologous to the sequenced strain, due to amino acid substitutions, were found to have homologous sequence regions from more phylogenetically distant species/strains, e.g., C. jejuni.
Exploiting this partial homology of more distant species/strains, it was possible to construct a “composite” amino acid sequence using multiple non-overlapping sequence regions from both phylogenetically proximate and distant strains. The new composite sequence was confirmed by both MS and MS/MS data. Thus, it was possible in some cases to determine the amino acid sequence of an unknown protein biomarker from a genomically non-sequenced bacterial strain without the necessity of either genetically sequencing the biomarker gene or resorting to de novo MS/MS analysis of the full protein sequence.

Keywords: Campylobacter upsaliensis/helveticus • MALDI-TOF-MS • composite sequence • proteomics • post-translational modification • bacterial classification • foodborne pathogen • protein biomarkers
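
The composite-sequence idea summarized above can be illustrated with a minimal Python sketch. It is not the workflow used in this study: the function names, the example sequences, and the mass tolerance are all hypothetical, and only the average residue masses are standard values. The sketch joins non-overlapping sequence regions taken from different strains, optionally removes the N-terminal methionine according to the conventional penultimate-residue rule, and compares the predicted singly protonated m/z with an observed MALDI-TOF-MS peak. Because the abstract reports that the cleavage rule is unreliable when the penultimate residue is threonine, the set of residues that permit cleavage is left as a parameter rather than hard-coded.

```python
# Standard average residue masses in daltons (water is added once per chain).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153   # average mass of H2O
PROTON = 1.00728  # mass of a proton

# Penultimate residues that conventionally permit N-terminal Met removal.
DEFAULT_SMALL = frozenset("GASTCPV")


def remove_n_terminal_met(seq, small_residues=DEFAULT_SMALL):
    """Drop the initiator Met if the second residue is in small_residues."""
    if len(seq) > 1 and seq[0] == "M" and seq[1] in small_residues:
        return seq[1:]
    return seq


def average_mass(seq):
    """Average mass of an unmodified, linear polypeptide."""
    return sum(AVG_RESIDUE_MASS[residue] for residue in seq) + WATER


def singly_protonated_mz(seq):
    """Predicted m/z of the [M+H]+ ion."""
    return average_mass(seq) + PROTON


def build_composite(segments):
    """Join non-overlapping (source_label, fragment) regions in order."""
    return "".join(fragment for _source, fragment in segments)


def matches_observed(seq, observed_mz, tolerance=2.0):
    """True if the predicted m/z lies within the (assumed) tolerance in m/z units."""
    return abs(singly_protonated_mz(seq) - observed_mz) <= tolerance


if __name__ == "__main__":
    # Hypothetical fragments, invented for illustration only.
    segments = [
        ("sequenced C. upsaliensis strain", "MAKKL"),
        ("C. jejuni homolog", "TVEDG"),
        ("sequenced C. upsaliensis strain", "RLSK"),
    ]
    composite = build_composite(segments)
    mature = remove_n_terminal_met(composite)
    print(composite, mature, round(singly_protonated_mz(mature), 2))
```

A single substitution shifts the predicted mass by the difference between the two residue masses (alanine to serine, for example, adds about 16 Da), which is how finding (2) relates amino acid substitutions to shifts in biomarker m/z. In practice a composite sequence that matches the MS peak would still need confirmation from MS/MS fragment data, as the abstract describes.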