Standardizing and Simplifying Analysis of Peptide Library Data

Peptide libraries allow researchers to quickly find hundreds of peptide sequences with a desired property. Currently, the large amount of data generated from peptide libraries is typically analyzed by hand, with researchers searching the peptide sequences for repeating patterns, called motifs. In this work, we describe a set of algorithms that enable fast, efficient, and standardized analysis of peptide libraries. Four main techniques are described: (1) selecting the number of motifs present in a peptide library; (2) separating the peptides into groups of similar sequences; (3) fitting a model to the peptides to extract motifs; and (4) analyzing the library with quantitative structure–property relationships when no clear motifs are present. Applying these techniques to five previously published data sets shows that they can automatically and quickly reproduce the work of experts while allowing much more flexibility in analysis. We also describe a new way of visually presenting peptide libraries that allows visual inspection of how sequences group and spread. The algorithms have been implemented in an open-source plug-in called “peplib” and in an online web application.
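To make concrete what a "motif" extracted from a group of similar peptides can look like, here is a minimal Python sketch. It is not the peplib implementation and does not reproduce the model-fitting step described above; it assumes a group of equal-length, already-aligned peptides and simply keeps, at each position, the most common residue if it occupies at least a chosen fraction of the group, writing `X` (any residue) otherwise. The function names, the `min_fraction` threshold, and the example sequences are hypothetical.

```python
from collections import Counter


def position_frequencies(peptides):
    """Return one Counter of residues per position (hypothetical helper).

    Assumes all peptides are aligned and have the same length.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("this sketch assumes all peptides have the same length")
    return [Counter(p[i] for p in peptides) for i in range(length)]


def consensus_motif(peptides, min_fraction=0.5):
    """Build a simple consensus motif string (hypothetical helper).

    A position keeps its most common residue only if that residue appears in
    at least `min_fraction` of the peptides; otherwise it becomes 'X'.
    """
    n = len(peptides)
    motif = []
    for counts in position_frequencies(peptides):
        residue, count = counts.most_common(1)[0]
        motif.append(residue if count / n >= min_fraction else "X")
    return "".join(motif)


# Hypothetical group of similar peptides, e.g. one cluster from step (2).
group = ["HWSPWK", "HWQPWR", "HWTPLK", "HFAPWK"]
print(consensus_motif(group))  # prints HWXPWK
```

A fixed frequency threshold is used here only because it is easy to read; a probabilistic model fitted to the peptides, as the abstract describes, would instead assign each position a full residue distribution rather than a single letter or a wildcard.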