Posted on 2016-06-20, authored by Kieaibi E. Wilcox, Ewan W. Blanch, Andrew J. Doig
Infrared (IR) spectra contain substantial information about protein
structure, which has previously been exploited mostly through known
band assignments. Here, we convert spectral intensities in bins
within the Amide I and II regions to vectors and apply machine learning
methods to determine protein secondary structure. Partial least squares
(PLS) regression was performed on the spectra of 90 proteins in H₂O.
After preprocessing and removal of outliers, 84 proteins were used for
this work. Standard normal variate (SNV) and second-derivative
preprocessing applied to the combined Amide I and II data generally
gave the best performance,
with root-mean-square errors of prediction of ∼12% for α-helix,
∼7% for β-sheet, 7% for antiparallel β-sheet, and ∼8% for other
conformations. Analysis of Fourier transform infrared (FTIR) spectra
of 16 proteins in D₂O showed that secondary structure determination
was slightly less accurate than in H₂O. Interval partial least squares
(iPLS) was used to identify the
regions of the spectra most critical for secondary structure
prediction and showed that the sides of bands were more informative
than their peak maxima. In conclusion, we have shown that multivariate
analysis of protein FTIR spectra can give α-helix, β-sheet, antiparallel
β-sheet, and other contents with good accuracy, comparable to that of
circular dichroism, which is widely used for this purpose.
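
The abstract names the main steps (binned spectral vectors, SNV and second-derivative preprocessing, PLS regression scored by root-mean-square error, and interval PLS) but not how they were implemented. The following is a minimal NumPy sketch under stated assumptions, not the authors' code. It assumes one single-response PLS model (NIPALS) per structure class, leave-one-out cross-validation for the RMSE, SNV followed by an unsmoothed finite-difference second derivative (both the order and the derivative method are assumptions), and equal-width intervals for iPLS. All function names are hypothetical, and the data are synthetic random numbers rather than measured spectra.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def second_derivative(spectra):
    """Second derivative along the bin (wavenumber) axis via repeated central differences.

    Assumption: an unsmoothed finite-difference derivative; Savitzky-Golay
    derivatives are a common smoothed alternative.
    """
    return np.gradient(np.gradient(spectra, axis=1), axis=1)


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS regression with NIPALS; returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # unit weight vector
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        q = (yk @ t) / tt               # y loading
        Xk = Xk - np.outer(t, p)        # deflate X and y before the next component
        yk = yk - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)   # regression vector in the original X space
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def loo_rmse(X, y, n_components):
    """Leave-one-out cross-validated root-mean-square error of prediction."""
    errors = []
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        model = pls1_fit(X[train], y[train], n_components)
        errors.append(pls1_predict(model, X[i:i + 1])[0] - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))


def interval_pls(X, y, n_intervals, n_components):
    """Interval PLS: cross-validated RMSE of a separate PLS model on each contiguous block of bins."""
    blocks = np.array_split(np.arange(X.shape[1]), n_intervals)
    return [loo_rmse(X[:, b], y, n_components) for b in blocks]


# Synthetic toy data only (not the published spectra): 30 "proteins" x 40 bins,
# with a target that depends only on the first five preprocessed bins.
rng = np.random.default_rng(0)
X = second_derivative(snv(rng.normal(size=(30, 40))))
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.05, size=30)

print("full-spectrum LOO RMSE:", loo_rmse(X, y, n_components=5))
rmses = interval_pls(X, y, n_intervals=4, n_components=8)
print("per-interval LOO RMSE:", [round(r, 3) for r in rmses])
```

Because the synthetic target is built from the first five bins, the first interval should give the lowest cross-validated RMSE; ranking regions by this kind of per-interval error is the idea behind iPLS. In practice a library implementation such as scikit-learn's `PLSRegression` would usually replace the hand-written NIPALS loop.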