Posted on 2018-10-26, 00:00. Authored by Jacob C. Ezerski, Margaret S. Cheung
We introduce the
combinatorial averaged transient structure (CATS)
clustering method as a means to cluster protein structure ensembles
based on the distributions of protein backbone descriptor coordinates.
In our study, we use phi and psi dihedral angle coordinates of the
protein backbone as descriptors due to their translational and rotational
invariance. The CATS method was developed to produce unique structure
ensembles that are typically difficult to obtain from flat energy
landscapes using a one-dimensional separation value (e.g., RMSD cutoff).
By using higher-dimensional descriptor coordinates, we remedy the
structure resolution shortcomings that standard clustering algorithms
show when RMSD fluctuations between structures are large. We compare the
performance of CATS to that of the RMSD-based clustering method GROMOS,
which may not be the best choice for clustering intrinsically disordered
proteins (IDPs) because its separation quality relies heavily on cutoff
values rather than on energy landscape minima.
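
The cutoff dependence described above can be illustrated with a minimal sketch of the neighbor-counting scheme usually associated with GROMOS clustering: count each structure's neighbors within the cutoff, take the structure with the most neighbors as a cluster center, remove it together with its neighbors, and repeat. The function name `gromos_like_cluster` and the toy distance matrix are assumptions made for illustration; in practice the matrix would hold pairwise RMSD values, and this is not the implementation used in the study.

```python
import numpy as np

def gromos_like_cluster(dist, cutoff):
    """Greedy neighbor-count clustering on a symmetric distance matrix.

    Returns a list of (center_index, member_indices) tuples.
    Hypothetical helper for illustration only.
    """
    dist = np.asarray(dist, dtype=float)
    within = dist <= cutoff                 # neighbor relation (includes self)
    active = np.ones(dist.shape[0], dtype=bool)
    clusters = []
    while active.any():
        # Count neighbors among the structures that are still unassigned.
        counts = (within & active[None, :]).sum(axis=1)
        counts[~active] = -1                # assigned structures cannot be centers
        center = int(np.argmax(counts))     # ties go to the lowest index
        members = np.flatnonzero(within[center] & active)
        clusters.append((center, members.tolist()))
        active[members] = False             # remove the whole cluster from the pool
    return clusters

# Toy 1-D "conformations"; absolute differences stand in for pairwise RMSD.
points = np.array([0.0, 0.1, 0.2, 1.0, 1.1, 5.0])
toy_dist = np.abs(points[:, None] - points[None, :])

tight = gromos_like_cluster(toy_dist, cutoff=0.15)
loose = gromos_like_cluster(toy_dist, cutoff=2.0)
print(len(tight), len(loose))
```

On this toy input the tight cutoff separates the two nearby groups, while the loose cutoff merges them into one cluster. The partition is set by the single cutoff value, not by any feature of the underlying energy landscape.
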
We demonstrate the performance of CATS and GROMOS by analyzing all-atom
molecular dynamics trajectories, taken from prior studies, of the
Tau/R2(273–284) fragment in solution with the osmolytes TMAO and urea.
Our study reveals that the CATS method produces a larger number of unique
clusters than the GROMOS method because it separates structures using
higher-dimensional distributions of the descriptor coordinates. The
cluster centers produced by CATS correspond to local minima in the
multidimensional potential of mean force, so the resulting structure
ensemble adequately samples the energy landscape.
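
As a rough illustration of how minima in a potential of mean force relate to descriptor distributions, the sketch below estimates a two-dimensional free-energy surface for a single (phi, psi) pair as F = -kT ln(P / P_max), where P is a histogram of the sampled angles. This is a simplified, assumed example. The CATS method works on higher-dimensional distributions, and the function name `pmf_2d`, the bin count, and the synthetic angles are all hypothetical.

```python
import numpy as np

def pmf_2d(phi_deg, psi_deg, bins=36, kT=1.0):
    """Estimate a 2-D potential of mean force from dihedral samples (degrees).

    Returns F with the same shape as the histogram, in units of kT;
    the most populated bin has F = 0 and empty bins have F = +inf.
    Hypothetical helper for illustration only.
    """
    hist, phi_edges, psi_edges = np.histogram2d(
        phi_deg, psi_deg, bins=bins, range=[[-180, 180], [-180, 180]]
    )
    with np.errstate(divide="ignore"):      # log(0) for empty bins is expected
        free_energy = -kT * np.log(hist / hist.max())
    return free_energy, phi_edges, psi_edges

# Synthetic samples: a dominant basin and a less populated one.
phi = np.array([-65.0] * 10 + [65.0] * 3)
psi = np.array([-45.0] * 10 + [45.0] * 3)

F, phi_edges, psi_edges = pmf_2d(phi, psi)
i_min, j_min = np.unravel_index(np.argmin(F), F.shape)
print(phi_edges[i_min], psi_edges[j_min])  # lower edges of the lowest-F bin
```

With these synthetic samples, the more populated basin sits at the global minimum and the less populated basin appears as a higher-lying local minimum.
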