Posted on 2017-03-07, 00:00. Authored by Karen Sargsyan, Cédric Grauffel, Carmay Lim.
The root-mean-square deviation (RMSD) is a similarity measure widely
used in the analysis of macromolecular structures and dynamics. As increasingly
large macromolecular systems are being studied, dimensionality effects
such as the “curse of dimensionality” (a diminishing
ability to discriminate pairwise differences between conformations
with increasing system size) may exist and significantly impact RMSD-based
analyses. For such large biomolecular systems, whether the RMSD or
alternative similarity measures might suffer from this “curse”
and lose the ability to discriminate different macromolecular structures
had not been explicitly addressed. Here, we show such dimensionality
effects for both weighted and nonweighted RMSD schemes. We also provide
a mechanism for the emergence of the “curse of dimensionality”
for RMSD from the law of large numbers by showing that the conformational
distributions from which RMSDs are calculated become increasingly
similar as the system size increases. Our findings suggest the use
of weighted RMSD schemes for small proteins (fewer than 200 residues)
and nonweighted RMSD for larger proteins when analyzing molecular
dynamics trajectories.
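
The concentration effect described above can be illustrated with a minimal sketch. It is not the authors' method: the toy model draws every atomic coordinate independently from a standard normal distribution, performs no structural superposition, and uses hypothetical helper names (`rmsd`, `random_conformation`, `relative_spread`). The optional `weights` argument stands in for a generic per-atom weighting; the specific weighting schemes studied in the paper are not given in this abstract.

```python
import math
import random
import statistics


def rmsd(a, b, weights=None):
    """RMSD between two conformations given as lists of (x, y, z) tuples.

    Assumption: the two conformations are already superimposed.
    With weights=None every atom counts equally (nonweighted RMSD).
    """
    if len(a) != len(b) or not a:
        raise ValueError("conformations must be non-empty and of equal length")
    if weights is None:
        weights = [1.0] * len(a)
    total_weight = sum(weights)
    weighted_sq = sum(
        w * sum((p - q) ** 2 for p, q in zip(u, v))
        for w, u, v in zip(weights, a, b)
    )
    return math.sqrt(weighted_sq / total_weight)


def random_conformation(n_atoms, rng):
    """Toy conformation: independent standard-normal coordinates (an assumption)."""
    return [
        (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        for _ in range(n_atoms)
    ]


def relative_spread(n_atoms, n_conformations=20, seed=0):
    """Standard deviation of all pairwise RMSDs divided by their mean.

    A small value means the pairwise RMSDs are nearly indistinguishable.
    """
    rng = random.Random(seed)
    confs = [random_conformation(n_atoms, rng) for _ in range(n_conformations)]
    values = [
        rmsd(confs[i], confs[j])
        for i in range(n_conformations)
        for j in range(i + 1, n_conformations)
    ]
    return statistics.pstdev(values) / statistics.mean(values)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(relative_spread(n), 4))
```

In this toy model the squared RMSD is an average over atoms, so by the law of large numbers the pairwise values cluster more tightly around their mean as the number of atoms grows, and the relative spread shrinks. Real trajectories have correlated atomic motions, so the rate at which this happens in practice may differ from the sketch.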