Posted on 2023-08-04, 15:04. Authored by Nicolai Kozlowski, Helmut Grubmüller.
Markov state models are widely used to describe and analyze protein
dynamics based on molecular dynamics simulations, specifically to
extract functionally relevant characteristic time scales and motions.
Particularly for larger biomolecules such as proteins, however, insufficient
sampling is a notorious concern and often the source of large uncertainties
that are difficult to quantify. Furthermore, there are several other
sources of uncertainty, such as the choice of the number of Markov states
and of the lag time, the choice and parameters of the dimension reduction
preprocessing step, and the limited number of observed transitions;
the latter is often estimated via a Bayesian approach. Here, we quantified
and ranked all of these uncertainties for four small globular test
proteins. We found that the largest uncertainty is due to insufficient
sampling and initially increases with the total trajectory length T up to
a critical tipping point, after which it decreases as 1/T, thus providing
guidelines for how much sampling is required for a given accuracy. We also
found that single
long trajectories yielded better sampling accuracy than many shorter
trajectories starting from the same structure. In comparison, the
remaining sources of uncertainty listed above are generally smaller
by a factor of about five, rendering them less of a concern but certainly
not negligible. Importantly, the Bayes uncertainty, commonly used
as the only uncertainty estimate, captures only a relatively small
part of the true uncertainty, which is thus often drastically underestimated.
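
The 1/T decay reported above suggests a simple way to extrapolate sampling requirements. The following is a minimal sketch, not the authors' procedure: it assumes that a reference uncertainty u_ref has already been measured at a trajectory length t_ref beyond the tipping point, and that the uncertainty then follows u(T) = u_ref * t_ref / T. The function name `required_length` and all numbers are hypothetical and chosen only for illustration.

```python
def required_length(t_ref, u_ref, u_target):
    """Estimate the total trajectory length needed to reach u_target.

    Assumption (not from the study's code): beyond the tipping point the
    uncertainty scales as u(T) = u_ref * t_ref / T, so the required length
    is T = t_ref * u_ref / u_target. The estimate is meaningless if t_ref
    lies before the tipping point, where the uncertainty still increases.
    """
    if t_ref <= 0 or u_ref <= 0 or u_target <= 0:
        raise ValueError("all inputs must be positive")
    if u_target >= u_ref:
        # The target accuracy is already met at the reference length.
        return t_ref
    return t_ref * u_ref / u_target


# Hypothetical numbers, not taken from the study: a reference uncertainty
# of 0.5 (arbitrary units) at a total length of 10 (arbitrary time units),
# and a target uncertainty of 0.125.
t_needed = required_length(t_ref=10.0, u_ref=0.5, u_target=0.125)
print(f"estimated total trajectory length: {t_needed}")
```

Under this assumption, reducing the uncertainty by a factor of four requires four times the total trajectory length. Because the abstract notes that the Bayes uncertainty captures only part of the true uncertainty, u_ref would need to come from an estimate that includes the sampling contribution, for example the spread across independent simulations.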