Posted on 2023-04-17, 17:35. Authored by Navjeet Ahalawat, Mohammad Sahil, Jagannath Mondal
A long-standing target in elucidating the biomolecular recognition
process is the identification of binding-competent conformations of
the receptor protein. However, protein conformational plasticity and
the stochastic nature of the recognition processes often preclude
the assignment of a specific protein conformation to an individual
ligand-bound pose. Here, we demonstrate that a computational framework
termed RF-TICA-MD, which integrates an ensemble decision-tree-based
Random Forest (RF) machine learning (ML) technique with an unsupervised
dimension reduction approach, time-structured independent component
analysis (TICA), provides an efficient and unambiguous solution toward
resolving protein conformational plasticity and the substrate binding
process. In particular, we consider multimicrosecond-long molecular
dynamics (MD) simulation trajectories of a ligand recognition process
in solvent-inaccessible cavities of archetypal proteins T4 lysozyme
and cytochrome P450cam. We show that, when an unsupervised dimension
reduction approach alone cannot establish a clear correspondence between
protein conformation and binding-competent macrostates, an a priori
decision-tree-based supervised classification of the simulated
recognition trajectories via RF helps identify key amino acid residue
pairs of the protein that are sensitive to ligand binding. A subsequent
unsupervised dimension reduction of the selected residue pairs via TICA
then delineates a conformational landscape of the protein that
demarcates ligand-bound poses from unbound ones. The proposed RF-TICA-MD
approach is shown to be data-agnostic and remains robust when other
ML-based classification methods, such as XGBoost, are used instead. As a
promising spinoff of the protocol, the framework is found to be capable
of identifying distal protein locations that are allosterically
important for ligand binding and of characterizing their roles in
recognition pathways. A Python implementation of the proposed ML workflow
is available on GitHub at https://github.com/navjeet0211/rf-tica-md.
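
The two-stage idea described above (supervised feature selection with RF, followed by TICA on the selected features) can be illustrated with a minimal sketch. This is not the authors' implementation from the repository; it is a toy example under stated assumptions: the data are synthetic stand-ins for residue-pair distances, the labels stand in for bound/unbound frames, the helper names `select_features_rf` and `tica` are hypothetical, and TICA is written out directly as a generalized eigenvalue problem with NumPy/SciPy rather than taken from an MD analysis package.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.ensemble import RandomForestClassifier


def select_features_rf(X, labels, n_keep, seed=0):
    """Rank features by Random Forest importance; return the top n_keep indices."""
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(X, labels)
    order = np.argsort(rf.feature_importances_)[::-1]
    return np.sort(order[:n_keep])


def tica(X, lag, n_components=2):
    """Plain TICA: solve C_tau v = lambda C_0 v on mean-free, time-ordered data."""
    X = X - X.mean(axis=0)
    X0, Xt = X[:-lag], X[lag:]
    n = len(X0)
    # Symmetrized instantaneous and time-lagged covariance matrices.
    c0 = (X0.T @ X0 + Xt.T @ Xt) / (2 * n)
    ct = (X0.T @ Xt + Xt.T @ X0) / (2 * n)
    c0 += 1e-10 * np.eye(c0.shape[0])  # small ridge for numerical stability
    eigvals, eigvecs = eigh(ct, c0)    # generalized symmetric eigenproblem
    idx = np.argsort(eigvals)[::-1][:n_components]
    return X @ eigvecs[:, idx], eigvals[idx]


# Synthetic stand-in for a trajectory (assumption: NOT real MD data).
rng = np.random.default_rng(0)
n_frames, n_features = 4000, 30
# Hypothetical bound (1) / unbound (0) label, switching every 500 frames.
state = (np.arange(n_frames) // 500) % 2
X = rng.normal(size=(n_frames, n_features))
informative = [3, 17]                      # features that respond to "binding"
X[:, informative] += 2.0 * state[:, None]

# Stage 1: supervised selection of binding-sensitive features.
selected = select_features_rf(X, state, n_keep=2)
# Stage 2: unsupervised TICA restricted to the selected features.
projection, eigvals = tica(X[:, selected], lag=10)
```

In this toy setting the slowest TICA component of the selected features tracks the bound/unbound label, which mirrors the separation of ligand-bound from unbound poses described in the abstract. For real trajectories, the feature matrix, the labeling of frames, the lag time, and the number of retained features would all need to be chosen for the system at hand.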