ci2c01073_si_001.pdf (832.82 kB)
Exposing the Limitations of Molecular Machine Learning with Activity Cliffs
journal contribution
posted on 2022-12-02, 03:34 authored by Derek van Tilborg, Alisa Alenicheva, Francesca Grisoni
Machine learning has become a crucial tool in drug discovery and
chemistry at large, e.g., to predict molecular properties,
such as bioactivity, with high accuracy. However, activity cliffs (pairs
of molecules that are highly similar in their structure but exhibit
large differences in potency) have received limited attention
for their effect on model performance. These edge cases are
informative for molecule discovery and optimization, and models
that are well equipped to accurately predict the potency of activity
cliffs have increased potential for prospective applications. Our
work aims to fill the current knowledge gap on best-practice machine
learning methods in the presence of activity cliffs. We benchmarked
a total of 24 machine and deep learning approaches on curated bioactivity
data from 30 macromolecular targets for their performance on activity
cliff compounds. While all methods struggled in the presence of activity
cliffs, machine learning approaches based on molecular descriptors
outperformed more complex deep learning methods. Our findings highlight
large case-by-case differences in performance, advocating for (a)
the inclusion of dedicated “activity-cliff-centered”
metrics during model development and evaluation and (b) the development
of novel algorithms to better predict the properties of activity cliffs.
To this end, the methods, metrics, and results of this study have
been encapsulated into an open-access benchmarking platform named
MoleculeACE (Activity Cliff Estimation, available on GitHub at: https://github.com/molML/MoleculeACE). MoleculeACE is designed to steer the community toward addressing
the pressing but overlooked limitation of molecular machine learning
models posed by activity cliffs.
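The definition of an activity cliff and the idea of an "activity-cliff-centered" metric can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the MoleculeACE API: fingerprints are represented as plain sets of feature indices, potency is given on a negative log10 molar scale, the thresholds (Tanimoto similarity of at least 0.9 and an at least tenfold potency difference) are example values, and the function names `tanimoto`, `cliff_compounds`, and `rmse` are hypothetical.

```python
from itertools import combinations
from math import log10, sqrt


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of feature indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cliff_compounds(fingerprints, pki, sim_threshold=0.9, fold_threshold=10.0):
    """Return sorted indices of molecules that belong to at least one cliff pair.

    A pair counts as a cliff (under the assumed thresholds) when the two
    molecules are highly similar yet differ in potency by at least
    `fold_threshold`-fold, i.e. by log10(fold_threshold) log units.
    """
    min_gap = log10(fold_threshold)
    cliffs = set()
    for i, j in combinations(range(len(pki)), 2):
        similar = tanimoto(fingerprints[i], fingerprints[j]) >= sim_threshold
        large_gap = abs(pki[i] - pki[j]) >= min_gap
        if similar and large_gap:
            cliffs.update((i, j))
    return sorted(cliffs)


def rmse(y_true, y_pred, indices=None):
    """Root-mean-square error over all molecules, or only over `indices`."""
    indices = list(range(len(y_true)) if indices is None else indices)
    squared = sum((y_true[i] - y_pred[i]) ** 2 for i in indices)
    return sqrt(squared / len(indices))


# Toy data (made up): molecules 0 and 1 share 19 of 20 features,
# molecule 2 is structurally unrelated.
fingerprints = [set(range(20)), set(range(19)), set(range(100, 120))]
pki_true = [8.0, 6.5, 6.4]
pki_pred = [7.0, 7.0, 6.4]  # hypothetical model output

cliff_idx = cliff_compounds(fingerprints, pki_true)
rmse_all = rmse(pki_true, pki_pred)
rmse_cliff = rmse(pki_true, pki_pred, cliff_idx)
```

In this toy case the error restricted to cliff compounds is larger than the overall error, which is the kind of gap an aggregate metric alone would hide. A real workflow would replace the set-based fingerprints with a cheminformatics toolkit and would choose thresholds appropriate to the data set.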