Posted on 2021-08-03, 13:40. Authored by Samuel Genheden, Ola Engkvist, Esben Bjerrum
We present a novel algorithm to compute the distance between synthetic
routes based on tree edit distances. Such distances can be used to
cluster synthesis routes generated using a retrosynthesis prediction
tool. We show that clustering the selected routes from a retrosynthesis
analysis takes less than 10 s on average and accounts for only
seven percent of the total time (prediction + clustering). Furthermore,
we show that representative routes from each cluster can
be used to reduce the set of predicted routes. Finally, we show with
a number of examples that the algorithm gives intuitive clusters that
can be easily rationalized and that the routes in a cluster tend to
use similar chemistry. The algorithm is included in the latest version
of the open-source AiZynthFinder software (https://github.com/MolecularAI/aizynthfinder) and as a separate package (https://github.com/MolecularAI/route-distances).
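
To make the idea of a tree edit distance between routes concrete, the following is a minimal sketch and not the implementation shipped in AiZynthFinder or route-distances. It assumes that a route is an ordered tree of labelled nodes, that every insertion, deletion, and relabelling costs 1, and that routes are grouped by a simple distance threshold. The toy routes, their labels, and the function names `forest_distance`, `route_distance`, and `cluster` are hypothetical and chosen for illustration only.

```python
from functools import lru_cache

# Assumption: a tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert the rightmost root of g.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Match the two rightmost roots, relabelling if the labels differ.
    match = (
        forest_distance(f[:-1], g[:-1])
        + forest_distance(v_children, w_children)
        + (v_label != w_label)
    )
    return min(delete, insert, match)


def route_distance(a, b):
    """Edit distance between two route trees."""
    return forest_distance((a,), (b,))


def cluster(routes, threshold):
    """Merge routes whose pairwise distance is at most `threshold`
    (connected components; a stand-in for a real clustering method)."""
    labels = list(range(len(routes)))
    for i in range(len(routes)):
        for j in range(i + 1, len(routes)):
            if route_distance(routes[i], routes[j]) <= threshold:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels


# Hypothetical toy routes: the root is the target, children are precursors.
route_a = ("T", (("A", ()), ("B", ())))
route_b = ("T", (("A", ()), ("C", ())))            # one precursor differs
route_c = ("T", (("X", (("A", ()), ("B", ()))),))  # one extra intermediate
route_d = ("T", (("P", (("Q", (("R", ()),)),)),))  # unrelated linear route

labels = cluster([route_a, route_b, route_c, route_d], threshold=1)
print(labels)
```

In this toy setting the first three routes differ by only one or two edits and end up in the same group, while the linear route stays on its own. A practical implementation would use a faster tree edit distance algorithm, chemistry-aware node labels and costs, and a proper clustering method.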