ci0c00850_si_001.zip (14.83 MB)
Topological Similarity Search in Large Combinatorial Fragment Spaces
dataset
posted on 21.10.2020, 16:05 by Louis Bellmann, Patrick Penner, Matthias Rarey

In similarity-driven virtual screening,
molecular fingerprints
are widely used to assess the similarity of all compounds contained
in a chemical library to a query compound of interest. This similarity
analysis is traditionally done for each member of the library individually.
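As a rough illustration of this per-compound approach, the sketch below stores binary fingerprints as Python integers used as bit sets, computes the Tanimoto coefficient, and compares the query against every library member one by one. The function names `tanimoto` and `screen`, the integer encoding, and the toy fingerprints are hypothetical choices for illustration; real pipelines would compute ECFP-like fingerprints with a cheminformatics toolkit.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bit sets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here: two empty fingerprints score 0
    return bin(a & b).count("1") / union


def screen(query: int, library: dict, threshold: float) -> list:
    """Brute-force screen: one similarity evaluation per library member."""
    hits = []
    for name, fp in library.items():
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((name, sim))
    # Most similar compounds first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


if __name__ == "__main__":
    toy_library = {"A": 0b1011, "B": 0b0110, "C": 0b1111}
    print(screen(0b1011, toy_library, 0.5))  # prints [('A', 1.0), ('C', 0.75)]
```

The cost grows linearly with the number of library members, which is why this loop stops being practical once every product of a multi-billion-compound space would have to be fingerprinted and compared.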
For chemical spaces that encode billions of compounds, it becomes
impractical to enumerate all their products, let alone assess their
similarity, which makes this approach infeasible without investing
substantial computational resources. In this work, we present
a novel search algorithm named SpaceLight for topological fingerprint
similarity searching in large, practically non-enumerable combinatorial
fragment spaces. In contrast to existing methods, SpaceLight exploits
the combinatorial character of these chemical spaces for efficiency
while its measure of molecular similarity remains highly correlated
with that of well-known molecular fingerprints such as ECFP. The
resulting software can search prominent spaces such as EnamineREAL,
with more than 10 billion compounds, in seconds on a standard desktop
computer.
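To make the scale argument concrete, the sketch below models a combinatorial fragment space in a deliberately simplified way: each reaction has one list of fragments per slot, and every combination of one fragment per slot is a product. The dictionary layout, the reaction and fragment names, and the helper functions are hypothetical and ignore linkage rules; this is not the data model used by SpaceLight.

```python
from itertools import product
from math import prod

# Hypothetical toy space: reaction name -> one fragment list per reaction slot.
SPACE = {
    "amide_coupling": [["acid_1", "acid_2", "acid_3"], ["amine_1", "amine_2"]],
    "three_component": [["a1", "a2"], ["b1", "b2", "b3"], ["c1", "c2"]],
}


def space_size(space: dict) -> int:
    """Number of products: sum over reactions of the product of slot sizes."""
    return sum(prod(len(slot) for slot in slots) for slots in space.values())


def enumerate_products(space: dict):
    """Lazily yield (reaction, fragment combination) pairs.

    Iterating this generator to the end is the full enumeration that becomes
    impractical for billion-scale spaces.
    """
    for reaction, slots in space.items():
        for combo in product(*slots):
            yield reaction, combo


if __name__ == "__main__":
    print(space_size(SPACE))  # 3*2 + 2*3*2 = 18
    # Three slots of 2,000 fragments each already encode 8 billion products.
    print(space_size({"r": [["x"] * 2000] * 3}))  # 8000000000
```

Storing the space takes only the fragment lists (here 18 names for 18 products; 6,000 entries for the 8-billion-product example), which is the combinatorial character that a fragment-based search can exploit instead of touching every product.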