ci500480b_si_001.pdf (1.24 MB)
Investigation of the Use of Spectral Clustering for the Analysis of Molecular Data
journal contribution
posted on 2015-12-17, 06:32 authored by Sonny Gan, David A. Cosgrove, Eleanor J. Gardiner, Valerie J. Gillet

Spectral
clustering involves placing objects into clusters based on the eigenvectors
and eigenvalues of an associated matrix. The technique was first applied
to molecular data by Brewer [J. Chem. Inf. Model. 2007, 47, 1727–1733] who demonstrated
its use on a very small dataset of 125 COX-2 inhibitors. We have
determined suitable parameters for spectral clustering using a wide
variety of molecular descriptors and several datasets of a few thousand
compounds and compared the results of clustering using a nonoverlapping
version of Brewer’s use of Sarker and Boyer’s algorithm
with those of Ward’s and k-means clustering.
We then replaced the exact eigendecomposition method with two different
approximate methods and concluded that Singular Value Decomposition
is the most appropriate method for clustering larger compound collections
of up to 100 000 compounds. We have also used spectral clustering
with the Tversky coefficient to generate two sets of clusters linked
by a common set of eigenvalues and have used this novel approach to
cluster sets of fragments such as those used in fragment-based drug
design.
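
The core idea stated in the abstract, assigning objects to clusters from the eigenvectors of an associated matrix, can be illustrated with a minimal sketch. This is not the authors' implementation and not the Sarker and Boyer algorithm used in the paper. It assumes binary fingerprints with no all-zero rows, Tanimoto similarity, a symmetrically normalized similarity matrix, and a simple two-way split on the sign of the second eigenvector. The function names `tanimoto_matrix` and `spectral_bipartition` and the toy fingerprints are hypothetical.

```python
import numpy as np


def tanimoto_matrix(fps):
    """Pairwise Tanimoto similarity for binary fingerprints (rows of 0/1)."""
    fps = np.asarray(fps, dtype=float)
    common = fps @ fps.T                       # bits set in both fingerprints
    counts = fps.sum(axis=1)                   # bits set in each fingerprint
    union = counts[:, None] + counts[None, :] - common
    safe = np.where(union > 0, union, 1.0)     # avoid division by zero
    return np.where(union > 0, common / safe, 0.0)


def spectral_bipartition(sim):
    """Split objects into two clusters using the sign of the eigenvector
    belonging to the second-largest eigenvalue of D^-1/2 S D^-1/2.

    Assumption: every row of `sim` has a positive sum (true when the
    diagonal self-similarities are 1).
    """
    degree = sim.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = sim * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh is for symmetric matrices and returns eigenvalues in ascending order
    _, eigvecs = np.linalg.eigh(normalized)
    second = eigvecs[:, -2]
    return (second > 0).astype(int)


# Toy data (hypothetical): two groups of three fingerprints that share
# only one bit between their first members.
fps = [
    [1, 1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 0],
]
labels = spectral_bipartition(tanimoto_matrix(fps))
print(labels)  # which group receives label 0 or 1 depends on the eigenvector sign
```

An exact eigendecomposition like this scales poorly with the number of compounds, which is the motivation the abstract gives for switching to approximate methods on larger collections.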