mp100279d_si_002.xls (89 kB)

Combinatorial QSAR Modeling of Human Intestinal Absorption

posted on 07.02.2011, 00:00 by Claudia Suenderhauf, Felix Hammann, Andreas Maunz, Christoph Helma, Jörg Huwyler
Intestinal drug absorption in humans is a central topic in drug discovery. In this study, we use a broad selection of machine learning and statistical methods for the classification and numerical prediction of this key end point. Our data set is based on a selection of 458 small druglike compounds with FDA approval. Using easily available tools, we calculated one- to three-dimensional physicochemical descriptors and applied several feature selection methods (best-first backward selection, correlation analysis, and decision tree analysis). We then used decision tree induction (DTI), fragment-based lazy learning (LAZAR), support vector machine classification, multilayer perceptrons, random forests, k-nearest neighbor, and Naïve Bayes analysis to model absorption ratios and binary classification (well-absorbed versus poorly absorbed compounds). The best classification performance was seen with DTI using the chi-squared automatic interaction detector (CHAID) algorithm, yielding a corrected classification rate of 88% (Matthews correlation coefficient of 75%). In numeric predictions, the multilayer perceptron performed best, achieving a root mean squared error of 25.823 and a coefficient of determination of 0.6. The importance of descriptors such as lipophilic partition coefficients (log P) and hydrogen bonding is in line with current understanding. Beyond these, we highlight the utility of gravitational indices and moments of inertia, reflecting the role of structural symmetry in oral absorption. Our models are based on a diverse data set of marketed drugs representing a broad chemical space. These models therefore contribute substantially to the molecular understanding of human intestinal drug absorption and qualify for generalized use in drug discovery and lead optimization.
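For readers who want to reproduce the kind of performance figures quoted above, the following is a minimal Python sketch of the four metrics named in the abstract: classification rate, Matthews correlation coefficient, root mean squared error, and coefficient of determination. The function names and all input values are hypothetical and are not taken from the study or its supplementary file. The sketch also assumes the coefficient of determination is defined as 1 - SS_res/SS_tot; the authors may have used the squared correlation coefficient instead.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return (classification rate, Matthews correlation coefficient).

    Inputs are the four cells of a binary confusion matrix,
    e.g. well-absorbed (positive) vs. poorly absorbed (negative).
    """
    total = tp + fp + tn + fn
    rate = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return rate, mcc


def regression_metrics(y_true, y_pred):
    """Return (RMSE, R^2) for observed vs. predicted absorption values.

    Assumption: R^2 = 1 - SS_res / SS_tot.
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, r2


if __name__ == "__main__":
    # Hypothetical confusion matrix and absorption values (percent).
    print(classification_metrics(tp=40, fp=5, tn=45, fn=10))
    print(regression_metrics([0.0, 50.0, 100.0], [10.0, 50.0, 90.0]))
```

The Matthews correlation coefficient ranges from -1 to 1, so a value reported as 75% corresponds to 0.75 on that scale. Unlike the plain classification rate, it remains informative when the two classes are of unequal size.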