Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling
Vladimir Svetnik
Andy Liaw
Christopher Tong
J. Christopher Culberson
Robert P. Sheridan
Bradley P. Feuston
10.1021/ci034160g.s001
https://acs.figshare.com/articles/dataset/Random_Forest_A_Classification_and_Regression_Tool_for_Compound_Classification_and_QSAR_Modeling/7944887
A new classification and regression tool, Random Forest, is introduced and investigated for predicting a
compound's quantitative or categorical biological activity based on a quantitative description of the
compound's molecular structure. Random Forest is an ensemble of unpruned classification or regression
trees created by using bootstrap samples of the training data and random feature selection in tree induction.
Prediction is made by aggregating (majority vote or averaging) the predictions of the ensemble. We built
predictive models for six cheminformatics data sets. Our analysis demonstrates that Random Forest is a
powerful tool whose performance places it among the most accurate methods to date. We also present three
additional features of Random Forest: built-in performance assessment, a measure of the relative importance
of descriptors, and a measure of compound similarity that is weighted by the relative importance of
descriptors. The combination of relatively high prediction accuracy and this collection of desirable
features makes Random Forest uniquely suited for modeling in cheminformatics.
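The procedure summarized above (bootstrap samples, a random subset of descriptors tried at each split, unpruned trees, and majority voting) and the three additional features can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and covers classification only. It assumes Gini-impurity splits, `mtry` = floor(sqrt(p)) descriptors tried per node, importance measured as the mean drop in per-tree out-of-bag (OOB) accuracy after shuffling one descriptor, and compound similarity (proximity) taken as the fraction of trees in which two compounds reach the same terminal node. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, idx, mtry, rng):
    """Grow one unpruned tree on rows `idx`, trying only `mtry` random descriptors per node."""
    labels = [y[i] for i in idx]
    if len(set(labels)) == 1:  # pure node: stop (no pruning is ever applied)
        return {"leaf": labels[0]}
    best = None
    for f in rng.sample(range(len(X[0])), mtry):  # random descriptor subset for this node
        for t in sorted({X[i][f] for i in idx})[:-1]:  # candidate thresholds
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            # Size-weighted child impurity; lower is better.
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # simplification: none of the sampled descriptors varies here, so stop
        return {"leaf": majority(labels)}
    _, f, t, left, right = best
    return {"feature": f, "threshold": t,
            "left": build_tree(X, y, left, mtry, rng),
            "right": build_tree(X, y, right, mtry, rng)}

def find_leaf(tree, x):
    """Drop compound x down the tree and return the terminal node it reaches."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

def fit_forest(X, y, n_trees=100, mtry=None, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))  # floor(sqrt(p)), a common classification default
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, drawn with replacement
        oob = set(range(n)) - set(boot)              # out-of-bag rows this tree never saw
        forest.append((build_tree(X, y, boot, mtry, rng), oob))
    return forest

def predict(forest, x):
    """Majority vote over all trees (a regression forest would average instead)."""
    return majority([find_leaf(tree, x)["leaf"] for tree, _ in forest])

def oob_error(forest, X, y):
    """Built-in performance estimate: each row is voted on only by trees that never saw it."""
    wrong = counted = 0
    for i, x in enumerate(X):
        votes = [find_leaf(tree, x)["leaf"] for tree, oob in forest if i in oob]
        if votes:
            counted += 1
            wrong += majority(votes) != y[i]
    return wrong / counted if counted else float("nan")

def permutation_importance(forest, X, y, seed=0):
    """Mean drop in per-tree OOB accuracy after shuffling one descriptor among OOB rows."""
    rng = random.Random(seed)
    scores = [0.0] * len(X[0])
    for tree, oob in forest:
        rows = sorted(oob)
        if not rows:
            continue
        base = sum(find_leaf(tree, X[i])["leaf"] == y[i] for i in rows)
        for f in range(len(scores)):
            values = [X[i][f] for i in rows]
            rng.shuffle(values)
            hits = sum(find_leaf(tree, X[i][:f] + [v] + X[i][f + 1:])["leaf"] == y[i]
                       for i, v in zip(rows, values))
            scores[f] += (base - hits) / len(rows)
    return [s / len(forest) for s in scores]

def proximity(forest, X):
    """Fraction of trees in which compounds i and j land in the same terminal node."""
    n = len(X)
    counts = [[0] * n for _ in range(n)]
    for tree, _ in forest:
        leaves = [find_leaf(tree, x) for x in X]
        for i in range(n):
            for j in range(n):
                counts[i][j] += leaves[i] is leaves[j]
    return [[c / len(forest) for c in row] for row in counts]

# Toy usage with hypothetical data: descriptor 0 separates the classes, descriptor 1 is noise.
X = [[0.1, 0.7], [0.2, 0.1], [0.3, 0.9], [0.4, 0.4],
     [0.6, 0.5], [0.7, 0.2], [0.8, 0.8], [0.9, 0.3]]
y = ["inactive"] * 4 + ["active"] * 4
forest = fit_forest(X, y, n_trees=50, mtry=1)
print("OOB error:", oob_error(forest, X, y))
print("importance:", permutation_importance(forest, X, y))
```

In this sketch, rows of `X` must be Python lists of numeric descriptors. Because the trees split more often on informative descriptors, the proximity measure reflects descriptor importance only implicitly. A regression variant would replace the Gini criterion with variance reduction and the majority vote with an average.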
2003-11-24
cheminformatics data sets
Random Forest
QSAR
compound