Posted on 2016-11-23. Authored by Robert P. Sheridan, Wei Min Wang, Andy Liaw, Junshui Ma, Eric M. Gifford
In
the pharmaceutical industry it is common to generate many QSAR
models from training sets containing a large number of molecules and
a large number of descriptors. The best QSAR methods are those that
can generate the most accurate predictions but that are not overly
expensive computationally. In this paper we compare eXtreme Gradient
Boosting (XGBoost) to random forest and single-task deep neural nets
on 30 in-house data sets. While XGBoost has many adjustable parameters,
we can define a set of standard parameters at which XGBoost makes
predictions, on average, better than those of random forest and
almost as good as those of deep neural nets. The biggest strength
of XGBoost is its speed. Whereas efficient use of random forest requires
generating each tree in parallel on a cluster, and deep neural nets
are usually run on GPUs, XGBoost can be run on a single CPU in less
than a third of the wall-clock time of either of the other methods.
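To make the method concrete, the following is a minimal pure-Python sketch of the gradient-boosting loop that underlies XGBoost, using depth-1 regression trees (stumps) and squared loss. It is illustrative only: XGBoost itself adds regularized objectives, second-order gradient information, row/column subsampling, and the many tunable parameters the abstract refers to, and all function names and parameter values below are invented for this sketch rather than taken from the paper.

```python
# Minimal gradient-boosting sketch: an additive ensemble of depth-1 trees
# fit to residuals. XGBoost layers regularization, second-order optimization,
# and subsampling on top of this core idea.

def fit_stump(X, resid):
    """Return (feature, threshold, left_val, right_val) minimizing
    squared error of a one-split tree fit to the residuals."""
    best = None
    n, d = len(X), len(X[0])
    for j in range(d):
        for t in sorted({row[j] for row in X}):
            left = [resid[i] for i in range(n) if X[i][j] <= t]
            right = [resid[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - lv) ** 2 for v in left)
                   + sum((v - rv) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]

def boost(X, y, n_rounds=100, lr=0.3):
    """Fit F(x) = mean(y) + lr * sum of stump corrections."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is just the residual.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lv, rv = fit_stump(X, resid)
        stumps.append((j, t, lv, rv))
        for i, row in enumerate(X):
            pred[i] += lr * (lv if row[j] <= t else rv)
    return base, lr, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (lv if row[j] <= t else rv)
                      for j, t, lv, rv in stumps)
```

In a QSAR setting, each row of `X` would be a molecule's descriptor vector and `y` its measured activity; in practice one would of course use the `xgboost` package rather than this sketch.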