Machine learning
has exhibited powerful capabilities in many areas. However, machine
learning models are mostly database-dependent, requiring a new model
whenever the database changes. A universal model that accommodates the
widest possible variety of databases is therefore highly desirable.
Fortunately, such universality may be achieved by ensemble learning,
which integrates multiple learners to meet the demands of diverse
databases. Accordingly, we propose a general procedure for establishing
learning ensembles based on noncovalent interaction (NCI) databases.
Additionally, accurate NCI computation is quite demanding for
first-principles methods, so a competent machine learning model can be
an efficient route to high NCI accuracy with minimal computational resources.
In regard to these aspects, multiple ensemble learning schemes
(Bagging, Boosting, and Stacking frameworks) are explored in this
study. The models are built on various low-level density functional
theory (DFT) calculations for the benchmark databases S66, S22, and
X40. All NCIs computed at these DFT levels can be improved to
high-level accuracy (root-mean-square error, RMSE, of 0.22 kcal/mol
against the CCSD(T)/CBS benchmark) by the established ensemble learning
models. Compared with single machine learning models, the ensemble
models show better accuracy (the RMSE of the best model is further
lowered by ∼25%), robustness, and goodness-of-fit according to the
evaluation parameters suggested by the OECD. Among the ensemble
learning models, heterogeneous Stacking ensembles show the greatest
application potential.
The standardized procedure for constructing learning ensembles has been
successfully applied to several NCI data sets, and it may also be
applicable to other chemical databases.
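As a concrete illustration of the heterogeneous Stacking scheme described above, the following minimal sketch (not the paper's actual implementation) combines Bagging-type, Boosting-type, and kernel base learners under a simple linear meta-learner using scikit-learn. The descriptors and targets are synthetic placeholders standing in for the real low-level DFT features and CCSD(T)/CBS reference energies; all variable names and hyperparameters are assumptions for illustration only.

# Minimal sketch of a heterogeneous Stacking ensemble for correcting
# low-level DFT interaction energies toward a high-level reference.
# Data and descriptors below are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import (StackingRegressor, RandomForestRegressor,
                              GradientBoostingRegressor)
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_dimers, n_features = 200, 8

# Hypothetical descriptors (e.g., low-level DFT energy components) and
# synthetic targets standing in for CCSD(T)/CBS interaction energies.
X = rng.normal(size=(n_dimers, n_features))
y = X @ rng.normal(size=n_features) + 0.1 * rng.normal(size=n_dimers)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Heterogeneous base learners combined by a linear meta-learner.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),  # Bagging-type
        ("gbr", GradientBoostingRegressor(random_state=0)),               # Boosting-type
        ("krr", KernelRidge(alpha=1.0, kernel="rbf")),                    # kernel model
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=5,
)
stack.fit(X_train, y_train)
rmse = mean_squared_error(y_test, stack.predict(X_test)) ** 0.5
print(f"test RMSE: {rmse:.3f}")

In this sketch the out-of-fold predictions of the base learners serve as inputs to the meta-learner, which is the defining feature of a Stacking ensemble; the actual descriptors, learners, and cross-validation settings of the study would replace the placeholders shown here.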