Toxicity tests and adverse-effect studies are routinely carried out
during the drug development process and are essential to
guarantee patient safety and the success of the research. The use
of in silico quantitative structure–activity
relationship (QSAR) approaches for this task involves processing
large amounts of data that, in many cases, have an imbalanced distribution
of active and inactive samples. This is usually termed the class-imbalance
problem and may have a significant negative effect on the performance
of the learned models. The performance of feature selection (FS) for
QSAR models is usually degraded by the class-imbalanced nature of the
datasets involved. This paper proposes an FS method designed to
deal with the class-imbalance problem. The method is based
on the use of FS ensembles constructed by boosting with two well-known
FS methods, fast clustering-based FS and the fast correlation-based
filter. The experimental results demonstrate the effectiveness of the
proposal in terms of classification performance compared with standard
methods. The proposal can be extended to other FS methods and applied
to other problems in cheminformatics.
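
As an illustration of the general idea of building an FS ensemble by boosting, the following minimal Python sketch may be helpful. It is not the method evaluated in this paper: the base selector is a simple weighted-correlation ranking used as a stand-in for fast clustering-based FS or the fast correlation-based filter, the weak learner is a nearest-centroid rule, and all function names, parameters, and the toy data are hypothetical. Each round selects features under the current sample weights, and an AdaBoost-style update then increases the weight of misclassified samples so that later rounds can pay more attention to hard (often minority-class) compounds.

```python
import math

def weighted_corr(col, y, w):
    """Absolute weighted Pearson correlation between one feature and the label."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, col)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, col, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, col))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature or label carries no information
    return abs(cov) / math.sqrt(vx * vy)

def select_top_k(X, y, w, k):
    """Stand-in base selector (assumption): keep the k best-correlated features."""
    n_features = len(X[0])
    scores = [weighted_corr([row[j] for row in X], y, w) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -scores[j])[:k]

def centroid_predict(X, y, w, feats):
    """Weak learner (assumption): assign each sample to the nearer weighted class centroid."""
    cents = {}
    for c in (0, 1):
        wc = sum(wi for wi, yi in zip(w, y) if yi == c)
        cents[c] = [
            sum(wi * row[j] for wi, row, yi in zip(w, X, y) if yi == c) / wc
            for j in feats
        ]
    preds = []
    for row in X:
        d = {c: sum((row[j] - m) ** 2 for j, m in zip(feats, cents[c])) for c in (0, 1)}
        preds.append(0 if d[0] <= d[1] else 1)
    return preds

def boosted_feature_selection(X, y, k=1, rounds=5):
    """Aggregate the feature subsets chosen over several boosting rounds."""
    n = len(X)
    w = [1.0 / n] * n                # start from uniform sample weights
    votes = [0.0] * len(X[0])
    for _ in range(rounds):
        feats = select_top_k(X, y, w, k)
        preds = centroid_predict(X, y, w, feats)
        # Weighted error of the weak learner, clipped to keep the log finite.
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        for j in feats:
            votes[j] += max(alpha, 0.0)  # rounds worse than chance add nothing
        # AdaBoost-style update: misclassified samples gain weight.
        w = [wi * math.exp(alpha if p != yi else -alpha) for wi, p, yi in zip(w, preds, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return sorted(range(len(votes)), key=lambda j: -votes[j])[:k]

# Toy imbalanced data (hypothetical): 8 inactive and 2 active samples.
# Feature 0 separates the classes, feature 1 is noise, feature 2 is constant.
X = [
    [0.1, 5.0, 1.0], [0.2, 3.0, 1.0], [0.0, 4.0, 1.0], [0.3, 6.0, 1.0],
    [0.1, 2.0, 1.0], [0.2, 5.0, 1.0], [0.0, 3.0, 1.0], [0.3, 4.0, 1.0],
    [0.9, 4.0, 1.0], [1.0, 4.0, 1.0],
]
y = [0] * 8 + [1] * 2
selected = boosted_feature_selection(X, y, k=1, rounds=3)
print(selected)
```

In this sketch the ensemble output is a vote over the subsets chosen in each round, weighted by the accuracy of the weak learner trained on that subset. Other aggregation rules, base selectors, and weak learners could be substituted without changing the overall structure.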