FormulationBCS: A Machine Learning Platform Based on Diverse Molecular Representations for Biopharmaceutical Classification System (BCS) Class Prediction
posted on 2024-12-08, 21:03, authored by Zheng Wu, Nannan Wang, Zhuyifan Ye, Huanle Xu, Ging Chan, Defang Ouyang
The Biopharmaceutics Classification System (BCS) has facilitated
biowaivers and played a significant role in improving the efficiency
of drug regulation and development. However, the throughput of measuring
the two key discriminative properties of the BCS, solubility and
permeability, still requires improvement. This limits high-throughput
application of the BCS, which is essential for evaluating the developability
of drug candidates and guiding formulation decisions in the early stages
of drug development.
In recent years, advancements in machine learning (ML) and molecular
characterization have revealed the potential of quantitative structure–performance
relationships (QSPR) for rapid and accurate in silico BCS classification. The present study aims to develop a web platform
for high-throughput BCS classification based on high-performance ML
models. First, four data sets of BCS-related molecular properties
(log S, log P, log D, and log Papp) were curated. Subsequently,
six ML algorithms or deep learning frameworks were employed to construct
models, with diverse molecular representations ranging from one-dimensional
molecular fingerprints, descriptors, and molecular graphs to three-dimensional
molecular spatial coordinates. A comparison of different combinations
of molecular representations and learning algorithms showed that LightGBM
exhibited excellent performance in solubility prediction, with an R² of 0.84;
AttentiveFP outperformed the other models in permeability
prediction, with R² values of 0.96 and
0.76 for log P and log D, respectively;
and XGBoost was the most accurate for log Papp prediction, with an R² of 0.71. When
externally validated on a marketed drug BCS category data set, the
best-performing models achieved classification accuracies of over
77% and 73% for solubility and permeability, respectively. Finally,
the well-trained models were embedded into FormulationBCS, the first
ML-based BCS class prediction web platform, enabling pharmaceutical
scientists to quickly determine the BCS category of candidate drugs.
This will aid high-throughput BCS assessment during the preformulation
stage, thereby reducing risk and improving efficiency in drug
development and regulation.
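
To illustrate how predicted properties could be turned into a BCS class, the following is a minimal Python sketch. It is not the authors' pipeline. The four-class mapping (I: high solubility and high permeability; II: low solubility, high permeability; III: high solubility, low permeability; IV: low solubility, low permeability) is the standard BCS definition. Everything else is an assumption made for illustration: the function names are hypothetical, the solubility check applies the regulatory 250 mL dose-volume criterion to a single predicted log S value rather than to a pH range, and the log Papp cutoff is a placeholder, since the abstract does not state the thresholds used.

```python
# Hypothetical cutoff for illustration only; the abstract does not state
# the permeability threshold used by the authors.
LOG_PAPP_CUTOFF = -5.0  # log10 of Papp in cm/s


def is_highly_soluble(log_s: float, mol_weight: float, dose_mg: float) -> bool:
    """Apply the 250 mL dose-volume criterion to one predicted log S value.

    log_s is log10 of the molar solubility (mol/L); mol_weight is in g/mol.
    Simplification: a single solubility value stands in for the pH range
    that a regulatory assessment would cover.
    """
    solubility_mg_per_ml = (10 ** log_s) * mol_weight  # g/L equals mg/mL
    volume_needed_ml = dose_mg / solubility_mg_per_ml
    return volume_needed_ml <= 250.0


def is_highly_permeable(log_papp: float, cutoff: float = LOG_PAPP_CUTOFF) -> bool:
    """Compare a predicted log Papp against an assumed cutoff."""
    return log_papp >= cutoff


def bcs_class(high_solubility: bool, high_permeability: bool) -> str:
    """Map the two binary labels to the standard BCS class (I to IV)."""
    if high_solubility and high_permeability:
        return "I"
    if high_permeability:
        return "II"
    if high_solubility:
        return "III"
    return "IV"


if __name__ == "__main__":
    # Made-up inputs, not data from the study.
    soluble = is_highly_soluble(log_s=-2.0, mol_weight=300.0, dose_mg=100.0)
    permeable = is_highly_permeable(log_papp=-4.5)
    print("BCS class", bcs_class(soluble, permeable))
```

Keeping the thresholding separate from the class mapping means the regression models and the cutoffs can be changed independently, which matters here because the cutoffs above are placeholders.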