posted on 2022-11-28, 22:14 authored by Deborah Palazzotti, Martina Fiorelli, Stefano Sabatini, Serena Massari, Maria Letizia Barreca, Andrea Astolfi
The recent increase in bioactivity data freely available to the
scientific community and stored as activity data points in chemogenomic
repositories provides a huge amount of ready-to-use information to
support the development of predictive models. However, the benefits
of having such a vast amount of accessible information
are strongly counteracted by the lack of uniformity and consistency
across data from multiple sources, which requires a process of integration
and harmonization. While several automated pipelines for processing
and assessing chemical data have emerged in recent years, the curation
of bioactivity data points has been less investigated: useful
concepts have been proposed, but no tangible tools are available. In this context,
the present work represents a first step toward filling this
gap by providing a tool that meets the needs of end users building
proprietary high-quality data sets for further studies. Specifically,
we herein describe Q-raKtion, a systematic, semiautomated, flexible,
and, above all, customizable KNIME workflow that effectively aggregates
information on the biological activities of compounds retrieved from two
of the most comprehensive and widely used repositories, PubChem and
ChEMBL.