Machine Learning in Complex Organic Mixtures: Applying Domain Knowledge Allows for Meaningful Performance with Small Data Sets
Posted on 2024-07-31 - 09:43
The ability to quantify individual components of complex mixtures
is a challenge found throughout the life and physical sciences. An
improved capacity to generate large data sets along with the uptake
of machine-learning (ML)-based analysis tools has allowed for various
“omics” disciplines to realize exceptional advances.
Other areas of chemistry that deal with complex mixtures often do
not leverage these advances. Environmental samples, for example, can
be more difficult to access, and the resulting small data sets are
less appropriate for unconstrained ML approaches. Herein, we present
an approach to address this latter issue. Using a very small environmental
data set (35 high-resolution mass spectra gathered from various
solvent extractions of Canadian petroleum fractions), we show
that the application of specific domain knowledge can lead to ML models
with notable performance.
CITE THIS COLLECTION
Le, Katelyn; Radović, Jagoš R.; MacCallum, Justin L.; Larter, Stephen R.; Van Humbeck, Jeffrey F. (2024). Machine Learning in Complex Organic Mixtures: Applying Domain Knowledge Allows for Meaningful Performance with Small Data Sets. ACS Publications. Collection. https://doi.org/10.1021/jacs.4c06595