Posted on 2020-11-05, 06:44. Authored by Xin Yi See, Xuelan Wen, T. Alexander Wheeler, Channing K. Klein, Jason D. Goodpaster, Benjamin R. Reiner, Ian A. Tonks
The rational design of catalysts remains a challenging endeavor
within the broader chemical community owing to the myriad variables
that can affect key bond-forming events. Designing selective catalysts
for any reaction requires an efficient strategy for discovering predictive
structure–activity relationships. Herein, we describe the use
of iterative supervised principal component analysis (ISPCA) in de novo catalyst design. The regioselective synthesis of
2,5-dimethyl-1,3,4-triphenyl-1H-pyrrole (C) via a Ti-catalyzed formal [2 + 2 + 1] cycloaddition of phenylpropyne
and azobenzene was targeted as a proof of principle. The initial reaction
conditions led to an unselective mixture of all possible pyrrole regioisomers.
ISPCA was conducted on a training set of catalysts, and their performance
was regressed against the scores from the top three principal components.
Component loadings from this PCA space and k-means
clustering were used to inform the design of new test catalysts. The
selectivity of a prospective test set was predicted in silico using the ISPCA model, and optimal candidates were synthesized and
tested experimentally. This data-driven predictive-modeling workflow
was iterated, and after only three generations the catalytic selectivity
was improved from 0.5 (statistical mixture of products) to over 11
(>90% C) by incorporating 2,6-dimethyl-4-(pyrrolidin-1-yl)pyridine
as a ligand. The origin of catalyst selectivity was probed by examining
ISPCA variable loadings in combination with DFT modeling, revealing
that ligand lability plays an important role in selectivity. A parallel
catalyst search using multivariate linear regression (MLR), a popular
approach in catalysis informatics, was also conducted in order to
compare these strategies in a hypothetical catalyst scouting campaign.
ISPCA appears to be more robust and predictive than MLR when training
sets are sparse, as is typical of the data available
during the early search for an optimal catalyst. The successful development
of a highly selective catalyst without resorting to long, stochastic
screening processes demonstrates the inherent power of ISPCA in de novo catalyst design and should motivate the general
use of ISPCA in reaction development.
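
The abstract does not give the algorithmic details of ISPCA, so the following is only a minimal sketch of a generic supervised-PCA regression of the kind described above: descriptors are screened by their correlation with the measured selectivity, PCA is run on the retained descriptors, and selectivity is regressed against the scores of the top three principal components. The function names, the correlation threshold, and the synthetic descriptor data are all assumptions made for illustration; the iteration over catalyst generations and the k-means clustering step are omitted.

```python
import numpy as np

def fit_supervised_pca(X, y, n_components=3, corr_threshold=0.2):
    """Illustrative supervised-PCA regression (not the authors' implementation)."""
    # Standardize each descriptor column; guard against constant columns.
    mean, std = X.mean(axis=0), X.std(axis=0)
    std = np.where(std == 0, 1.0, std)
    Z = (X - mean) / std
    # Supervision step: Pearson correlation of each descriptor with the response.
    yc = y - y.mean()
    denom = np.linalg.norm(Z, axis=0) * np.linalg.norm(yc) + 1e-12
    corr = (Z.T @ yc) / denom
    keep = np.abs(corr) >= corr_threshold  # threshold is an assumed value
    if keep.sum() < n_components:
        # Fall back to the most correlated descriptors if too few pass.
        keep = np.zeros(X.shape[1], dtype=bool)
        keep[np.argsort(-np.abs(corr))[:n_components]] = True
    # PCA on the retained descriptors via SVD; rows of Vt are the loadings.
    _, _, Vt = np.linalg.svd(Z[:, keep], full_matrices=False)
    loadings = Vt[:n_components].T
    scores = Z[:, keep] @ loadings
    # Regress the response on the top principal-component scores.
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"mean": mean, "std": std, "keep": keep,
            "loadings": loadings, "coef": coef}

def predict(model, X_new):
    """Project new candidates into the PCA space and apply the regression."""
    Z = (X_new - model["mean"]) / model["std"]
    scores = Z[:, model["keep"]] @ model["loadings"]
    return model["coef"][0] + scores @ model["coef"][1:]

# Synthetic stand-in data (hypothetical): 20 training catalysts, 8 descriptors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 8))
y_train = 2.0 * X_train[:, 0] - X_train[:, 1] + 0.1 * rng.normal(size=20)
X_candidates = rng.normal(size=(5, 8))  # prospective test catalysts

model = fit_supervised_pca(X_train, y_train)
y_fit = predict(model, X_train)
y_candidates = predict(model, X_candidates)
```

In a workflow like the one described, the predicted values for the candidates would be used to rank prospective catalysts before synthesis, and the model would be refit as each new generation of experimental results is added.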