Posted on 2023-12-18, 20:00; authored by Michael E. Deagen, Bérenger Dalle-Cort, Nathan J. Rebello, Tzyy-Shyang Lin, Dylan J. Walsh, Bradley D. Olsen
The representation of chemical structure forms a core component
of polymer science, yet the chemical structure diagrams used to convey
such information lack the machine processability vital for automating
analysis, managing abundant data, and harnessing the potential of
informatics. On the other hand, the usage of BigSMILES language (a
machine-readable representation of polymer chemical structure) requires
specialized knowledge of its grammar and syntax. Here, the algorithmic
translation between chemical structure diagrams and BigSMILES line
notation is demonstrated, providing seamless interconversion to and
from the lingua franca of polymer chemists across
a broad array of polymer architectures (e.g., copolymers, graft and
segmented polymers, star polymers, macrocycles, networks, ladder polymers).
Serialization from structure diagram into BigSMILES line notation
is accomplished by parsing the contents of a connection table and
iteratively assembling string representations of the molecular graph
and its substructures. Deserialization from BigSMILES line notation
into a structure diagram involves parsing the line notation string
into a stochastic graph representation, from which a valid graph traversal
defines a representative sequence of substructural units comprising
the connection table (i.e., structure diagram). These algorithms were
validated through round-trip translation on a curated set of 300 polymer
structure diagrams, demonstrating semantic preservation of the molecular
graph in over 99% of cases and visually equivalent structure diagrams
in 38% of cases. The 2D layout, an isometry of the atomic coordinates
generated by the CoordGen library within RDKit, shows the applicability
of readily available atomic layout generation algorithms while revealing
specific areas in which to improve these layout algorithms for polymers; for
example, 60% of test cases could be rectified by orienting backbone
atoms in an extended configuration along a horizontal axis. Implemented
in JavaScript, this software offers facile integration with web-based
resources and forms an essential interface between informatics and
the broader polymer research community. By enabling humans and machines
to process vast amounts of polymer chemical structural data, this
work aims to democratize access to polymer informatics and foster
increasingly interdisciplinary approaches to polymer research.
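
To make the serialization step more concrete, the following is a minimal sketch of how a connection table might be walked to assemble a line-notation string. It is not the authors' implementation. It assumes an acyclic, single-bonded repeat unit, and it models the `[$]` bonding descriptors as pseudo-atoms in the connection table, which is a simplification chosen for this example. The names `buildAdjacency`, `serializeAcyclic`, and `wrapStochastic` are hypothetical.

```javascript
// Toy connection table: `atoms` holds symbols, `bonds` holds index pairs.
// Assumptions of this sketch: single bonds only, no rings, and bonding
// descriptors ("[$]") stored as pseudo-atoms alongside real atoms.
function buildAdjacency(atomCount, bonds) {
  const adj = Array.from({ length: atomCount }, () => []);
  for (const [a, b] of bonds) {
    adj[a].push(b);
    adj[b].push(a);
  }
  return adj;
}

// Depth-first traversal that emits a SMILES-like string.
// Every unvisited neighbor except the last is written as a parenthesized branch.
function serializeAcyclic(table, start = 0) {
  const adj = buildAdjacency(table.atoms.length, table.bonds);
  const visited = new Set();

  function visit(i) {
    visited.add(i);
    const next = adj[i].filter((j) => !visited.has(j));
    let out = table.atoms[i];
    next.forEach((j, k) => {
      const sub = visit(j);
      out += k < next.length - 1 ? `(${sub})` : sub;
    });
    return out;
  }

  return visit(start);
}

// Wrap a repeat-unit string in a stochastic object with empty terminal descriptors.
function wrapStochastic(repeatUnit) {
  return `{[]${repeatUnit}[]}`;
}

// Polypropylene repeat unit: descriptor, CH2, CH (bearing a methyl), descriptor.
const polypropylene = {
  atoms: ["[$]", "C", "C", "C", "[$]"],
  bonds: [[0, 1], [1, 2], [2, 3], [2, 4]],
};

const repeatUnit = serializeAcyclic(polypropylene);
const bigsmiles = wrapStochastic(repeatUnit);
console.log(bigsmiles); // {[][$]CC(C)[$][]}
```

A full serializer would also need ring-closure digits, bond orders, end groups, and nested substructures, all of which this sketch leaves out.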