Posted on 2021-09-21, 19:36. Authored by Jeanet Mante, Nicholas Roehner, Kevin Keating, James Alastair McLaughlin, Eric Young, Jacob Beal, Chris J. Myers
As an engineering endeavor, synthetic
biology requires effective
sharing of genetic design information that can be reused in the construction
of new designs. While there are a number of large community repositories
of design information, curation of this information has been limited.
This in turn limits the ways in which design information can be put
to use. The aim of this work was to improve this situation by creating
a curated library of parts from the International Genetically
Engineered Machine (iGEM) registry data set. To this end,
an analysis of the Synthetic Biology Open Language (SBOL) version of the iGEM registry was carried out using four different
approaches (simple statistics, SnapGene autoannotation, SYNBICT
autoannotation, and expert analysis), the results of which are
presented herein. Key challenges encountered include the use of free
text, insufficient part provenance, part duplication, lack of part
removal, and insufficient continuous curation. On the basis of these
analyses, the focus shifted from creating a curated iGEM
part library to extracting a set of lessons, which
are presented here. These lessons can be exploited to facilitate the
creation and curation of other part libraries using a simpler and
less labor-intensive process.