Posted on 2016-06-30, 00:00, authored by Maggie P. Y. Lam, Vidya Venkatraman, Yi Xing, Edward Lau, Quan Cao, Dominic C. M. Ng, Andrew I. Su, Junbo Ge, Jennifer E. Van Eyk, Peipei Ping
Amidst
the proteomes of human tissues lie subsets of proteins that
are closely involved in conserved pathophysiological processes. Much
of biomedical research concerns interrogating disease signature proteins
and defining their roles in disease mechanisms. With advances in proteomics
technologies, it is now feasible to develop targeted proteomics assays
that can accurately quantify protein abundance as well as post-translational
modifications. However, given the rapidly accumulating number of studies
implicating proteins in diseases, current resources are insufficient
to target every protein, so the proteins of highest significance and
impact must be judiciously prioritized for assay development. We describe
here a data science method to prioritize and expedite assay development
for high-impact proteins across research fields. The method leverages the
biomedical literature record to rank proteins, with normalization, by how
popularly and preferentially they are studied by biomedical researchers. We demonstrate
this method by finding priority proteins across six major physiological
systems (cardiovascular, cerebral, hepatic, renal, pulmonary, and
intestinal). The described method is data-driven and builds upon the
collective knowledge of previous publications indexed in PubMed
to lend objectivity to target selection. The method and resulting
popular protein lists may also be useful for exploring biological
processes associated with various physiological systems and research
topics, in addition to benefiting ongoing efforts to facilitate the
broad translation of proteomics technologies.
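As a rough illustration of ranking proteins that are both popular (frequently published within a topic) and preferential (relatively specific to that topic), the Python sketch below weights a protein's within-topic publication count by a log inverse document frequency computed over the whole corpus, in the style of tf-idf. The tf-idf-style scoring, the function name `rank_proteins`, the placeholder protein names, and the toy `PAPERS` corpus are all hypothetical assumptions for illustration; they are not the authors' exact formula or PubMed data.

```python
import math
from collections import Counter

# Hypothetical toy corpus (NOT PubMed data): each record is
# (set of proteins mentioned, set of research topics the paper covers).
PAPERS = [
    ({"P1", "P2"}, {"cardiovascular"}),
    ({"P1"}, {"cardiovascular"}),
    ({"P1", "P3"}, {"cardiovascular", "renal"}),
    ({"P2"}, {"hepatic"}),
    ({"P2", "P3"}, {"renal"}),
    ({"P2"}, {"pulmonary"}),
]


def rank_proteins(papers, topic):
    """Rank proteins for one topic with an assumed tf-idf-style score:
    (papers in `topic` mentioning the protein) * log(N / papers mentioning it anywhere).
    """
    n_papers = len(papers)
    # Document frequency: papers in the whole corpus that mention each protein.
    df = Counter(p for proteins, _ in papers for p in proteins)
    # Term frequency: papers annotated with `topic` that mention each protein.
    tf = Counter(
        p for proteins, topics in papers if topic in topics for p in proteins
    )
    # Proteins published widely across all topics get a smaller log weight.
    scores = {p: tf[p] * math.log(n_papers / df[p]) for p in tf}
    # Highest score first; ties broken alphabetically so output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for protein, score in rank_proteins(PAPERS, "cardiovascular"):
        print(f"{protein}\t{score:.3f}")
```

With this toy corpus the cardiovascular ranking is P1 (2.079), P3 (1.099), P2 (0.405). P2 and P3 each appear in one cardiovascular paper, but P3 ranks higher because P2 is published more widely across other topics, so the logarithmic weight discounts proteins that are popular everywhere rather than preferentially in the topic of interest.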