The proportion of mystery proteins in the human body, those whose roles are unknown, could be as high as 20%, according to a new study published in Open Biology. The first author of the paper, Dr Valerie Wood from the University of Cambridge, explains more about her group’s work on uncovering the function of these mystery proteins and shares her experience of publishing in Open Biology.
Tell us about yourself and your research.
I am a biocurator at PomBase, the database for the fission yeast Schizosaccharomyces pombe. Fission yeast is used as a model for cell-level biology, and our community publishes a large amount of detailed experimental information. A biocurator reads publications and translates the results from text and figures into semantically simplified language, so that we can group ‘biological parts’ with similar attributes together. For example, we might label a protein as involved in some aspect of DNA replication, cell division, transport or metabolism. Importantly, we also filter out or remove incorrect or out-of-date information to ensure that the knowledge collected in the databases is current and correctly represented. In this ‘standardised format’, researchers can quickly find very specific information. The scientific community can also begin to ask questions that take advantage of the information integrated from thousands of publications and experiments. In this work we were able to use the knowledge we have curated to address a simple question: how many proteins have an “unknown physiological role”?
What is your article about and what are the main points readers should take from it?
One of the fundamental aims of biological research is to characterise individual genes or proteins and establish how they contribute to a living system. Our recent paper identified persistent knowledge gaps, and a protracted slowdown in the rate at which such knowledge is acquired, for the fission yeast (Schizosaccharomyces pombe) and the budding yeast (Saccharomyces cerevisiae). This trend was not entirely unexpected, because some processes have profound effects on cell proliferation and the genes required for these processes were almost certain to be studied sooner. However, at current rates, it will be many more decades before we know the role of every protein, even in the well-studied unicellular model species.
We found that 20% of proteins in both of the model yeasts, and in humans, had no informative physiological process assignment. Surprisingly, a quarter of the fission yeast’s unknown proteins are conserved outside of these fungi (either in humans and other multicellular eukaryotes, or in prokaryotes); they have therefore been conserved for at least 400 million years. This wide taxonomic conservation implies these proteins are doing something really quite important.
We wondered whether the most recently characterised proteins in the timeline were associated with any specific cellular processes and found that, for fission yeast, many were involved in detoxification and cellular homeostasis. This might provide a clue as to why these proteins remained below the radar for such a long time. Accumulation of toxicity may only become noticeable over longer timescales, or in specific conditions. These are all types of processes associated with ageing.
We aren’t really sure why these genes remain unstudied: either nobody has yet looked in detail, or we aren’t asking the right questions. Funders are currently risk-averse, and it is difficult to get grant support for something in which you are not already an expert. Discovery-driven data sets provide clues, but don’t answer the question “what does this gene do?” There may be a disconnect where strong leads generated from functional genomics data are not reaching bench biologists in the appropriate domain of interest. It is even possible that basic research laboratories are already intensively trying to crack these puzzles, but not finding the right tools or conditions for many of them. Resolving the “unknown problem” requires some investigation into the possible contributory factors, coupled with systematic in-depth studies.
Your review received lots of attention on BioRxiv; as a researcher, how have you found preprint servers helpful?
This is the first time we have used a preprint server. We thought it was a good idea, and preprints are increasingly being used by the fission yeast community, especially by high-profile groups. We re-tweet many of the preprint announcements on the PomBase Twitter feed, and this seems to generate a lot of interest before publication.
Why did you submit to Open Biology and how was your experience publishing with Royal Society Publishing?
We submitted to Open Biology as it is open access and seemed a good fit for this work, which is clearly focussed on a big problem at the molecular and cellular level. I was also one of the joint first authors on the analysis of the fission yeast deletion collection to identify cell size and shape phenotypes, which was published in Open Biology. The review process was very smooth on both occasions. We were keen to get this paper published in a timely way to include it in our upcoming grant proposal.
PomBase image credit: Wikicommons
Figure: Taxonomic conservation and features of unknown proteins. https://royalsocietypublishing.org/doi/10.1098/rsob.180241
Portrait: Valerie Wood at the University of Cambridge. © University of Cambridge.