British researchers have developed a new publicly accessible database that they hope to see shrink over time. That's because the database brings together thousands of understudied proteins encoded by genes in the human genome, whose existence is known but whose functions are mostly unknown.

The database, called "Unknome," is the result of research by Matthew Freeman of the Dunn School of Pathology at the University of Oxford, UK, and Sean Munro of the MRC Laboratory of Molecular Biology in Cambridge, UK, and their colleagues. They studied some of the proteins in the database and found that most contribute to important cellular functions, including development and stress resistance.

Sequencing of the human genome has made clear that it encodes thousands of likely protein sequences whose identities and functions remain unknown. There are several reasons for this, including a tendency to focus scarce research funding on known targets and a lack of tools, such as antibodies, for studying what these proteins do in cells.

But the authors believe that ignoring these proteins is risky, because some of them, perhaps many, are likely to play important roles in key cellular processes and could both provide insight into those processes and serve as targets for therapeutic intervention.

To facilitate more rapid exploration of this class of proteins, the authors created the Unknome database, which assigns each protein a "knownness" score that reflects information in the scientific literature regarding function, cross-species conservation, subcellular compartmentalization, and other elements.

According to this system, thousands of proteins have a knownness score close to zero. These include proteins from model organisms as well as proteins encoded by the human genome. The database is open to everyone and customizable: users can supply their own weightings for the different elements and thus generate their own set of knownness scores to prioritize their own research.
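To make the idea of user-adjustable weightings concrete, here is a minimal sketch that treats a knownness score as a weighted sum over categories of evidence. The category names, counts, and weights are hypothetical, and the formula is an assumption for illustration only; it is not the scoring method the database itself uses.

```python
from typing import Dict


def knownness(evidence_counts: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of evidence counts; categories without a weight count as zero."""
    return sum(
        weights.get(category, 0.0) * count
        for category, count in evidence_counts.items()
    )


# Hypothetical evidence for one protein (illustrative values, not real data).
protein = {"function": 0, "conservation": 2, "localization": 1}

# Hypothetical default weighting versus a user-supplied one.
default_weights = {"function": 1.0, "conservation": 0.5, "localization": 0.5}
custom_weights = {"function": 1.0, "conservation": 0.0, "localization": 0.25}

score_default = knownness(protein, default_weights)
score_custom = knownness(protein, custom_weights)
print(score_default, score_custom)
```

Under this sketch, a researcher who considers conservation uninformative can set its weight to zero, and the same protein receives a lower score and so ranks as less well known.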

To test the usefulness of the database, the authors selected 260 human genes that have counterparts in flies and a knownness score of 1 or less in both species, indicating that almost nothing is known about them. Complete knockout of many of these genes proved incompatible with life in flies; partial or tissue-specific knockout revealed that most contribute to important functions affecting fertility, development, tissue growth, protein quality control, or stress resistance.
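The selection rule described above (a score of 1 or less in both species) can be sketched as a simple filter. The record layout, gene names, and scores below are hypothetical placeholders, not entries from the database.

```python
from typing import List, NamedTuple


class OrthologPair(NamedTuple):
    human_gene: str
    fly_gene: str
    human_knownness: float
    fly_knownness: float


def select_unknown(pairs: List[OrthologPair], threshold: float = 1.0) -> List[OrthologPair]:
    """Keep pairs whose knownness is at or below the threshold in both species."""
    return [
        p for p in pairs
        if p.human_knownness <= threshold and p.fly_knownness <= threshold
    ]


# Hypothetical gene pairs with made-up scores.
pairs = [
    OrthologPair("HUMAN_A", "fly_a", 0.0, 0.5),   # poorly known in both species
    OrthologPair("HUMAN_B", "fly_b", 0.8, 3.0),   # better known in flies
    OrthologPair("HUMAN_C", "fly_c", 12.0, 9.0),  # well studied in both
]

selected = select_unknown(pairs)
print([p.human_gene for p in selected])
```

Requiring a low score in both species excludes genes that are obscure in humans but already characterized in flies, where a screen would reveal little that is new.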

The findings show that despite decades of detailed research, thousands of fly genes remain to be understood even at the most basic level, and the same is clearly true of the human genome. "These uncharacterized genes should not be ignored," Munro said. "Our database provides a powerful, versatile and efficient platform for identifying and selecting important genes of unknown function for analysis, thereby accelerating the closing of the biological knowledge gap represented by the unknome."

Munro added: "The roles of thousands of human proteins remain unclear, but research tends to focus on those that are already well understood. To help solve this problem, we created an 'Unknome' database, which ranks proteins according to how well they are known, and then functionally screened a subset of these mysterious proteins to show how ignorance can drive biological discovery."