Quantitative assessment of protein function prediction from metagenomics shotgun sequences
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Quantitative assessment of protein function prediction from metagenomics shotgun sequences. / Harrington, E D; Singh, Arjun; Doerks, T; Letunic, I; von Mering, C; Jensen, L J; Raes, J; Bork, P.
In: Proceedings of the National Academy of Sciences of the United States of America, Vol. 104, No. 35, 2007, p. 13913-8.
RIS
TY - JOUR
T1 - Quantitative assessment of protein function prediction from metagenomics shotgun sequences
AU - Harrington, E D
AU - Singh, Arjun
AU - Doerks, T
AU - Letunic, I
AU - von Mering, C
AU - Jensen, L J
AU - Raes, J
AU - Bork, P
PY - 2007
Y1 - 2007
AB - To assess the potential of protein function prediction in environmental genomics data, we analyzed shotgun sequences from four diverse and complex habitats. Using homology searches as well as customized gene neighborhood methods that incorporate intergenic and evolutionary distances, we inferred specific functions for 76% of the 1.4 million predicted ORFs in these samples (83% when nonspecific functions are considered). Surprisingly, these fractions are only slightly smaller than the corresponding ones in completely sequenced genomes (83% and 86%, respectively, by using the same methodology) and considerably higher than previously thought. For as many as 75,448 ORFs (5% of the total), only neighborhood methods can assign functions, illustrated here by a previously undescribed gene associated with the well characterized heme biosynthesis operon and a potential transcription factor that might regulate a coupling between fatty acid biosynthesis and degradation. Our results further suggest that, although functions can be inferred for most proteins on earth, many functions remain to be discovered in numerous small, rare protein families.
U2 - 10.1073/pnas.0702636104
DO - 10.1073/pnas.0702636104
M3 - Journal article
C2 - 17717083
VL - 104
SP - 13913
EP - 13918
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
SN - 0027-8424
IS - 35
ER -
ID: 40749039