Quantitative assessment of protein function prediction from metagenomics shotgun sequences

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Quantitative assessment of protein function prediction from metagenomics shotgun sequences. / Harrington, E D; Singh, Arjun; Doerks, T; Letunic, I; von Mering, C; Jensen, L J; Raes, J; Bork, P.

In: Proceedings of the National Academy of Sciences of the United States of America, Vol. 104, No. 35, 2007, p. 13913-8.

Harvard

Harrington, ED, Singh, A, Doerks, T, Letunic, I, von Mering, C, Jensen, LJ, Raes, J & Bork, P 2007, 'Quantitative assessment of protein function prediction from metagenomics shotgun sequences', Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 35, pp. 13913-8. https://doi.org/10.1073/pnas.0702636104

APA

Harrington, E. D., Singh, A., Doerks, T., Letunic, I., von Mering, C., Jensen, L. J., Raes, J., & Bork, P. (2007). Quantitative assessment of protein function prediction from metagenomics shotgun sequences. Proceedings of the National Academy of Sciences of the United States of America, 104(35), 13913-8. https://doi.org/10.1073/pnas.0702636104

Vancouver

Harrington ED, Singh A, Doerks T, Letunic I, von Mering C, Jensen LJ et al. Quantitative assessment of protein function prediction from metagenomics shotgun sequences. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(35):13913-8. https://doi.org/10.1073/pnas.0702636104

Author

Harrington, E D ; Singh, Arjun ; Doerks, T ; Letunic, I ; von Mering, C ; Jensen, L J ; Raes, J ; Bork, P. / Quantitative assessment of protein function prediction from metagenomics shotgun sequences. In: Proceedings of the National Academy of Sciences of the United States of America. 2007 ; Vol. 104, No. 35. pp. 13913-8.

Bibtex

@article{19547e35ba714b80889c125942e99842,
title = "Quantitative assessment of protein function prediction from metagenomics shotgun sequences",
abstract = "To assess the potential of protein function prediction in environmental genomics data, we analyzed shotgun sequences from four diverse and complex habitats. Using homology searches as well as customized gene neighborhood methods that incorporate intergenic and evolutionary distances, we inferred specific functions for 76% of the 1.4 million predicted ORFs in these samples (83% when nonspecific functions are considered). Surprisingly, these fractions are only slightly smaller than the corresponding ones in completely sequenced genomes (83% and 86%, respectively, by using the same methodology) and considerably higher than previously thought. For as many as 75,448 ORFs (5% of the total), only neighborhood methods can assign functions, illustrated here by a previously undescribed gene associated with the well characterized heme biosynthesis operon and a potential transcription factor that might regulate a coupling between fatty acid biosynthesis and degradation. Our results further suggest that, although functions can be inferred for most proteins on earth, many functions remain to be discovered in numerous small, rare protein families.",
author = "Harrington, {E D} and Arjun Singh and T Doerks and I Letunic and {von Mering}, C and Jensen, {L J} and J Raes and P Bork",
year = "2007",
doi = "10.1073/pnas.0702636104",
language = "English",
volume = "104",
pages = "13913--8",
journal = "Proceedings of the National Academy of Sciences of the United States of America",
issn = "0027-8424",
publisher = "The National Academy of Sciences of the United States of America",
number = "35",
}

RIS

TY - JOUR

T1 - Quantitative assessment of protein function prediction from metagenomics shotgun sequences

AU - Harrington, E D

AU - Singh, Arjun

AU - Doerks, T

AU - Letunic, I

AU - von Mering, C

AU - Jensen, L J

AU - Raes, J

AU - Bork, P

PY - 2007

Y1 - 2007

AB - To assess the potential of protein function prediction in environmental genomics data, we analyzed shotgun sequences from four diverse and complex habitats. Using homology searches as well as customized gene neighborhood methods that incorporate intergenic and evolutionary distances, we inferred specific functions for 76% of the 1.4 million predicted ORFs in these samples (83% when nonspecific functions are considered). Surprisingly, these fractions are only slightly smaller than the corresponding ones in completely sequenced genomes (83% and 86%, respectively, by using the same methodology) and considerably higher than previously thought. For as many as 75,448 ORFs (5% of the total), only neighborhood methods can assign functions, illustrated here by a previously undescribed gene associated with the well characterized heme biosynthesis operon and a potential transcription factor that might regulate a coupling between fatty acid biosynthesis and degradation. Our results further suggest that, although functions can be inferred for most proteins on earth, many functions remain to be discovered in numerous small, rare protein families.

U2 - 10.1073/pnas.0702636104

DO - 10.1073/pnas.0702636104

M3 - Journal article

C2 - 17717083

VL - 104

SP - 13913

EP - 13918

JO - Proceedings of the National Academy of Sciences of the United States of America

JF - Proceedings of the National Academy of Sciences of the United States of America

SN - 0027-8424

IS - 35

ER -

ID: 40749039