Enhanced function annotations for Drosophila serine proteases: a case study for systematic annotation of multi-member gene families
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Enhanced function annotations for Drosophila serine proteases: a case study for systematic annotation of multi-member gene families. / Shah, Parantu K; Tripathi, Lokesh P; Jensen, Lars Juhl; Gahnim, Murad; Mason, Christopher; Furlong, Eileen E; Rodrigues, Veronica; White, Kevin P; Bork, Peer; Sowdhamini, R.
In: Gene, Vol. 407, No. 1-2, 2008, p. 199-215.
RIS
TY - JOUR
T1 - Enhanced function annotations for Drosophila serine proteases
T2 - a case study for systematic annotation of multi-member gene families
AU - Shah, Parantu K
AU - Tripathi, Lokesh P
AU - Jensen, Lars Juhl
AU - Gahnim, Murad
AU - Mason, Christopher
AU - Furlong, Eileen E
AU - Rodrigues, Veronica
AU - White, Kevin P
AU - Bork, Peer
AU - Sowdhamini, R
PY - 2008
Y1 - 2008
N2 - Systematically annotating the function of enzymes that belong to large protein families encoded in a single eukaryotic genome is a very challenging task. We carried out such an exercise to annotate function for the serine-protease family of the trypsin fold in Drosophila melanogaster, with an emphasis on annotating serine-protease homologues (SPHs) that may have lost their catalytic function. Our approach involves data mining and data integration to provide function annotations for 190 Drosophila gene products containing serine-protease-like domains, of which 35 are SPHs. This was accomplished by analysis of structure-function relationships, gene-expression profiles, large-scale protein-protein interaction data, literature mining and bioinformatic tools. We introduce functional residue clustering (FRC), a method that performs hierarchical clustering of sequences using properties of functionally important residues and uses the correlation coefficient as a quantitative similarity measure to transfer in vivo substrate specificities to proteases. We show that the efficiency of transfer of substrate-specificity information using this method is generally high. FRC was also applied to Drosophila proteases to assign putative competitive inhibitor relationships (CIRs). Microarray gene-expression data were used to uncover a large-scale and dual involvement of proteases in development and in immune response. We found specific recruitment of SPHs and proteases with CLIP domains in immune response, suggesting evolution of a new function for SPHs. We also suggest the existence of separate downstream protease cascades for immune response against bacterial/fungal infections and parasite/parasitoid infections. We verify the quality of our annotations using information from RNAi screens and other evidence types. Use of such multi-fold approaches results in a 10-fold increase in function annotation for Drosophila serine proteases and demonstrates their value for increasing annotations in multiple genomes.
U2 - 10.1016/j.gene.2007.10.012
DO - 10.1016/j.gene.2007.10.012
M3 - Journal article
C2 - 17996400
VL - 407
SP - 199
EP - 215
JO - Gene
JF - Gene
SN - 0378-1119
IS - 1-2
ER -
ID: 40740165