FEMS Microbiology Letters 251 (2005) 341–346
www.fems-microbiology.org

The Bacillus subtilis YkuK protein is distantly related to RNase H

Łukasz Kniżewski, Krzysztof Ginalski *

Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, Pawińskiego 5A, 02-106 Warsaw, Poland

Received 19 May 2005; received in revised form 12 August 2005; accepted 16 August 2005
First published online 30 August 2005
Edited by M.Y. Galperin

Abstract

In addition to one hypothetical viral sequence from Bacteriophage KVP40, the PfamA family of unknown function DUF458 (Pfam Accession No. PF04308) encompasses several uncharacterized bacterial proteins including Bacillus subtilis YkuK protein. Using Meta-BASIC, a highly sensitive method for detection of distant similarity between proteins, we assign DUF458 family members to the ribonuclease H-like (RNase H-like) superfamily. DUF458 sequences maintain all core secondary structure elements of the RNase H-like fold and share several conserved, presumably active site residues with RNase HI, including an invariant DDE motif. In addition to providing a model structure for a previously uncharacterized protein family, this finding suggests that DUF458 proteins function as nucleases. The unusual phyletic pattern, together with the presence of DUF458 in several thermophilic organisms, may suggest a potential role of these proteins in DNA repair in stressful conditions such as extreme heat or other stress that causes spore formation.
© 2005 Federation of European Microbiological Societies. Published by Elsevier B.V. All rights reserved.

Keywords: YkuK; DUF458 family; RNase H-like fold; Structure/function prediction; Active site; Nuclease

* Corresponding author. Tel.: +48 22 8749100; fax: +48 22 8749130. E-mail address: [email protected] (K. Ginalski).
0378-1097/$22.00 © 2005 Federation of European Microbiological Societies. Published by Elsevier B.V. All rights reserved.
doi:10.1016/j.femsle.2005.08.020

1. Introduction

Ribonuclease H-like (RNase H-like) proteins comprise an important class of nucleic acid modifying enzymes with nuclease or polynucleotidyl transferase activities. The SCOP database [1] currently defines seven different families in the RNase H-like superfamily including RNase H [2,3], retroviral integrase [4], mu transposase [5], Tn5 transposase [6], DnaQ-like 3′–5′ exonuclease [7], RuvC resolvase [8] and mitochondrial resolvase ydc2 [9]. These enzymes perform critical functions in various biological processes such as transcription (RNase H) or viral infection (retroviral integrase), abasic DNA repair (exonuclease III), transposition (mu transposase), recombination and recombinatorial DNA repair (RuvC resolvase). The ribonuclease H-like superfamily is defined by a common core fold that includes a central five-stranded, mixed β-sheet flanked by α-helices on both sides (with βββαβαβα topology) [10]. Various reactions catalyzed by the superfamily members share a common metal ion dependent catalytic mechanism supported by the presence of an invariant active site DDE motif [11]. Two highly conserved aspartates are located on the spatially adjacent first and fourth β-strands of the core RNase H-like fold, while the position of the third active site carboxylate varies depending on the family. Here we show that the previously uncharacterized PfamA [12] domain of unknown function DUF458 (Pfam Accession No. PF04308, InterPro Accession No. IPR007405) is yet another family of RNase H-like proteins and identify its potential active site residues.

2. Materials and methods

2.1. Identification of DUF458 family members

To identify all DUF458 family members an exhaustive, transitive PSI-BLAST [13] search procedure was applied. Initially, a PSI-BLAST search against the NCBI non-redundant protein sequence database (filtered nr, posted 8 March 2005; 2,354,365 sequences) with an inclusion threshold of 0.01 was carried out until profile convergence, using the consensus sequence of PfamA [12] DUF458 as a query. Subsequently, the collected sequences were subjected to further PSI-BLAST searches until no new sequences were found. DUF458 family members were also subjected to neighborhood analysis by the STRING database [14] (http://string.embl.de) to detect possible functional associations.

2.2. Structural assignment for DUF458 proteins

Initially, the consensus sequence of DUF458 as well as several members of this family were analyzed with CDD [15] (http://www.ncbi.nlm.nih.gov/Struct/cdd/wrpsb.cgi) and SMART [16] (http://www.smart.embl-heidelberg.de) search tools to detect conserved protein domains annotated in SMART, Pfam and COG [17] databases. This analysis also included searches for transmembrane segments (with TMHMM2 [18]), signal peptides (SignalP [19]), low compositional complexity (CEG [20]) and coiled coil (Coils2 [21]) regions, as well as regions containing internal repeats (Prospero [22]). Further searches were performed with the meta-profile alignment method Meta-BASIC [23] available at http://basic.bioinfo.pl. Meta-BASIC combines the use of sequence profiles and secondary structure predictions (meta-profiles) for the query sequence and given protein families with various scoring systems and meta-profile alignment algorithms to detect distant similarity between proteins even if the structure of the reference protein is not known. Specifically, the consensus sequence of DUF458 was compared to all 7,418 PfamA families and to 10,128 proteins (representatives at 90% of sequence identity) extracted from the Protein Data Bank (PDB) [24]. The same comparison was also conducted using PSI-BLAST and RPS-BLAST [13]. Finally, both the consensus sequence of DUF458 and one of the family members, Bacillus subtilis YkuK protein, were submitted to Meta-Server [25] (http://bioinfo.pl/meta) that assembles various secondary structure prediction and top-of-the-line fold recognition methods. Collected predictions were screened with 3D-Jury [26], the consensus method of fold recognition servers.

2.3. Generation of sequence-to-structure alignment

To define the general conservation pattern in the selected template, Escherichia coli RNase HI (PDB|2rn2) [2], its close homologues were collected with a PSI-BLAST search against the NCBI non-redundant protein sequence database. Multiple sequence alignments for both DUF458 and RNase HI families were prepared using the PCMA program [27] followed by final manual adjustments. The sequence-to-structure alignment between DUF458 and RNase HI families was built manually using the 3D assessment procedure [28], taking into account predicted secondary structure (consensus of the results of several secondary structure prediction methods, mainly PSI-PRED [29]), the hydrophobic profile of the family and conservation of presumably catalytic residues.

2.4. 3D model building

Based on the final sequence-to-structure alignment, a 3D model of B. subtilis YkuK protein was built with the MODELLER program [30] using the E. coli RNase HI structure (PDB|2rn2) as a template. The relative orientation of the second α-helix of the conserved fold core was taken from the E. coli Tn5 transposase structure (PDB|1b7e) [6], which has a similar secondary structure pattern in this region without any inserted additional elements. Finally, the overall quality of the modeled structure was checked with the Verify3D program [31].

3. Results and discussion

3.1. DUF458 belongs to RNase H-like superfamily

The initial PSI-BLAST [13] search with the DUF458 consensus sequence performed against the NCBI non-redundant protein sequence database (E-value threshold of 0.01) identified 16 entirely uncharacterized proteins (predominantly from bacterial species), including sequences belonging to an uncharacterized cluster of orthologues [32] COG1978. Further exhaustive PSI-BLAST searches originating from these DUF458 family sequences did not detect any other proteins with a significant E-value (below 0.01). Application of other standard sequence similarity search methods such as CDD [15] and SMART [16] database search tools using the collected DUF458 sequences did not yield any significant hits to known protein domains. In addition, 3D-Jury [26] coupled to the structure prediction Meta-Server [25] also did not provide any consistent and reliable matches (with scores above 50 [33]) to existing structures. However, using the meta-profile alignment method Meta-BASIC [23] the DUF458 consensus sequence was mapped to both the Piwi domain (PF02171 in Pfam database, Z-score 13.37), recently shown to possess an RNase H-like structure [34], and a Holliday junction-specific endonuclease [8] from E. coli (RuvC resolvase, PDB|1hjr, Z-score 12.05). Importantly, these interesting Meta-BASIC hits were assigned quite confident scores (predictions with Z-score above 12 have less than 5% probability of being incorrect [23]), and both belong to the RNase H-like fold superfamily. Further sequence/structure analysis of RNase H-like superfamily representatives revealed that DUF458 proteins share a similar pattern of presumably active site residues with RNase HI, whose structure (PDB|2rn2) [2] was selected as the primary template for the modeling. Absolute conservation of critical active site residues and good mapping of hydrophobic profiles of the families as well as predicted and observed essential secondary structure elements of the RNase H-like fold are additional indicators of the correct but highly non-trivial structural assignment (Fig. 1).

Fig. 1. Multiple sequence alignment for DUF458 proteins (top) and RNase HI family representatives (bottom). Sequences are labeled according to the gene name, the NCBI gene identification (gi) number or PDB code and an abbreviation of the species name: Aa, Aquifex aeolicus; Ba, Bacillus anthracis; Bc, Bacillus cereus; Bk, Bacteriophage KVP40; Bl, Bacillus licheniformis; Bp, Bordetella parapertussis; Bs, Bacillus subtilis; Ca, Clostridium acetobutylicum; Ct, Clostridium thermocellum; Dh, Desulfitobacterium hafniense; Dv, Desulfovibrio vulgaris; Ec, Escherichia coli; Mt, Moorella thermoacetica; Rm, Ralstonia metallidurans; Ss, Silicibacter sp.; St, Symbiobacterium thermophilum; Tm, Thermotoga maritima; Tt, Thermoanaerobacter tengcongensis. Gene name and gi number for the only viral sequence in DUF458 family are colored red. The first and last residue numbers are indicated before and after each sequence with total sequence lengths following in square brackets. The numbers of excluded residues are specified in parentheses. Residue conservation is denoted with the following scheme: uncharged, highlighted in yellow; charged or polar, highlighted in gray; small, red. Invariant active site residues are highlighted in black. Locations of predicted (B. subtilis YkuK protein) and observed (E. coli RNase HI, PDB|2rn2) secondary structure elements (E, β-strand; H, α-helix) are marked above the sequences. N-terminal α-helix of DUF458 family is not shown.

3.2. Identification of potential active site residues and functional prediction

All DUF458 proteins encompass only a single RNase H-like domain with predicted αβββαβαβα topology, where the first α-helix probably packs on the common core of the fold (Fig. 2(a)). Importantly, among different RNase H-like superfamily members of known structure, the DUF458 family displays similar conservation of several presumably active site residues to RNase HI [2], which degrades RNA moieties in DNA/RNA hybrids via its endonuclease activity. As shown in Fig. 1, DUF458 proteins conserve a catalytic DDE motif characteristic of RNase H-like enzymes, with two aspartates (Asp 43 and Asp 129 in B. subtilis YkuK protein) situated on the predicted first and fourth β-strands. The location of the DUF458 invariant glutamate (Glu 89 in B. subtilis YkuK protein) in the first α-helix of the conserved fold core is identical to that of RNase HI (Glu 48 in PDB|2rn2 [2]) (Figs. 1 and 2). By analogy to several well characterized RNase H-like proteins [11,35], we predict that these conserved DUF458 residues of the DDE motif are involved in metal ion coordination important for stabilization of the transition state intermediate and possibly activation of a water molecule that serves as the attacking nucleophile in hydrolysis. Although DUF458 proteins lack the RNase HI-like basic protrusion responsible for binding the DNA/RNA hybrid substrate [36], they retain another highly conserved active site residue, Lys 157 (numbering from B. subtilis YkuK protein), at the end of the predicted fifth β-strand (Figs. 1 and 2). This residue, also specific to RNase HI (Lys 122 in PDB|2rn2), is supposed to interact with nucleic acid. A somewhat unique feature of the DUF458 family is the presence of a conserved positively charged residue (Lys 85 in B. subtilis YkuK protein) at the beginning of the predicted first α-helix of the RNase H-like fold core, which may be involved in binding the nucleic acid substrate (Fig. 2(a)). Our findings indicate that DUF458 sequences, which comprise a novel family in the RNase H-like superfamily, may possess nuclease activity, although their exact role, specific nucleic acid substrate (DNA/RNA hybrid, single or double stranded DNA or RNA) and detailed mechanism of action remain to be determined through further biochemical experiments.

Fig. 2. (a) 3D model of Bacillus subtilis YkuK protein for the essential core elements of the RNase H-like fold based on E. coli RNase HI (PDB|2rn2) and Tn5 transposase (PDB|1b7e) structures. The N-terminal α-helix, which possibly packs on the RNase H-like fold core, is not shown. The side chains of an invariant catalytic DDE motif are shown together with two other highly conserved active site residues: Lys 85 and Lys 157, supposed to interact with nucleic acid substrate. The 3D–1D profile obtained with the Verify3D program indicates that the modeled structure does not contain any substantial errors of the kind seen for incorrect structures. (b) E. coli RNase HI structure (PDB|2rn2). Displayed are the side chains of invariant active site residues of RNase HI conserved in DUF458 proteins. The basic protrusion responsible for binding DNA/RNA hybrid is denoted in white.

Further analysis of domain architecture and genomic context did not help identify potential physiological roles for the family. Firstly, DUF458 sequences do not possess additional fused domains that may hint at biological function. Secondly, although neighborhood analysis for DUF458 (COG1978) using the STRING database [14] provides links to several other protein families (all with coexpression evidence with medium confidence STRING scores), including the periplasmic component of an ABC-type metal ion transport system (COG1464), folylpolyglutamate synthase (COG0285) and a non-supervised orthologous group (NOG47279), the results of this analysis are rather inconclusive and the detected coexpression hardly implies any functional association.

With the exception of one viral sequence from Bacteriophage KVP40, DUF458 proteins were identified entirely in bacterial organisms. Bacterial members of the family can be found mainly in Firmicutes, including Bacilli (B. anthracis, B. cereus, B. licheniformis, B. subtilis), Clostridia (Clostridium acetobutylicum, C. thermocellum, Desulfitobacterium hafniense, Moorella thermoacetica, Thermoanaerobacter tengcongensis) and Symbiobacterium (Symbiobacterium thermophilum), with single examples in two additional thermophilic phyla: Aquificae (Aquifex aeolicus) and Thermotogae (Thermotoga maritima). Considering this taxonomic distribution, the DUF458 family most likely evolved specifically in Bacillus/Clostridia species with subsequent transfer (taking into account its presence in a phage) to a few other thermophilic species (Aquifex aeolicus and Thermotoga maritima). In this case the function of DUF458 is probably specific to the Bacillus/Clostridia environment, but also allows some advantage for thermophiles. The unusual phyletic pattern, together with the presence of DUF458 in several thermophilic organisms, may suggest a potential role of these proteins in DNA repair in stressful conditions such as extreme heat or other stress that causes spore formation.

The B. subtilis genome is known to contain three different genes encoding E. coli RNases HI (ypdQ) and HII (rnh and ysgB) homologues [37]. While both B. subtilis RNases HII homologues exhibit RNase H activity, the ypdQ protein displays no detectable RNase H activity either in vivo or in vitro. In contrast, a close ypdQ homologue (Vng0255c protein from Halobacterium sp. NRC-1) was shown to have RNase H activity both in vivo and in vitro [38]. Results of our 3D modeling performed with Meta-Server (data not shown) indicate that both ypdQ and Vng0255c proteins lack the basic protrusion α-helix present in E. coli RNases HI and have a similar structure in this region to a closely related RNase H domain of HIV-1 reverse transcriptase (RT), which was shown to possess no detectable RNase H activity on its own [39].
Among RNase H domains of HIV-1 RT, ypdQ and Vng0255c proteins, only Vng0255c protein displays a patch of several positively charged residues in the region adjacent to the deletion of the basic protrusion α-helix. This feature may substitute for the basic protrusion in binding the hybrid DNA/RNA substrate, explaining the observed RNase H activity of the Vng0255c gene product. Using a PSI-BLAST search against bacterial proteins from the NCBI non-redundant sequence database we have found that in addition to ypdQ protein, B. subtilis possesses another homologue of E. coli RNases HI, encoded by the ypeP gene (4th iteration, E-value 1e−04). Although the hypothetical ypeP protein lacks the first two β-strands of the conserved fold core, these critical elements are clearly encoded downstream of the ypeP gene, indicating that the gene boundaries were assigned incorrectly. Importantly, the ypeP protein also lacks the basic protrusion α-helix (while having a similar local structure to the RNase H domain of HIV-1 RT in this region) as well as a patch of positively charged residues in the surrounding sequence region. In this light, the intriguing hypothesis that ypeP or YkuK proteins may represent the currently lacking RNase HI in B. subtilis seems rather unlikely, unless these proteins utilize additional protein domains for substrate recognition, as is observed for the RNase H domain of HIV-1 RT. In any case, further experimental investigations should address this issue.

Acknowledgements

We thank Dr. Lisa Kinch for critical reading of the manuscript and helpful discussions.

References

[1] Murzin, A.G., Brenner, S.E., Hubbard, T. and Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540.
[2] Katayanagi, K., Miyagawa, M., Matsushima, M., Ishikawa, M., Kanaya, S., Nakamura, H., Ikehara, M., Matsuzaki, T. and Morikawa, K. (1992) Structural details of ribonuclease H from Escherichia coli as refined to an atomic resolution. J. Mol. Biol. 223, 1029–1052.
[3] Chapados, B.R., Chai, Q., Hosfield, D.J., Qiu, J., Shen, B. and Tainer, J.A. (2001) Structural biochemistry of a type 2 RNase H: RNA primer recognition and removal during DNA replication. J. Mol. Biol. 307, 541–556.
[4] Chen, Z., Yan, Y., Munshi, S., Li, Y., Zugay-Murphy, J., Xu, B., Witmer, M., Felock, P., Wolfe, A., Sardana, V., Emini, E.A., Hazuda, D. and Kuo, L.C. (2000) X-ray structure of simian immunodeficiency virus integrase containing the core and C-terminal domain (residues 50–293) – an initial glance of the viral DNA binding platform. J. Mol. Biol. 296, 521–533.
[5] Rice, P. and Mizuuchi, K. (1995) Structure of the bacteriophage Mu transposase core: a common structural motif for DNA transposition and retroviral integration. Cell 82, 209–220.
[6] Davies, D.R., Braam, L.M., Reznikoff, W.S. and Rayment, I. (1999) The three-dimensional structure of a Tn5 transposase-related protein determined to 2.9-Å resolution. J. Biol. Chem. 274, 11904–11913.
[7] Mol, C.D., Kuo, C.F., Thayer, M.M., Cunningham, R.P. and Tainer, J.A. (1995) Structure and function of the multifunctional DNA-repair enzyme exonuclease III. Nature 374, 381–386.
[8] Ariyoshi, M., Vassylyev, D.G., Iwasaki, H., Nakamura, H., Shinagawa, H. and Morikawa, K. (1994) Atomic structure of the RuvC resolvase: a holliday junction-specific endonuclease from E. coli. Cell 78, 1063–1072.
[9] Ceschini, S., Keeley, A., McAlister, M.S., Oram, M., Phelan, J., Pearl, L.H., Tsaneva, I.R. and Barrett, T.E. (2001) Crystal structure of the fission yeast mitochondrial Holliday junction resolvase Ydc2. EMBO J. 20, 6601–6611.
[10] Yang, W. and Steitz, T.A. (1995) Recombining the structures of HIV integrase, RuvC and RNase H. Structure 3, 131–134.
[11] Venclovas, C. and Siksnys, V.
(1995) Different enzymes with similar structures involved in Mg(2+)-mediated polynucleotidyl transfer. Nat. Struct. Biol. 2, 838–841.
[12] Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., Studholme, D.J., Yeats, C. and Eddy, S.R. (2004) The Pfam protein families database. Nucl. Acids Res. 32, D138–D141.
[13] Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res. 25, 3389–3402.
[14] von Mering, C., Jensen, L.J., Snel, B., Hooper, S.D., Krupp, M., Foglierini, M., Jouffre, N., Huynen, M.A. and Bork, P. (2005) STRING: known and predicted protein–protein associations, integrated and transferred across organisms. Nucl. Acids Res. 33 (Database Issue), D433–D437.
[15] Marchler-Bauer, A., Anderson, J.B., Cherukuri, P.F., DeWeese-Scott, C., Geer, L.Y., Gwadz, M., He, S., Hurwitz, D.I., Jackson, J.D., Ke, Z., Lanczycki, C.J., Liebert, C.A., Liu, C., Lu, F., Marchler, G.H., Mullokandov, M., Shoemaker, B.A., Simonyan, V., Song, J.S., Thiessen, P.A., Yamashita, R.A., Yin, J.J., Zhang, D. and Bryant, S.H. (2005) CDD: a Conserved Domain Database for protein classification. Nucl. Acids Res. 33 (Database Issue), D192–D196.
[16] Letunic, I., Copley, R.R., Schmidt, S., Ciccarelli, F.D., Doerks, T., Schultz, J., Ponting, C.P. and Bork, P. (2004) SMART 4.0: towards genomic data integration. Nucl. Acids Res. 32 (Database Issue), D142–D144.
[17] Tatusov, R.L., Natale, D.A., Garkavtsev, I.V., Tatusova, T.A., Shankavaram, U.T., Rao, B.S., Kiryutin, B., Galperin, M.Y., Fedorova, N.D. and Koonin, E.V. (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucl. Acids Res. 29, 22–28.
[18] Sonnhammer, E.L., von Heijne, G. and Krogh, A.
(1998) A hidden Markov model for predicting transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 6, 175–182.
[19] Nielsen, H., Engelbrecht, J., Brunak, S. and von Heijne, G. (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10, 1–6.
[20] Wootton, J.C. (1994) Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput. Chem. 18, 269–285.
[21] Lupas, A., Van Dyke, M. and Stock, J. (1991) Predicting coiled coils from protein sequences. Science 252, 1162–1164.
[22] Mott, R. (2000) Accurate formula for P-values of gapped local sequence and profile alignments. J. Mol. Biol. 300, 649–659.
[23] Ginalski, K., von Grotthuss, M., Grishin, N.V. and Rychlewski, L. (2004) Detecting distant homology with Meta-BASIC. Nucl. Acids Res. 32, W576–W581.
[24] Berman, H.M., Bhat, T.N., Bourne, P.E., Feng, Z., Gilliland, G., Weissig, H. and Westbrook, J. (2000) The protein data bank and the challenge of structural genomics. Nat. Struct. Biol. 7 (Suppl), 957–959.
[25] Bujnicki, J.M., Elofsson, A., Fischer, D. and Rychlewski, L. (2001) Structure prediction meta server. Bioinformatics 17, 750–751.
[26] Ginalski, K., Elofsson, A., Fischer, D. and Rychlewski, L. (2003) 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics 19, 1015–1018.
[27] Pei, J., Sadreyev, R. and Grishin, N.V. (2003) PCMA: fast and accurate multiple sequence alignment based on profile consistency. Bioinformatics 19, 427–428.
[28] Ginalski, K. and Rychlewski, L. (2003) Protein structure prediction of CASP5 comparative modeling and fold recognition targets using consensus alignment approach and 3D assessment. Proteins 53 (Suppl. 6), 410–417.
[29] Jones, D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195–202.
[30] Sali, A. and Blundell, T.L.
(1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815.
[31] Luthy, R., Bowie, J.U. and Eisenberg, D. (1992) Assessment of protein models with three-dimensional profiles. Nature 356, 83–85.
[32] Tatusov, R.L., Fedorova, N.D., Jackson, J.D., Jacobs, A.R., Kiryutin, B., Koonin, E.V., Krylov, D.M., Mazumder, R., Mekhedov, S.L., Nikolskaya, A.N., Rao, B.S., Smirnov, S., Sverdlov, A.V., Vasudevan, S., Wolf, Y.I., Yin, J.J. and Natale, D.A. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41.
[33] Ginalski, K. and Rychlewski, L. (2003) Detection of reliable and unexpected protein fold predictions using 3D-Jury. Nucl. Acids Res. 31, 3291–3292.
[34] Song, J.J., Smith, S.K., Hannon, G.J. and Joshua-Tor, L. (2004) Crystal structure of Argonaute and its implications for RISC slicer activity. Science 305, 1434–1437.
[35] Keck, J.L., Goedken, E.R. and Marqusee, S. (1998) Activation/attenuation model for RNase H. A one-metal mechanism with second-metal inhibition. J. Biol. Chem. 273, 34128–34133.
[36] Kanaya, S., Katsuda-Nakai, C. and Ikehara, M. (1991) Importance of the positive charge cluster in Escherichia coli ribonuclease HI for the effective binding of the substrate. J. Biol. Chem. 266, 11621–11627.
[37] Ohtani, N., Haruki, M., Morikawa, M., Crouch, R.J., Itaya, M. and Kanaya, S. (1999) Identification of the genes encoding Mn2+-dependent RNase HII and Mg2+-dependent RNase HIII from Bacillus subtilis: classification of RNases H into three families. Biochemistry 38, 605–618.
[38] Ohtani, N., Yanagawa, H., Tomita, M. and Itaya, M. (2004) Identification of the first archaeal Type 1 RNase H gene from Halobacterium sp. NRC-1: archaeal RNase HI can cleave an RNA–DNA junction. Biochem. J. 381, 795–802.
[39] Hostomsky, Z., Hostomska, Z., Hudson, G.O., Moomaw, E.W. and Nodes, B.R.
(1991) Reconstitution in vitro of RNase H activity by using purified N-terminal and C-terminal domains of human immunodeficiency virus type 1 reverse transcriptase. Proc. Natl. Acad. Sci. USA 88, 1148–1152.
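
The exhaustive, transitive search described in Section 2.1 amounts to computing a closure: every collected sequence is used as a new query until no new sequences appear. The Python fragment below is a minimal sketch of that loop under stated assumptions, not the procedure actually run by the authors. The function `transitive_search`, the `search` callback (a stand-in for one converged PSI-BLAST run at the 0.01 inclusion threshold) and the toy hit table are hypothetical names introduced here for illustration only.

```python
from collections import deque
from typing import Callable, Iterable, Set


def transitive_search(seed: str, search: Callable[[str], Iterable[str]]) -> Set[str]:
    """Collect every sequence reachable from `seed` by repeated searches.

    `search(query)` is assumed to return identifiers of significant hits
    for one query (hypothetical stand-in for a converged PSI-BLAST run).
    """
    found: Set[str] = set()      # all database sequences collected so far
    queried: Set[str] = set()    # queries already submitted
    queue = deque([seed])        # queries still to be submitted
    while queue:
        query = queue.popleft()
        if query in queried:
            continue
        queried.add(query)
        for hit in search(query):
            if hit not in found:
                found.add(hit)       # new family member
                queue.append(hit)    # reuse it as a query later
    return found


# Toy hit table (assumption): which sequences each query would retrieve.
TOY_HITS = {
    "consensus": ["seqA", "seqB"],
    "seqA": ["seqA", "seqB", "seqC"],
    "seqB": ["seqA", "seqB"],
    "seqC": ["seqC", "seqA"],
}


def toy_search(query: str) -> Iterable[str]:
    """Look up hits in the toy table; unknown queries return nothing."""
    return TOY_HITS.get(query, [])


family = transitive_search("consensus", toy_search)
print(sorted(family))  # ['seqA', 'seqB', 'seqC']
```

In practice `search` would wrap a real PSI-BLAST invocation and return the identifiers of hits below the inclusion threshold. The toy table only demonstrates that the loop terminates once no new sequences are found.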