Web-based High-Throughput Tool For Next-Generation Sequence Annotation


2011 DoD High Performance Computing Modernization Program Users Group Conference, 20-24 June, Portland, OR, pp. 320-326
Approved for public release; distribution unlimited

A Web-based High-Throughput Tool for Next-Generation Sequence Annotation

Kamal Kumar, Valmik Desai, Li Cheng, Maxim Khitrov, Deepak Grover, Ravi Vijaya Satya, Chenggang Yu, Nela Zavaljevski, and Jaques Reifman
Biotechnology HPC Software Applications Institute, Telemedicine and Advanced Technology Research Center, US Army Medical Research and Materiel Command, Fort Detrick, MD
{kamal, valmik, lcheng, mkhitrov, dgrover, rvijaya, cyu, nelaz, reifman}@bioanalysis.org

Abstract

The availability of a large number of genome sequences, resulting from inexpensive, high-throughput next-generation sequencing platforms, has created the need for an integrated, fully-automated, rapid, and high-throughput annotation capability that is also easy-to-use. Here, we present a web-based software application, Annotation of Genome Sequences (AGeS), which incorporates publicly-available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. The current version of AGeS provides annotations for bacterial genome sequences, and serves as a readily-accessible resource to Department of Defense (DoD) scientists for storing, annotating, and visualizing genomes of newly-sequenced pathogens of interest. The AGeS system is composed of two major components. The first component is a web-based application that provides a graphical user interface for managing users’ input genomes, submitting annotation jobs, and visualizing results. Sequence contigs are uploaded as a multi-FASTA input file and submitted for annotation, and the resulting annotations are visualized through GBrowse. The input genome sequences and the annotation results are stored in a secure, customized database. The second component is a high-throughput annotation pipeline for finding the genomic regions that code for proteins, RNAs, and other genomic elements through a Do-It-Yourself Annotation framework. The pipeline also functionally annotates the protein-coding regions using an in-house-developed high-throughput pipeline, the Pipeline for Protein Annotation. The annotation pipeline has been deployed on the Mana Linux cluster at the Maui High Performance Computing Center. The two components are connected using the DoD user interface toolkit application programming interface. The AGeS system was evaluated for scaling of its parallel execution and annotation performance. AGeS scaled with super-linear speedup for up to 128 processors, after which performance degraded. A 2.2-Mbp bacterial genome sequence can be annotated in 1 hr using 128 processors. AGeS annotations of draft and complete genomes were compared with the original annotations from three different sources, and were found to be in general agreement with them.

1. Introduction

Access to inexpensive, high-throughput DNA sequencing technology has led to an explosion in the number of sequenced organisms and the volume of sequenced data[1]. To date, owing to so-called “next-generation sequencing” technology, the genomes of 1,000 microbial pathogens and their near neighbors are available, and many more are being sequenced. A genome sequence provides valuable information in terms of genomic features, such as genes that code for proteins and RNAs, as well as the positions and numbers of tandem repeats. In addition, we can gain further insights by annotating the functions of the proteins that the genes code for. This information, gleaned from the annotation of a newly sequenced complete genome, can help devise new strategies in diagnostics and forensics. Moreover, these annotations, coupled with comparative genomics, can enable novel approaches to identify vaccine candidates and potentially discover “universal” drug targets. For such downstream applications, the annotation of genomic sequences needs to be integrated, fully-automated, rapid, and high-throughput; and for such annotation capability to be truly effective, it should also be easy-to-use and readily available.

To address this need, we developed the Annotation of Genome Sequences (AGeS) software system, which was designed as a modular and flexible platform to facilitate the annotation, storage, and comparative analysis of sequenced genomes[2]. The AGeS system is composed of a Web-based application and a software pipeline. The Web-based application enables users to upload and store input contig sequences and the resulting annotation data in a central, customized database, and users can visualize the annotations via easy-to-use graphical user interfaces (GUIs). The visualization of annotated sequences is presented using the open-source genome browser GBrowse[3]. The integrated software pipeline analyzes contig sequences, and locates genomic regions that code for proteins, RNAs, and other genomic elements through a Do-It-Yourself Annotation (DIYA) framework[4] and Tandem Repeats Finder (TRF)[5]. The identified protein-coding regions are then functionally annotated using an in-house-developed high-throughput pipeline, the Pipeline for Protein Annotation (PIPA)[6]. All of these capabilities are available for bacterial genomes. Overall, AGeS provides the functionalities to: 1) store input sequences and annotated sequence data, 2) annotate completed and draft bacterial genomes in a fully-integrated and automated manner, 3) use high performance computing (HPC) for high-throughput annotation through efficient parallelization of the various publicly-available and in-house-developed bioinformatics resources, 4) visualize annotations using the familiar GBrowse[3] interface, and 5) download annotated genomes in GenBank[7] format.

Several software systems have recently been developed for high-quality, automated annotation of bacterial genomes. These include BASys[8], RAST[9], and Microbial Genomes Database Web resources[10] as well as annotation services provided by some of the large genomic annotation centers, such as the Annotation Engine at the J. Craig Venter Institute ion-service/), the Genoscope’s annotation service MicroScope[11], and the Microbial Annotation Pipeline at Integrated Microbial Genomes[12]. However, these systems or services do not provide integrated, fully-automated, rapid, high-throughput, and readily-available capability, and some important features, such as mapping to standard Gene Ontology (GO) annotation[13], are also missing.
Although most annotation systems contain components that are based on publicly-available bioinformatics programs and databases, integration of these components into pipelines is not a trivial task for researchers without significant bioinformatics and computer science expertise. While the recently published DIYA[4] and the Genome Reverse Compiler[14] provide integrated software packages for genome annotation, they do not enable the full use of parallel computing and lack fully-integrated and automated visualization of annotations.

2. Methods and Implementation

The AGeS system is composed of two main components: a Web-based application that provides user-friendly GUIs accessible via a standard Web browser; and a high-throughput software pipeline for the annotation of input genome sequences. Figure 1 shows the overall system architecture of the AGeS system. The AGeS Web application has been designed to control all aspects of the annotation process, i.e., input sequence management for uploading and manipulating genomic sequences, submitting annotation jobs to the AGeS annotation pipeline at an HPC cluster, storing input sequences as well as annotation results into a central relational database management system (RDBMS), and visualizing the annotations in the integrated GBrowse genome browser. For uploading the genome sequences, along with the required genus, species, and strain information, users have the option to upload the data pertinent to the minimum information about a genomic sequence (MIGS)[15]. Internally, the AGeS Web application uses a workflow manager module to guide the entire lifecycle of the annotation process, starting from the upload of an input sequence and ending with the visualization of the annotated sequences.
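The lifecycle that the workflow manager guides can be pictured as a small state machine. The sketch below is only an illustration in Python, not the AGeS implementation (which is Java-based and uses the jBPM workflow engine): the stage names, the `AnnotationJob` class, and the strictly linear ordering are assumptions made here for clarity.

```python
from enum import Enum, auto


class Stage(Enum):
    # Hypothetical stage names; the paper only states that the lifecycle
    # starts at upload and ends at visualization.
    UPLOADED = auto()
    SUBMITTED = auto()
    ANNOTATED = auto()
    STORED = auto()
    VISUALIZED = auto()


# Allowed forward transitions (assumed to be strictly linear).
NEXT_STAGE = {
    Stage.UPLOADED: Stage.SUBMITTED,
    Stage.SUBMITTED: Stage.ANNOTATED,
    Stage.ANNOTATED: Stage.STORED,
    Stage.STORED: Stage.VISUALIZED,
}


class AnnotationJob:
    """Tracks one genome through the assumed annotation lifecycle."""

    def __init__(self, genome_name):
        self.genome_name = genome_name
        self.stage = Stage.UPLOADED
        self.history = [Stage.UPLOADED]

    def advance(self):
        """Move the job to the next stage, or fail if it is already done."""
        if self.stage not in NEXT_STAGE:
            raise ValueError(f"{self.stage.name} is the final stage")
        self.stage = NEXT_STAGE[self.stage]
        self.history.append(self.stage)
        return self.stage

    def is_finished(self):
        return self.stage is Stage.VISUALIZED
```

A caller would create one `AnnotationJob` per uploaded genome and call `advance()` as each step completes. A real workflow engine would also have to handle failures and retries, which this sketch omits.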
2.1 Web Application

The AGeS system is accessible at https://applications.bioanalysis.org/ages/, and is available to users of the Department of Defense (DoD) Supercomputing Resource Centers (DSRCs) for genome sequence annotation using a standard Web browser. The AGeS Web application has been designed as a modular application for the easy integration of future sequence analysis modules, as they become available, and uses a workflow manager to invoke its modules. Resource-intensive annotation tools are run on the Mana Linux cluster at the Maui HPC Center (MHPCC), which is accessed by the Web application using the DoD User Interface Toolkit (UIT) application programming interface (API) (https://www.uit.hpc.mil/). UIT is a Web service-based API that provides secure access to DoD HPC resources. AGeS users are authenticated through the UIT API using their Kerberos credentials.

The AGeS Web application provides GUIs for managing sequences, submitting annotation jobs to the HPC cluster, and visualizing and downloading the annotation results. Figure 2 shows a screenshot of the AGeS Web application, showing the sequence management GUI. When an annotation job is completed on the HPC end, the results are automatically transferred back to the Web server and stored into the central database for visualization and download. Upon completion of an annotation job, an e-mail is also sent automatically to the user.

The AGeS Web application was developed using standards-based technologies, which include Java (http://www.oracle.com/technetwork/java/), J2EE rview/), JavaServer Faces (JSF) (http:// aces-139869.html), asynchronous JavaScript and XML (AJAX)[16], ICEfaces (http://www.icefaces.org/), jBPM (http://www.jboss.org/jbpm/), and Apache ActiveMQ (http://activemq.apache.org/). The Web application mainly consists of server-side Java code that uses JSF- and AJAX-based APIs from ICEfaces. ICEfaces provides a rich set of user interface components, such as menus and buttons, and generates updated views of Web pages without reloading the entire page. The workflow manager module has been implemented, within the Web application, using the jBPM workflow engine API for controlling the execution of various modules. The Web application uses an Apache ActiveMQ server for asynchronous message passing between the modules and the workflow engine. A PostgreSQL (http://www.postgresql.org/) RDBMS server is used to store users’ input genome sequences, annotation results, and other job-related data. The Web application is deployed on an Apache Tomcat (http://tomcat.apache.org/) server, using hypertext transfer protocol over a secure socket layer connection (HTTPS) for encrypting all of the data flowing to and from the user’s Web browser.

Figure 1. Overall system architecture for the Annotation of Genome Sequences (AGeS) system

Figure 2. The AGeS graphical user interface used for sequence data management
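Because contigs are uploaded as a multi-FASTA file, a sequence management step would plausibly read and sanity-check that file before a job is submitted. The following is a minimal Python sketch of such a reader under that assumption. The function names `parse_multi_fasta` and `total_length` are hypothetical and are not part of AGeS, whose Web application is written in Java.

```python
def parse_multi_fasta(text):
    """Return a list of (contig_id, sequence) tuples from multi-FASTA text.

    The contig identifier is taken as the first whitespace-delimited token
    after '>'; sequence lines are concatenated and upper-cased.
    """
    contigs = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                contigs.append((header, "".join(chunks)))
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without an identifier")
            header = fields[0]
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        contigs.append((header, "".join(chunks)))
    return contigs


def total_length(contigs):
    """Total number of bases across all contigs."""
    return sum(len(sequence) for _, sequence in contigs)
```

For a draft genome, the number of returned tuples is the number of contigs, and `total_length` gives the combined contig length, which is a quick check that the upload was read completely.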

2.2 Annotation Pipeline

As shown in Figure 1, the AGeS annotation pipeline is composed of three modules for gene, tandem repeats, and protein function annotations. The annotation pipeline takes assembled contiguous sequences, or contigs, as input in multi-FASTA format files generated by high-throughput, next-generation sequencing technologies (http://www.454.com/, http://www.illumina.com/, and http://www.appliedbiosystems.com/). First, a customized DIYA[4] framework is used to locate protein-coding genes using Glimmer[17] and RNA genes using RNAmmer[18] and tRNAscan-SE[19]. Within the DIYA framework, the system uses BLAST[20] searches to extract coding regions from the Glimmer predictions, and to infer gene products by transferring annotation from the best BLAST match. Next, the system finds tandem repeats in the pseudo-assembled sequence using TRF[5]. Outputs from the different DIYA component programs and TRF are post-processed and parsed to generate a file in GenBank format.

After annotation of the genomic regions is complete, the identified protein-coding regions are annotated using the high-throughput protein function annotation methods implemented in PIPA[6]. One of the most useful features of PIPA is that it exploits and consistently consolidates protein function information from disparate sources, including the in-house-developed CatFam enzyme profile database[21]. As an added benefit, the consolidated function predictions are given in GO terms, which are the de facto standard for protein annotation. The protein annotation results from PIPA are included in the GenBank file from the previous step, and are transferred back to the AGeS Web application for storage into the central database.

3. Results

AGeS provides the capability to annotate whole bacterial genomes, including both genomic features and protein functions.
The annotation pipeline that has been deployed on the Mana Linux cluster at the MHPCC scales well and is suited for whole genome sequence annotation. In this section, we present the results of the parallel processing performance testing of AGeS as well as of the software validation experiments.

3.1 Parallel Performance

To assess the scalability of the parallelization of the annotation modules of the AGeS pipeline, we computed the speedup curve for the annotation of a typical bacterial genome (Figure 3). Speedup is defined as the ratio of the time taken by a program to run on a single processor to the time taken to run the same program on N processors, with an ideal speedup being linear, meaning that the speedup is directly proportional to the number of processors. AGeS achieves super-linear speedup for up to 128 processors, after which its performance declines. The super-linear speedup is attributed to faster processing achieved by fully using the processors’ local memory, and the speedup decline beyond 128 processors is attributed to communication overhead. A 2.2-Mbp bacterial genome sequence (e.g., Staphylococcus hominis SK119, which is an opportunistic pathogen in patients with a compromised immune system) can be annotated in 1 hr using 128 processors.

Figure 3. AGeS performance speedup as a function of the number of processors
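The speedup definition above can be made concrete with a few lines of Python. This is a minimal sketch: the function names and the tolerance used to decide whether a speedup counts as linear are assumptions, and no measured AGeS timings are reproduced here.

```python
import math


def speedup(t_single, t_parallel):
    """Speedup S(N) = T(1) / T(N), with both times in the same unit."""
    if t_single <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_single / t_parallel


def efficiency(t_single, t_parallel, n_processors):
    """Parallel efficiency E(N) = S(N) / N; 1.0 corresponds to linear speedup."""
    if n_processors < 1:
        raise ValueError("number of processors must be at least 1")
    return speedup(t_single, t_parallel) / n_processors


def classify_speedup(t_single, t_parallel, n_processors, rel_tol=1e-9):
    """Label a measurement as 'super-linear', 'linear', or 'sub-linear'."""
    s = speedup(t_single, t_parallel)
    if math.isclose(s, n_processors, rel_tol=rel_tol):
        return "linear"
    return "super-linear" if s > n_processors else "sub-linear"
```

With purely hypothetical timings, a job that takes 100 time units on one processor and 10 units on 8 processors has a speedup of 10, which exceeds 8 and is therefore super-linear in the sense used above.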

3.2 Software Validation

We validated AGeS by comparing its annotations of bacterial genomes with annotations from three other sources. We evaluated two draft genomes, Staphylococcus hominis SK119 and Staphylococcus aureus subsp. aureus TCH60, and one completed genome, Yersinia pestis CO92. The S. hominis draft genome, sequenced by the J. Craig Venter Institute (http:// mental-genomics/), consists of 37 contigs, and the S. aureus draft genome, sequenced by the Human Genome Sequencing Center at Baylor College of Medicine (http://www.hgsc.bcm.tmc.edu/), consists of 65 contigs. Both of these draft genomes were sequenced using 454 pyrosequencing technology (http://www.454.com/). The complete Y. pestis genome was sequenced by the Wellcome Trust Sanger Institute (http://www. .html) using Sanger sequencing technology. The annotations for these three genomes were retrieved from the corresponding sequencing centers, and their sequences were re-annotated using the AGeS system.

Figure 4A shows a subset of the compared genomic features[2]. The total number of annotated genes for each of these genomes was compared with the original annotations provided by the corresponding centers. For each genome, the two compared annotation sources predicted similar numbers of genes. For S. hominis (Sh), we found that 1,753 (78%) genes were identical across both predictions. Most of the remaining genes overlapped at the start or end positions, with only 0.2% of the predictions unique to AGeS (data not shown). For the S. aureus (Sa) genome, 2,037 (77%) genes were identical, with only 1% of the predictions unique to AGeS (data not shown). For the Y. pestis (Yp) genome, 2,637 (60%) genes were identical across the two annotations, and another 30% had identical start or end positions (data not shown). Annotation comparisons indicated larger differences for the Y. pestis completed genome than for the two draft genomes.
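The comparison above sorts predictions into three categories: identical genes, genes that share only a start or an end position, and predictions unique to one source. A minimal Python sketch of that sorting follows. It is an illustration, not the AGeS validation code: representing a gene as a `(start, end, strand)` tuple, requiring the strand to match, and the function name `compare_predictions` are all assumptions.

```python
from collections import Counter


def compare_predictions(predicted, reference):
    """Classify each predicted gene against a reference annotation.

    Genes are (start, end, strand) tuples. A prediction is 'identical' if
    the same tuple occurs in the reference, 'shared_boundary' if only its
    start or only its end coincides with a reference gene on the same
    strand, and 'unique' otherwise.
    """
    ref_exact = set(reference)
    ref_starts = {(start, strand) for start, _, strand in reference}
    ref_ends = {(end, strand) for _, end, strand in reference}

    counts = Counter(identical=0, shared_boundary=0, unique=0)
    for start, end, strand in predicted:
        if (start, end, strand) in ref_exact:
            counts["identical"] += 1
        elif (start, strand) in ref_starts or (end, strand) in ref_ends:
            counts["shared_boundary"] += 1
        else:
            counts["unique"] += 1
    return counts
```

Dividing each count by the number of predictions gives percentages comparable in kind to those reported above, although the exact matching rules used in the original study may differ from the ones assumed here.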
These differences could be attributed to the more extensive studies performed on this well-studied genome. A similar level of agreement was observed for other genomic features, such as CDSs, rRNAs, and tRNAs.

Figure 4. Comparison of gene annotations and enzyme function predictions between AGeS and the other three annotation systems for the three analyzed genomes, Staphylococcus hominis SK119 (Sh), Staphylococcus aureus subsp. aureus TCH60 (Sa), and Yersinia pestis CO92 (Yp). A: the number of genes predicted by the original annotation centers and AGeS, with the overlap corresponding to identical predictions. B: the number of enzymes predicted by the original annotation centers and AGeS, with the overlap corresponding to identical predictions.

We also compared the annotations of the enzyme functions predicted by the CatFam enzyme profile database with those provided by the other three annotation centers. Figure 4B shows the similar numbers of annotated enzymes for each of the three compared genomes[2]. For example, for the S. hominis (Sh) draft genome, CatFam assigned Enzyme Commission (EC) numbers to 515 genes, whereas the J. Craig Venter Institute assigned EC numbers to 565 genes, with 379 enzymes having identical EC number annotations. In general, our results indicate that the AGeS annotations are in agreement with the other evaluated methods on both the genomic and proteomic annotation levels.

4. Conclusion

The Web-based AGeS system described in this paper is a computationally-efficient and scalable system for high-throughput genome annotation of newly sequenced pathogens of military relevance and their near neighbors. The AGeS annotation pipeline is fully-parallelized and is currently operational at the Mana Linux cluster at the MHPCC, where we performed scalability tests and found that a 2.2-Mbp bacterial genome sequence can be annotated in 1 hr using 128 processors. Validation results indicated that the AGeS system’s annotations are in general agreement with the other evaluated methods, on both the genomic and proteomic annotation levels. Due to significant cost reductions afforded by the recently developed next-generation genome sequencing technologies, we expect that software applications such as AGeS will become vital for microbial comparative genomics studies.

Acknowledgements

This work was partially sponsored by the US DoD High Performance Computing Modernization Program, under the High Performance Computing Software Applications Institutes Initiative.

Disclaimer

The opinions and assertions contained herein are the private views of the authors, and are not to be construed as official or as reflecting the views of the US Army or of the US Department of Defense.

References

1. Hall, N., “Advanced sequencing technologies and their wider impact in microbiology”, The Journal of Experimental Biology, 210(9), pp. 1518–1525, 2007.
2. Kumar, K., V. Desai, L. Cheng, M. Khitrov, D. Grover, R.V. Satya, C. Yu, N. Zavaljevski, and J. Reifman, “AGeS, a software system for microbial genome sequence annotation”, PLoS ONE, 6(3), e17469, 2011.
3. Donlin, M.J., “Using the Generic Genome Browser (GBrowse)”, Current Protocols in Bioinformatics, Chapter 9, pp. 9.9.1–25, 2009.
4. Stewart, A.C., B. Osborne, and T.D. Read, “DIYA, a bacterial annotation pipeline for any genomics lab”, Bioinformatics, 25(7), pp. 962–963, 2009.
5. Benson, G., “Tandem repeats finder, a program to analyze DNA sequences”, Nucleic Acids Research, 27(2), pp. 573–580, 1999.
6. Yu, C., N. Zavaljevski, V. Desai, S. Johnson, F.J. Stevens, and J. Reifman, “The development of PIPA, an integrated and automated pipeline for genome-wide protein function annotation”, BMC Bioinformatics, 9, 52, 2008.
7. Benson, D.A., I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, and E.W. Sayers, “GenBank”, Nucleic Acids Research, 38(suppl 1), pp. D46–D51, 2010.
8. Van Domselaar, G.H., P. Stothard, S. Shrivastava, J.A. Cruz, A. Guo, X. Dong, P. Lu, D. Szafron, R. Greiner, and D.S. Wishart, “BASys, a web server for automated bacterial genome annotation”, Nucleic Acids Research, 33(suppl 2), pp. W455–W459, 2005.
9. Aziz, R.K., D. Bartels, A.A. Best, M. DeJongh, T. Disz, R.A. Edwards, K. Formsma, S. Gerdes, E.M. Glass, M. Kubal, F. Meyer, G.J. Olsen, R. Olson, A.L. Osterman, R.A. Overbeek, L.K. McNeil, D. Paarmann, T. Paczian, B. Parrello, G.D. Pusch, C. Reich, R. Stevens, O. Vassieva, V. Vonstein, A. Wilke, and O. Zagnitko, “The RAST Server, rapid annotations using subsystems technology”, BMC Genomics, 9, 75, 2008.
10. Uchiyama, I., T. Higuchi, and M. Kawai, “MBGD update 2010, toward a comprehensive resource for exploring microbial genome diversity”, Nucleic Acids Research, 38(suppl 1), pp. D361–D365, 2010.
11. Vallenet, D., S. Engelen, D. Mornico, S. Cruveiller, L. Fleury, A. Lajus, Z. Rouy, D. Roche, G. Salvignol, C. Scarpelli, and C. Médigue, “MicroScope, a platform for microbial genome annotation and comparative genomics”, Database, 2009, 2009.
12. Markowitz, V.M., I.-M.A. Chen, K. Palaniappan, K. Chu, E. Szeto, Y. Grechkin, A. Ratner, I. Anderson, A. Lykidis, K. Mavromatis, N.N. Ivanova, and N.C. Kyrpides, “The integrated microbial genomes system, an expanding comparative analysis resource”, Nucleic Acids Research, 38(suppl 1), pp. D382–D390, 2010.
13. Ashburner, M., C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, and G. Sherlock, “Gene ontology, tool for the unification of biology”, Nature Genetics, 25(1), pp. 25–29, 2000.
14. Warren, A.S. and J.C. Setubal, “The Genome Reverse Compiler, an explorative annotation tool”, BMC Bioinformatics, 10, 35, 2009.
15. Field, D., G. Garrity, T. Gray, N. Morrison, J. Selengut, P. Sterk, T. Tatusova, N. Thomson, M.J. Allen, S.V. Angiuoli, M. Ashburner, N. Axelrod, S. Baldauf, S. Ballard, J. Boore, G. Cochrane, J. Cole, P. Dawyndt, P. De Vos, C. DePamphilis, R. Edwards, N. Faruque, R. Feldman, J. Gilbert, P. Gilna, F.O. Glockner, P. Goldstein, R. Guralnick, D. Haft, D. Hancock, H. Hermjakob, C. Hertz-Fowler, P. Hugenholtz, I. Joint, L. Kagan, M. Kane, J. Kennedy, G. Kowalchuk, R. Kottmann, E. Kolker, S. Kravitz, N. Kyrpides, J. Leebens-Mack, S.E. Lewis, K. Li, A.L. Lister, P. Lord, N. Maltsev, V. Markowitz, J. Martiny, B. Methe, I. Mizrachi, R. Moxon, K. Nelson, J. Parkhill, L. Proctor, O. White, S.A. Sansone, A. Spiers, R. Stevens, P. Swift, C. Taylor, Y. Tateno, A. Tett, S. Turner, D. Ussery, B. Vaughan, N. Ward, T. Whetzel, I. San Gil, G. Wilson, and A. Wipat, “The minimum information about a genome sequence (MIGS) specification”, Nature Biotechnology, 26(5), pp. 541–547, 2008.

16. Paulson, L.D., “Building Rich Web Applications with Ajax”, Computer, 38(10), pp. 14–17, 2005.
17. Delcher, A.L., K.A. Bratke, E.C. Powers, and S.L. Salzberg, “Identifying bacterial genes and endosymbiont DNA with Glimmer”, Bioinformatics, 23(6), pp. 673–679, 2007.
18. Lagesen, K., P. Hallin, E.A. Rødland, H.-H. Stærfeldt, T. Rognes, and D.W. Ussery, “RNAmmer, consistent and rapid annotation of ribosomal RNA genes”, Nucleic Acids Research, 35(9), pp. 3100–3108, 2007.
19. Lowe, T.M. and S.R. Eddy, “tRNAscan-SE, a program for improved detection of transfer RNA genes in genomic sequence”, Nucleic Acids Research, 25(5), pp. 0955–0964, 1997.
20. Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman, “Basic local alignment search tool”, Journal of Molecular Biology, 215(3), pp. 403–410, 1990.
21. Yu, C., N. Zavaljevski, V. Desai, and J. Reifman, “Genome-wide enzyme annotation with precision control, catalytic families (CatFam) databases”, Proteins, 74(2), pp. 449–460, 2009.

protein-coding regions using an in-house-developed high-throughput pipeline, the Pipeline for Protein Annotation. The annotation pipeline has been deployed on the Mana Linux cluster at the Maui High Performance Computing Center. The two components are connected together using the DoD user interface toolkit application programming interface.

Related Documents:

Track 'n Trade High Finance Manual - Gecko Software

6 Track 'n Trade High Finance Chapter 4: Charting Tools 65 Introduction 67 Crosshair Tool 67 Line Tool 69 Multi-Line Tool 7 Arc Tool 7 Day Offset Tool 77 Tool 80 Head & Shoulders Tool 8 Dart/Blip Tool 86 Wedge and Triangle Tool 90 Trend Fan Tool 9 Trend Channel Tool 96 Horizontal Channel Tool 98 N% Tool 00

22 Views

11m ago

˜e Adobe Illustrator® CHEAT SHEET - Shortgrass

e Adobe Illustrator CHEAT SHEET. Direct Selection Tool (A) Lasso Tool (Q) Type Tool (T) Rectangle Tool (M) Pencil Tool (N) Eraser Tool (Shi E) Scale Tool (S) Free Transform Tool (E) Perspective Grid Tool (Shi P) Gradient Tool (G) Blend Tool (W) Column Graph Tool (J) Slice Tool (Shi K) Zoom Tool (Z) Stroke Color

31 Views

1y ago

Migrate from Cisco ACE to a Next- Generation Load Balancing Solution ...
