I AM NOT A METAGENOMIC EXPERT I Am Merely The MESSENGER - CGIAR

I AM NOT A METAGENOMIC EXPERT
I am merely the MESSENGER
Blaise T.F. Alako, PhD
EBI Ambassador
blaise@ebi.ac.uk

Hubert Denise, Alex Mitchell, Peter Sterk, Sarah Hunter

http://www.ebi.ac.uk/metagenomics

Where is the true cost of NGS?
[Figure: breakdown of the cost of NGS projects; percentages not recoverable from the transcription] (Sboner et al., Genome Biology (2011) 12:125)

EBI Metagenomics pipeline: data analysis using selected EBI and external software tools
- Philosophy
- Overview of data analysis
- QC steps tutorial
- Overview of functional analysis
- Result outputs
- Other public pipelines

Philosophy behind the EBI Metagenomics pipeline
Helping metagenomics researchers make sense of their data. From chaos to structure:
- archiving of data with metadata
- performing stringent QC filtering prior to analysis
  - quality in, quality out
- performing robust taxonomy and functional analysis
  - model-based rather than similarity-based approaches
  - assignment done on reads rather than on assemblies
- intuitive navigation through the website
- constant drive for improvement
  - benchmarking and tool testing

EBI Metagenomics currently does not perform assembly
Why?
- absence of reference genomes
- short reads make chimaeras inevitable
Example: re-analysis of Hess et al., Science (2011) 331:463
What are the consequences?
- taxonomy information cannot be linked to functional annotations
- viral taxonomy analysis cannot currently be performed

Metagenomics data analysis
[Figure: panels labelled Diversity analysis, Quality control and Functional analysis]
Image credits: (1) Christina Toft & Siv G. E. Andersson; (2) Dalebroux Z. D. et al., Microbiol. Mol. Biol. Rev. 2010;74:171-199

Overview of the EBI Metagenomics pipeline
[Figure: pipeline flowchart; recoverable labels: raw reads, reads that fail QC, amplicon-based, …assignment, unknown function, pCDS (predicted coding sequences)]

EBI Metagenomics: QC rationale
Why?
- Garbage in, garbage out
- Base-call errors:
  - each base call has an associated quality score
  - platform-dependent errors
- Read quality decreases with read length
- NGS generates duplicate reads (artefactual and genuine). Reducing duplication reduces analysis time and prevents analysis bias.
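The quality score attached to each base call is usually a Phred score, Q = -10·log10(P), where P is the probability that the call is wrong. The minimal sketch below shows how a FASTQ quality string decodes into scores and per-base error probabilities. It assumes the Phred+33 offset, because the slide does not say which encoding is used. The function names are illustrative.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call is wrong, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse transform: Phred score for a given error probability."""
    return -10 * math.log10(p)


def decode_phred33(quality_string: str) -> list[int]:
    """Decode a FASTQ quality string, assuming the Phred+33 offset."""
    return [ord(ch) - 33 for ch in quality_string]


def mean_error_rate(quality_string: str) -> float:
    """Average the per-base error probabilities (not the Q scores) of one read."""
    probs = [phred_to_error_prob(q) for q in decode_phred33(quality_string)]
    return sum(probs) / len(probs)


if __name__ == "__main__":
    # A read whose quality drops towards the 3' end.
    quals = "IIIII55+#"
    print(decode_phred33(quals))  # → [40, 40, 40, 40, 40, 20, 20, 10, 2]
    print(mean_error_rate(quals))
```

A single Q2 base at the end of this read dominates its mean error rate. This is why the low-quality ends are clipped before any other filtering.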

EBI Metagenomics: QC step by step
- Clipping: low-quality ends trimmed and adapter/barcode sequences removed using the Biopython SeqIO package
- Quality filtering: sequences with more than 10% undetermined nucleotides removed
- Read-length filtering: short sequences removed, with a platform-dependent length cut-off
- Duplicate sequence removal: sequences clustered at 99% identity (UCLUST v1.1.579) and a representative sequence chosen
- Repeat masking: RepeatMasker (open-3.2.2); reads with 50% or more nucleotides masked are removed
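The filtering steps above can be sketched in plain Python. This is not the EBI Metagenomics code, and several parts are simplified:

- It trims only the 3′ end, at a hypothetical Q20 cut-off; the slide gives no trimming threshold.
- It takes the platform-dependent minimum length as a parameter.
- It removes only exact duplicates rather than clustering at 99% identity with UCLUST.
- It assumes repeats are already soft-masked (lowercase) rather than running RepeatMasker.
- It omits adapter and barcode removal.

Only the 10% and 50% thresholds come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    seq: str          # lowercase bases are treated as repeat-masked (soft-masking)
    qual: list[int]   # one Phred score per base


# Hypothetical trimming cut-off; the 10% and 50% values come from the slide.
TRIM_QUAL = 20
MAX_N_FRACTION = 0.10
MAX_MASKED_FRACTION = 0.50


def trim_3prime(read: Read, min_q: int = TRIM_QUAL) -> Read:
    """Clip trailing bases whose quality is below min_q."""
    end = len(read.seq)
    while end > 0 and read.qual[end - 1] < min_q:
        end -= 1
    return Read(read.name, read.seq[:end], read.qual[:end])


def n_fraction(seq: str) -> float:
    """Fraction of undetermined (N) bases; an empty read counts as all-N."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def masked_fraction(seq: str) -> float:
    """Fraction of soft-masked (lowercase) bases."""
    return sum(ch.islower() for ch in seq) / len(seq) if seq else 1.0


def qc(reads: list[Read], min_length: int) -> list[Read]:
    """Apply the QC steps in the order listed on the slide."""
    kept, seen = [], set()
    for read in reads:
        read = trim_3prime(read)                       # clipping (3' end only)
        if n_fraction(read.seq) > MAX_N_FRACTION:      # quality filtering
            continue
        if len(read.seq) < min_length:                 # read-length filtering
            continue
        key = read.seq.upper()
        if key in seen:                                # exact duplicates only
            continue
        seen.add(key)
        if masked_fraction(read.seq) >= MAX_MASKED_FRACTION:  # repeat masking
            continue
        kept.append(read)
    return kept
```

Running the cheap filters first keeps the expensive ones (clustering, repeat masking) working on fewer reads.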

EBI Metagenomics: QC consequences
[Figure: QC results for Roche 454, Ion Torrent and Illumina data]

EBI Metagenomics: overview of functional analysis
[Figure: recoverable labels: pCDS, InterProScan, function assignment, unknown function]

EBI Metagenomics: identification of coding sequences
Prediction of coding sequences is a challenge because of:
- read length
- sequencing errors: frame-shifts
Two main types of approaches:
- homology-based methods: identify only known coding sequences
- feature-based approaches: predict the probability that ORFs are coding
EBI Metagenomics uses FragGeneScan:
- hidden Markov models to correct frame-shifts using codon usage
- probabilistic identification of start and stop codons
- 60 bp minimum ORF
Rho et al. (2010) NAR 38(20)
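For contrast with FragGeneScan, a naive ORF scan looks like this. It searches all six reading frames for an in-frame ATG…stop stretch of at least 60 bp, the minimum length quoted above, and translates it with the standard genetic code. It has no hidden Markov model, no frame-shift correction and no handling of genes truncated at read ends, which are exactly the problems FragGeneScan addresses. The function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(seq: str, min_len: int = 60) -> list[tuple[str, int, int, str]]:
    """Naive six-frame ATG..stop scan.

    Returns (strand, start, end, protein). Coordinates are 0-based and
    end-exclusive, the stop codon counts towards the length, and for "-"
    they refer to the reverse complement.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    end = i + 3
                    if end - start >= min_len:
                        # Peptide excludes the stop codon.
                        orfs.append((strand, start, end, translate(s[start:i])))
                    start = None
    return orfs
```

A single sequencing error that shifts the frame will split or truncate an ORF in this scan. That is why a model-based predictor is preferred for short, error-prone reads.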

EBI Metagenomics: annotation of coding sequences
Most available pipelines use homology-based methods (such as BLAST):
- compare a query sequence with a database of sequences
- identify database sequences that resemble the query sequence with a homology score above a given threshold
However, sequences may appear to have a low homology score because:
- proteins may share homology only in limited domains
- proteins from different species can differ in length
The EBI Metagenomics pipeline does not use pairwise similarity-based methods to associate functions with predicted protein sequences; instead, we use InterProScan to mine the InterPro database.

EBI Metagenomics: advantages of InterPro
InterPro database (HMM- and profile-based functional analysis):
- based on the presence of "signatures" (models) from several databases
- Specificity: mapping is manually curated
  - BLAST vs UniRef100 hit C7VBM8, Predicted protein; InterProScan hit 5-formyltetrahydrofolate cyclo-ligase-like (IPR024185)
  - BLAST vs UniRef100 hit C7VC62, Predicted protein; InterProScan hit Transcription regulator HTH, LysR (IPR000847)
- Speed: test set of 40,692 predicted protein sequences
  - BLAST vs UniRef100: 21.5 s/CDS
  - InterProScan (5 databases): 3 s/CDS
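The following is a rough estimate of what the per-sequence timings mean for the whole 40,692-sequence test set. It assumes the quoted times are serial per-sequence costs that simply add up; the slide does not state the hardware or the degree of parallelism.

```latex
\begin{aligned}
t_{\text{BLAST}} &= 40\,692 \times 21.5\ \text{s} = 874\,878\ \text{s} \approx 243\ \text{h} \\
t_{\text{InterProScan}} &= 40\,692 \times 3\ \text{s} = 122\,076\ \text{s} \approx 33.9\ \text{h} \\
\frac{t_{\text{BLAST}}}{t_{\text{InterProScan}}} &= \frac{21.5}{3} \approx 7.2
\end{aligned}
```

On that assumption, InterProScan would process this set roughly seven times faster than BLAST against UniRef100.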

EBI Metagenomics: overview of taxonomy analysis
[Figure: recoverable labels: amplicon-based data, Qiime, taxonomic analysis]

EBI Metagenomics: identification of suitable sequences
Taxonomy analysis is generally based on the identification and classification of rRNA sequences:
- Prokaryotes (archaebacteria and eubacteria): 5S, 16S and 23S
- Eukaryotes: 5S, 5.8S, 18S and 28S
- there is no equivalent for viruses, so analysis depends on DNA polymerase or part of the 5′-UTR (internal ribosome entry site [IRES]) sequences
EBI Metagenomics currently only provides taxonomy analysis for prokaryotes.
rRNA sequences are identified using rRNASelector:
- hidden Markov models to identify rRNA sequences
- 60 bp minimum overlap with a well-curated HMM model
- E-value 10⁻⁵
Lee et al. (2011) J Microbiol. 49(4)
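The two rRNASelector acceptance criteria listed above, E-value at most 10⁻⁵ and at least 60 bp of overlap with the model, can be expressed as a simple filter over HMM hits. Several details are assumptions made for illustration:

- the `HmmHit` record;
- the short model labels;
- treating the cut-off as inclusive;
- keeping the best (lowest E-value) hit per read.

rRNASelector's actual output format is not described here.

```python
from dataclasses import dataclass


@dataclass
class HmmHit:
    read_id: str
    model: str        # hypothetical model label, e.g. "16S"
    evalue: float
    ali_start: int    # 1-based, inclusive alignment coordinates on the read
    ali_end: int


MAX_EVALUE = 1e-5     # E-value cut-off quoted on the slide
MIN_OVERLAP = 60      # minimum overlap (bp) with the HMM quoted on the slide
PROKARYOTIC_MODELS = ("5S", "16S", "23S")


def passes(hit: HmmHit) -> bool:
    overlap = hit.ali_end - hit.ali_start + 1
    return hit.evalue <= MAX_EVALUE and overlap >= MIN_OVERLAP


def select_rrna_reads(hits: list[HmmHit]) -> dict[str, HmmHit]:
    """Keep the best (lowest E-value) qualifying prokaryotic rRNA hit per read."""
    best: dict[str, HmmHit] = {}
    for hit in hits:
        if hit.model not in PROKARYOTIC_MODELS or not passes(hit):
            continue
        current = best.get(hit.read_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.read_id] = hit
    return best
```

Reads that survive this filter are the only ones passed on to taxonomic classification. A read matching only a eukaryotic model (e.g. 18S) is dropped, consistent with the prokaryote-only taxonomy stated above.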

EBI Metagenomics: identification of suitable sequences
Once identified, rRNA sequences are clustered and classified using Qiime.
“QIIME stands for Quantitative Insights Into Microbial Ecology. QIIME is an open source software package for comparison and analysis of microbial communities”
The main steps are:
- clustering sequences into Operational Taxonomic Units (OTUs) using uclust
- picking a representative sequence set (one sequence from each OTU)
- aligning the representative sequence set using PyNAST
- assigning taxonomy to the representative sequence set
- generating output files:
  - filtering the alignment prior to tree building
  - building a phylogenetic tree
  - creating an OTU table
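The OTU-picking idea behind the first two steps can be sketched as greedy centroid clustering. Sequences are visited longest first; each joins the first existing centroid it matches at or above an identity threshold, and otherwise founds a new OTU whose centroid serves as the representative sequence.

This is a simplified stand-in, not uclust. uclust uses a fast heuristic search based on shared words (k-mers), whereas this sketch runs a full Needleman-Wunsch alignment against every centroid. The 97% threshold is a common convention assumed here, not a value given on the slide; the same routine at 0.99 mimics the duplicate-removal step in QC. A per-sample OTU table is then built from the clusters. All names are hypothetical.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Needleman-Wunsch alignment; identity = identical columns / alignment columns."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal alignment, counting identical and total columns.
    i, j, same, cols = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        cols += 1
    return same / cols if cols else 1.0


def greedy_otus(seqs: dict[str, str], threshold: float = 0.97) -> dict[str, list[str]]:
    """Map each centroid (representative) id to the ids of its OTU members."""
    centroids: list[str] = []
    otus: dict[str, list[str]] = {}
    for sid, seq in sorted(seqs.items(), key=lambda kv: -len(kv[1])):
        for cid in centroids:
            if global_identity(seq, seqs[cid]) >= threshold:
                otus[cid].append(sid)
                break
        else:  # no centroid was similar enough: start a new OTU
            centroids.append(sid)
            otus[sid] = [sid]
    return otus


def otu_table(otus: dict[str, list[str]], sample_of: dict[str, str]) -> dict[str, dict[str, int]]:
    """Count OTU members per sample: {centroid id: {sample: count}}."""
    table: dict[str, dict[str, int]] = {}
    for cid, members in otus.items():
        row: dict[str, int] = {}
        for sid in members:
            row[sample_of[sid]] = row.get(sample_of[sid], 0) + 1
        table[cid] = row
    return table
```

Because every sequence is aligned against every centroid, this sketch scales poorly. Avoiding that cost is the main reason real OTU pickers rely on heuristic searches.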

EBI Metagenomics pipeline in a nutshell
- QC:
  - trim adaptor sequences and low-quality sequence ends
  - remove duplicates and short sequences
  - remove low-complexity sequences
- Diversity analysis:
  - identify prokaryotic rRNA sequences (5S, 16S and 23S)
  - cluster rRNA-containing reads
  - assign taxonomic classification using Qiime
- Functional analysis:
  - predict ORFs
  - translate ORFs into peptides
  - submit to InterProScan for functional annotation
“Powerful and sophisticated alternative to BLAST-based functional metagenomic analysis”

EBI Metagenomics pipeline: data analysis using selected EBI and external software tools
- Philosophy
- Overview of data analysis
- QC steps tutorial
- Overview of functional analysis
- Overview of taxonomy analysis
- Result outputs
- Other public pipelines

Current outputs of the EBI Metagenomics pipeline
Visualisation and download of:
- QC and sequence statistics
- Diversity analysis
- Functional analysis

EBI Metagenomics pipeline: taxonomy visualisation
- Google Charts dynamic representation
- switch to bar chart, column or Krona interactive views
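Before plotting, per-OTU counts have to be collapsed to one taxonomic rank and converted to percentages. The sketch below assumes Greengenes-style taxonomy strings such as `k__Bacteria; p__Firmicutes` and hypothetical input dictionaries. It illustrates the aggregation step only and is not the pipeline's actual code.

```python
def abundance_by_rank(
    otu_counts: dict[str, int],
    otu_taxonomy: dict[str, str],
    rank_prefix: str = "p__",
) -> dict[str, float]:
    """Collapse OTU counts to one rank and return percentages, largest first."""
    totals: dict[str, int] = {}
    for otu, count in otu_counts.items():
        ranks = [r.strip() for r in otu_taxonomy.get(otu, "").split(";")]
        # An empty label such as "p__" means the OTU was not resolved at this rank.
        name = next(
            (r for r in ranks if r.startswith(rank_prefix) and len(r) > len(rank_prefix)),
            "Unassigned",
        )
        totals[name] = totals.get(name, 0) + count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {name: 100.0 * n / grand_total for name, n in ranked}
```

The resulting name-to-percentage mapping is the kind of input a pie, bar or column chart expects. Changing `rank_prefix` (e.g. to `"c__"`) regroups the same counts at another rank.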

EBI Metagenomics pipeline: functional visualisation
- Google Charts dynamic representation
- InterPro matches, with links to the InterPro website
- Gene Ontology

EBI Metagenomics pipeline: download options
- Large starting material
- Small-size output for post-processing

Some other metagenomics pipelines: http://cbcb.umd.edu/software/metAMOS

Public Metagenomics /img.jgi.doe.gov/

Thanks to the EMG team, the InterPro team and you for your attention
http://www.ebi.ac.uk/metagenomics
