Tutorial - QIAGEN Bioinformatics

Tutorial

De Novo Assembly Using Long Reads and Short Read Polishing

October 6, 2020

Sample to Insight

QIAGEN Aarhus, Silkeborgvej 2, Prismet, 8000 Aarhus C, Denmark
digitalinsights.qiagen.com
ts-bioinformatics@qiagen.com

This tutorial is an introduction to working with the tools in the Long Read Support (beta) plugin.

The Long Read Support (beta) plugin is a collection of tools developed for working with long, error-prone reads such as those produced by the single-molecule sequencing technologies of Pacific Biosciences or Oxford Nanopore Technologies. It is based on the open source components minimap2 [Li, 2018], miniasm [Li, 2016] and racon [Vaser et al., 2017].

The tutorial covers the following:

- Import data required for the analysis.
- De novo assemble a microbial sized genome using long, error-prone reads.
- Improve a de novo assembly from long reads by polishing with short, high-quality reads.
- Map long reads to a reference and visualize an assembly.
- Correct raw long reads for further analysis.

Prerequisites

For this tutorial, you must be working with CLC Genomics Workbench 20.0 or higher and have installed the Long Read Support (beta) plugin.

How to install plugins is described at: uals/clcgenomicsworkbench/current/index.php?manual Install.html

Optional: For additional evaluation steps, you will need the Whole Genome Alignment plugin.

Plugins can be downloaded from: ew/plugins/

Download and import data

1. Download the sample data from our web data/CAV1492 MinION and Illumina example data.zip and unzip it.
2. Open the CLC Genomics Workbench.
3. Create a new folder for the project, for example titled "Long Read Tutorial".
4. Import the Oxford Nanopore MinION reads. To do so, select File | Import | Oxford Nanopore. Click Add files and add the file
   S. Marcescens CAV1492-MinION.fastq
   Leave other settings as default. Click Next and Save it to the location you created.
5. Open Import, then click Illumina. Make sure to check Paired reads under General options. Click Add files and select
   S. marcescens CAV1492-Illumina downsampled.R1.fastq
   S. marcescens CAV1492-Illumina downsampled.R2.fastq
   Click Next and Save to the same location as the MinION reads.

6. Lastly, import the reference S.marcescens CAV1492 genome.fa using Standard Import. Keep the default option Automatic import checked. Click Next and Save it to the same location.

Your folder should look like Figure 1.

Figure 1: Imported data list

The data for this tutorial is from a study examining the feasibility of using reads from Oxford Nanopore to fully assemble and resolve bacterial genomes including plasmids. It contains long MinION reads and Illumina short read sequencing data from 8 Enterobacterales isolates of six different species [George et al., 2017].

We will assemble the isolate from species Serratia marcescens strain CAV1492. This strain has one chromosome and 5 plasmids. The sequencing data contains 7,039 MinION reads with an average read length of 12,660 bp. We have also imported a set of Illumina reads sequenced from the same strain. These reads have been downsampled from 3 million to 300,000 paired reads to lower the runtime of analysis in this tutorial.

Lastly, we have imported a reference standard made using deep coverage PacBio and paired-end sequencing. The reference standard can be found in BioProject PRJNA246471.

De Novo Assemble Long Reads

The De Novo Assemble Long Reads (beta) tool makes it possible to create de novo assemblies from long reads for microbial-sized genomes.

To start the tool, locate De Novo Assemble Long Reads (beta) (Figure 2) in the Toolbox or search and run using Launch.

Figure 2: Long Read Support (beta) tools

1. Select the imported MinION reads (Figure 3) and click Next.
2. Run the tool using the default settings as shown in Figure 4. Make sure the Polish with Reads (beta) option is checked. This will make the tool run two rounds of polishing after assembling. Click Next.
3. Make sure Create report is checked to get a quick overview of your assembly. Then choose to Save the assembly in a location of your choosing; we recommend making a subfolder titled "De Novo Assemble Long Reads". Wait for the tool to run. Depending on your setup, this will take a few minutes.

Figure 3: Selecting the MinION reads

Figure 4: De Novo Assemble Long Reads (beta) options

4. The genome of S. marcescens should now be assembled. Locate the output and open the Assembly report. Here, you can see an overview of your assembly including nucleotide distribution and contig measurements. This dataset will assemble to 6 contigs with a total size of approximately 5.8 Mb (Figure 5).

Figure 5: The contigs measurement after de novo assembly

Open the contig list. In the assembly, you can see that 5 out of 6 contigs are circular, as indicated by an icon next to their name. One of the contigs is linear and does not match any of the plasmid lengths from the reference. We will return to this plasmid, named CP011638, later.
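The contig measurements in the Assembly report can be approximated outside the Workbench. The following is a minimal sketch, assuming you only have a list of contig lengths; the lengths used here are made-up placeholders chosen to total roughly 5.8 Mb, not the actual output of the tutorial, and whether the report computes its values in exactly this way is an assumption.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def summarize(lengths):
    """Collect a few common assembly summary statistics."""
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths),
        "shortest": min(lengths),
        "n50": n50(lengths),
    }


# Hypothetical contig lengths (one chromosome-sized contig plus five smaller ones).
example_lengths = [5_500_000, 180_000, 70_000, 40_000, 12_500, 3_000]
stats = summarize(example_lengths)
print(stats)
```

For a bacterial assembly dominated by one chromosome-sized contig, the N50 simply equals the chromosome contig length, so the contig count and the individual plasmid lengths are usually the more informative numbers.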

(Optional) Create a Whole Genome Alignment

Since a reference is available for this data set, you can check the quality of your assembly. To do so, you need the Whole Genome Alignment plugin as described in the introduction.

1. Run Create Whole Genome Alignment from Whole Genome Alignment (Figure 6) in the Toolbox or use Launch to search and run.

Figure 6: Whole Genome Alignment tools

2. Select the imported reference genome and the contigs you just created (Figure 7). Click Next.

Figure 7: Select the two genomes to align

3. Leave settings on default as shown (Figure 8).

Figure 8: Whole Genome Alignment options

4. Choose to Save the alignment in your De Novo Assemble Long Reads location. Open the alignment to visualize your assembly against the reference (Figure 9).

Figure 9: The whole genome alignment of the reference and assembly

The overall alignment is good, except for one contig where only about half aligns. If you hover over this contig, you can see that this is the same contig we identified as incorrectly being linear in the assembly.

5. Lastly, you can calculate the Alignment Percentage (AP) and Average Nucleotide Identity (ANI). To do so, run Create Average Nucleotide Identity Comparison from Whole Genome Alignment in the Toolbox or use Launch to search and run.
6. Select the whole genome alignment you created using Create Whole Genome Alignment (Figure 10) and click Next.

Figure 10: Select the whole genome alignment

7. Leave the settings on default as shown in Figure 11 and click Next.
8. Choose to Save the comparison in your De Novo Assemble Long Reads location. Open it to see how well your assembly matches the reference. In this example, you should see an AP of 99.61 and ANI of 99.92 (Figure 12). This is quite high due to the Polish with reads option having been checked. In the next section, you will attempt to improve this assembly by using Illumina reads to polish your assembly.

Polish with Reads

Polish with Reads (beta) makes it possible to polish de novo assemblies or raw long error-prone reads for microbial-sized genomes.

To start the tool, locate Polish with Reads (beta) in Long Read Support (beta) in the Toolbox or search and run using Launch.

1. Select the assembly you created in the previous steps and click Next (Figure 13).

Figure 11: Create Average Nucleotide Identity Comparison options

Figure 12: The nucleotide identity comparison between reference and assembly

Figure 13: Select the contigs to polish

2. In Polish with, specify the Illumina reads as input and click OK. Leave the other settings on default (Figure 14).
3. Choose to Save in your De Novo Assemble Long Reads folder and check Create report. The tool will now run. Depending on your setup, this will take around 10 minutes. In the polishing report, you can see the overview of the assembly after polishing. In this data set, there are no significant changes to the number of contigs and assembly size, although incorrect bases have been polished.
4. (Optional) Repeat the steps listed in "Create a Whole Genome Alignment" using the polished contigs as input. Observe that the AP and ANI values have improved (Figure 15).
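To make the two metrics compared before and after polishing more concrete, here is a minimal sketch of how an alignment percentage and an average nucleotide identity could be derived from a set of aligned blocks. The block representation (aligned length, matching bases), the function names, and the numbers are assumptions for illustration only; the Whole Genome Alignment plugin may define and compute AP and ANI differently.

```python
def alignment_percentage(aligned_blocks, genome_length):
    """Share of the genome covered by aligned blocks, in percent.

    aligned_blocks: iterable of (aligned_length, matching_bases) tuples.
    """
    aligned = sum(length for length, _ in aligned_blocks)
    return 100.0 * aligned / genome_length


def average_nucleotide_identity(aligned_blocks):
    """Share of identical bases within the aligned blocks, in percent."""
    aligned = sum(length for length, _ in aligned_blocks)
    matches = sum(matching for _, matching in aligned_blocks)
    if aligned == 0:
        raise ValueError("no aligned bases")
    return 100.0 * matches / aligned


# Hypothetical toy input: two aligned blocks on a 2,000 bp genome.
blocks = [(1000, 990), (500, 500)]
ap = alignment_percentage(blocks, genome_length=2000)
ani = average_nucleotide_identity(blocks)
print(f"AP = {ap:.2f} %, ANI = {ani:.2f} %")
```

Under these definitions, polishing that fixes mismatched bases raises ANI, while AP changes only if previously unalignable regions become alignable.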

Figure 14: Select the paired-end reads

Figure 15: The nucleotide identity comparison between reference and the polished assembly

Map Long Reads to Reference

Map Long Reads to Reference (beta) enables you to map long reads to contigs or a reference. This is useful for visualizing coverage and for better understanding your assembly.

1. To start the tool, locate Map Long Reads to Reference (beta) in Long Read Support (beta) in the Toolbox or search and run using Launch.
2. Select the raw S. marcescens MinION reads (Figure 16) and click Next.

Figure 16: Select the reads to map

3. In References, select the S. marcescens reference genome and click OK (Figure 17). Then click Next.
4. Leave the Mapping options as default (Figure 18) and click Next.
5. Check the Create report option (Figure). Click Next and Save the result to your "De Novo Assemble Long Reads" folder. If you wish to work with your contigs in the Genomics Finishing Module, you should check Create stand-alone read mapping in the Output options.

Figure 17: Select the reference to map to

6. Save your Read Mapping in a new folder, for example titled "Map Long Reads to Reference". You should have two outputs (Mapping report and mapping). Open the Mapping report and observe that 99% of the reads have mapped to the reference.

Open the read mapping track to see that the chromosome and all plasmids have coverage.

Figure 18: Map Long Reads to Reference options

Correct Long Reads (Optional)

Algorithms using the CLC read mapper in downstream analysis require a reduced error rate for optimal performance. For use in these cases, it is possible to correct the raw reads. As the error-correction is based on an all-vs-all mapping, it comes at the expense of increased time consumption and reduced sensitivity for genetic variants, as these may be considered sequencing errors.

It should therefore be stressed that Correct Long Reads (beta) should not be run as part of De Novo Assemble Long Reads (beta), as polishing is default for this tool. Another helpful use for error-correction is in finishing an assembly where smaller contigs, for example plasmids, were not fully resolved due to high error rate and low coverage. This is the case for plasmid CP011638.

In the next steps, we will extract the reads from this plasmid in a subset. We will then run error-correction on the subset and rerun De Novo Assemble Long Reads (beta). To start, you will need stand-alone read mapping output from Map Long Reads to Reference (beta). The simplest way to get it without mapping the reads again is to use the Convert from Tracks tool from the Track Tools | Track Conversion section of the Toolbox.

1. Start the tool Convert from Tracks from the Track Tools section of the Toolbox (Figure 19) or by using the Launch button and typing the name of the tool.

Figure 19: Track Tools

2. Select your read mapping (Figure 20) and click Next.

Figure 20: Select the readmapping to convert

3. Save the output to your "Map Long Reads to Reference" location.
4. Open the output and select "CP011638.1". Click Extract Subset as shown in Figure 21 and save the output to a new location, for example titled "Plasmid CP011638.1".

Figure 21: Extract reads from the incorrectly assembled plasmid

5. Start the tool Extract Sequences from Classical Sequence Analysis | General Sequence Analysis in the Toolbox (Figure 22) or use Launch to search and run.

Figure 22: Extract sequence in the Classical Sequence Analysis tools

6. Select your extracted subset (Figure 23) and click Next.

Figure 23: Select the subset of the readmapping

7. In the parameters, check Extract to new sequence list (Figure 24) and click Next. Save the output to your "Plasmid CP011638.1" location.
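The extraction in the steps above boils down to selecting reads by the reference sequence they mapped to. As a rough illustration outside the Workbench, the sketch below does the same selection on minimap2-style PAF records, where column 1 is the read name and column 6 is the target name. That you would have PAF output at all is an assumption (the plugin does not expose it in this tutorial), and the read names and coordinates are invented.

```python
def reads_mapped_to(paf_lines, target_name):
    """Return the set of read names with at least one alignment to target_name.

    Expects tab-separated PAF records: field 0 is the query (read) name and
    field 5 is the target (reference) name. Lines with fewer than the
    12 mandatory PAF fields are skipped.
    """
    names = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue
        if fields[5] == target_name:
            names.add(fields[0])
    return names


def paf_record(read, read_len, target, target_len):
    """Build a hypothetical full-length PAF record for the example below."""
    return "\t".join(map(str, [
        read, read_len, 0, read_len, "+",
        target, target_len, 0, read_len,
        read_len - 100, read_len, 60,
    ]))


# Hypothetical mapping records; "chromosome" is a placeholder target name.
example_paf = [
    paf_record("read_a", 5000, "CP011638.1", 6300),
    paf_record("read_b", 12000, "chromosome", 5500000),
    paf_record("read_c", 4000, "CP011638.1", 6300),
]
plasmid_reads = reads_mapped_to(example_paf, "CP011638.1")
print(sorted(plasmid_reads))
```

The resulting name set could then be used to pull the matching records out of the original FASTQ file.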

Figure 24: Extract Sequences to a new sequence list

Reads mapping to this plasmid have now been extracted. You can now run Correct Long Reads (beta) on this subset.

1. To start the tool, locate Correct Long Reads (beta) in Long Read Support (beta) in the Toolbox or search and run using Launch.
2. Select the raw plasmid reads (Figure 25) and click Next.

Figure 25: Select the reads to error-correct

3. In the parameters under Execution mode, you have the options of running Fast or Sensitive (Figure 26). Sensitive can correct additional reads at the cost of increased runtime. We will use the default parameters in this example. Click Next.

Figure 26: Correct Long Reads (beta) options

4. Check Create report, choose to Save the result and click Next. Save the output to your "Plasmid CP011638.1" location. The error correction will now run. Since this plasmid is only covered by 23 reads, it will only take a few seconds to complete.
5. You should have two outputs: a set of corrected reads and a Read correction report. Open the report and compare the input and output statistics. Only 1 read was discarded, so you have enough coverage to assemble the plasmid.
6. Run De Novo Assemble Long Reads (beta) on the uncorrected and corrected plasmid reads as described in the previous section. The uncorrected reads will assemble to a linear contig with length 12.5 kb. The corrected reads, however, assemble to one circular contig with length 6.3 kb, which is what we expected from the reference.
7. We can visualize that the large linear plasmid consists of 2 copies of the actual plasmid. To do so, open Create Whole Genome Dot Plot from Whole Genome Alignment in the Toolbox or use Launch to search and run.
8. Select the two plasmid assemblies (Figure 27) and click Next.

Figure 27: Select the two assemblies

9. Leave the settings as default (Figure 28) and click Next.
10. Choose to Open or Save and click Finish. In the dot plot in Figure 29, you see two lines showing that the smaller plasmid aligns twice to the larger assembly.

To see the effect of error-correction, try mapping the raw and corrected Nanopore reads to your assembly and opening them together in the Track Viewer.

Figure 28: Create dotplot options

Figure 29: Dotplot of the two assemblies

File | New | Track list.

View the Track lists help page for additional how-to. In Figure 30, the difference in errors between the raw and corrected reads is clearly visible.

Figure 30: The difference in errors between raw and corrected MinION reads

Summary

Using the Long Read Support (beta) plugin, we were able to quickly assemble a microbial genome. We were able to fully resolve 4 out of 5 plasmids. After polishing, we obtained an Alignment percentage of 99.81 % and an Average nucleotide identity of 99.93 %. We were able to resolve 5 out of 5 plasmids by correcting raw reads from the unresolved plasmid and reassembling.

This tutorial has demonstrated how to work with long, error-prone reads for creating high quality assemblies for microbial sized genomes.
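Resolving the fifth plasmid relied on recognizing that the 12.5 kb linear contig was two copies of the 6.3 kb plasmid, which shows up as two diagonal lines in a dot plot. The following toy sketch, using a random 200-base sequence in place of the real plasmid, illustrates why: shared k-mers between a sequence and its doubled version fall on exactly two diagonals. The function names and the exact k-mer matching approach are assumptions for illustration and are not how the Create Whole Genome Dot Plot tool is implemented.

```python
import random
from collections import Counter, defaultdict


def kmer_matches(seq_a, seq_b, k):
    """Return (i, j) pairs where seq_a[i:i+k] equals seq_b[j:j+k]."""
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)
    matches = []
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], ()):
            matches.append((i, j))
    return matches


def diagonal_counts(matches):
    """Count matches per diagonal, where a diagonal is the offset j - i."""
    return Counter(j - i for i, j in matches)


# Toy stand-in for the plasmid: a random unit and a contig made of two copies.
rng = random.Random(0)
unit = "".join(rng.choice("ACGT") for _ in range(200))
doubled = unit + unit

counts = diagonal_counts(kmer_matches(unit, doubled, k=12))
for offset, n in counts.most_common(2):
    print(f"diagonal at offset {offset}: {n} shared k-mers")
```

Each full-length diagonal corresponds to one line in the dot plot: one at offset 0 for the first copy and one at an offset equal to the unit length for the second copy.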

Bibliography

[George et al., 2017] George, S., Pankhurst, L., Hubbard, A., Votintseva, A., Stoesser, N., Sheppard, A. E., Mathers, A., Norris, R., Navickaite, I., Eaton, C., et al. (2017). Resolving plasmid structures in Enterobacteriaceae using the MinION nanopore sequencer: assessment of MinION and MinION/Illumina hybrid data assembly approaches. Microbial Genomics, 3(8).

[Li, 2016] Li, H. (2016). Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics, 32(14):2103-2110.

[Li, 2018] Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34(18):3094-3100.

[Vaser et al., 2017] Vaser, R., Sović, I., Nagarajan, N., and Šikić, M. (2017). Fast and accurate de novo genome assembly from long uncorrected reads. Genome Research, 27(5):737-746.
