Practical Guide To Interpreting RNA-seq Data

Skyler Kuhn (1,2), Mayank Tandon (1,2)
1. CCR Collaborative Bioinformatics Resource (CCBR), Center for Cancer Research, NCI
2. Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research

Overview
I. Experimental Design
- Hypothesis-driven
- Overview of Best Practice
II. Quality-control
- Pre- and post-alignment QC metrics
- Interpretation
III. Pipeline
- FastQ files to counts matrix
- Reproducibility
IV. Downstream Analysis
- Principal Components Analysis (PCA)
- Differential Expression
- Pathway Analysis
V. Advanced Visualizations
- Group comparisons
- Alternative Splicing Events
- Pathway Diagrams

I. Experimental Design

I. Experimental Design: Overview
Hypothesis-driven
- Addresses a well thought-out, quantifiable question
Considerations:
- Library Construction: mRNA versus total RNA
- Single-end versus Paired-end Sequencing
- Sequencing Depth: quantifying gene-level or transcript-level expression
- Number of Replicates: statistical power and the ability to drop a bad sample
- Reducing Batch Effects

I. Experimental Design: Library Construction
Total RNA contains high levels of ribosomal RNA (rRNA): 80%
mRNA
- poly(A) selection
- Standard profiling for gene expression
- Low RIN may result in 3’ bias
Total RNA
- rRNA depletion
- mRNA and non-coding RNA species (lncRNA)
- Prokaryotic samples

I. Experimental Design: Sequencing Depth
mRNA: poly(A)-selection
- Recommended Sequencing Depth: 10-20M paired-end reads (or 20-40M reads)
- RNA must be high quality (RIN 8)
Total RNA: rRNA depletion
- Recommended Sequencing Depth: 25-60M paired-end reads (or 50-120M reads)
- RNA must be high quality (RIN 8)
* Differential isoform regulation or alternative splicing events: 100M paired-end reads

I. Experimental Design: Number of Replicates
Recommended
- Biological Replicates over Technical Replicates
- Number of Replicates: 4
- Peace of mind: ability to drop a bad sample without compromising statistical power
Bare Minimum
- Biological Replicates over Technical Replicates
- Number of Replicates: 3

I. Experimental Design: Reducing Batch Effects
Unwanted sources of technical variation
- Protocol-driven: different lab technicians, different processing times, different reagent lots
- Sequencing: lane effect
Decrease batch effects by uniform processing

Sample Name    Group  Batch  Batch*
Treatment r1   KO     1      1
Treatment r2   KO     2      1
Treatment r3   KO     1      1
Treatment r4   KO     2      1
Cntrl r1       WT     1      2
Cntrl r2       WT     2      2
Cntrl r3       WT     1      2
Cntrl r4       WT     2      2
* Confounded Groups and Batches!
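The difference between the balanced "Batch" column and the confounded "Batch*" column can be checked mechanically before any samples are processed. The following is a minimal sketch, not part of the original slides; the function name `is_confounded` and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

# Sample sheet from the slide: (sample, group, balanced batch, confounded batch).
SAMPLES = [
    ("Treatment r1", "KO", 1, 1),
    ("Treatment r2", "KO", 2, 1),
    ("Treatment r3", "KO", 1, 1),
    ("Treatment r4", "KO", 2, 1),
    ("Cntrl r1", "WT", 1, 2),
    ("Cntrl r2", "WT", 2, 2),
    ("Cntrl r3", "WT", 1, 2),
    ("Cntrl r4", "WT", 2, 2),
]

def is_confounded(groups, batches):
    """Return True if any batch is missing one or more of the groups."""
    groups_per_batch = defaultdict(set)
    for group, batch in zip(groups, batches):
        groups_per_batch[batch].add(group)
    all_groups = set(groups)
    return len(all_groups) > 1 and any(
        seen != all_groups for seen in groups_per_batch.values()
    )

groups = [s[1] for s in SAMPLES]
balanced = is_confounded(groups, [s[2] for s in SAMPLES])    # "Batch" column
confounded = is_confounded(groups, [s[3] for s in SAMPLES])  # "Batch*" column
print(balanced, confounded)
```

In the balanced design every batch contains both KO and WT samples, so a batch term can be separated from the group effect; in the starred design batch and group are indistinguishable.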

II. Quality Control

II. Quality-control: Overview
No need to reinvent the wheel, but there are a lot of wheels!
Pre-alignment Quality-control
- Sequencing Quality
- Contamination Screening
Post-alignment Quality-control
- Alignment Quality
Aggregation and Interpretation
- MultiQC Report
- QC metric guidelines

II. Quality-control: Pre-alignment
Sequencing Quality
- FastQC: run twice, on raw and trimmed data
Contamination Screening
- FastQ Screen
- Kraken
- BioBloom
Workflow: FastQC (raw), Adapter Trimming, FastQC (trimmed), Contamination Screening

II. Quality-control: Pre-alignment
Workflow: FastQC (raw), Adapter Trimming, FastQC (trimmed)
FastQC
Identifies potential problems that can arise during sequencing or library prep
Run on raw reads (pre-adapter removal) and trimmed reads (post-adapter removal)
Summarizes:
- Per base and per sequence quality scores
- Per sequence GC content
- Per sequence adapter content
- Per sequence read lengths
- Overrepresented sequences
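To make the quality-score summary concrete, the sketch below computes the fraction of bases at or above Q30 directly from FASTQ text. It is an illustration only and is not how FastQC is implemented; it assumes Phred+33 encoding and unwrapped four-line records, and the function name `fraction_q30` and the two example reads are made up.

```python
import io

def fraction_q30(fastq_handle, phred_offset=33):
    """Fraction of bases with Phred quality >= 30 in a 4-line-per-record FASTQ."""
    total = 0
    high_quality = 0
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 != 3:  # the quality string is the 4th line of each record
            continue
        for char in line.strip():
            total += 1
            if ord(char) - phred_offset >= 30:
                high_quality += 1
    return high_quality / total if total else 0.0

# Two made-up reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\nII##\n"
)
q30 = fraction_q30(example)
print(q30)  # 6 of the 8 bases are at or above Q30
```

Running the same calculation on raw and on trimmed reads gives a quick before-and-after comparison of adapter and quality trimming.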

II. Quality-control: FastQC

II. Quality-control: Pre-alignment
Workflow: Adapter Trimming, Contamination Screen, Alignment
FastQ Screen
- Aligns to Human, Mouse, Fungi, Bacteria, and Viral references
- Easy to interpret and an important QC step
Kraken
Taxonomic composition of microbial contamination
- Archaea
- Bacteria
- Plasmid
- Viral

FastQ Screen: Contamination Screening

Kraken Krona: Microbial Taxonomic Composition

II. Quality-control: Post-alignment
Workflow: Alignment, Alignment Quality, Quantify Counts
Preseq
- Estimates library complexity
Picard RNAseqMetrics
- Number of reads that align to coding, intronic, UTR, intergenic, and ribosomal regions
- Normalized gene coverage across a meta-gene body, to identify 5’ or 3’ bias
RSeQC
Suite of tools to assess various aspects of post-alignment quality
- Calculate distribution of insert size
- Junction annotation (% known, % novel reads spanning splice junctions)
- BAM to BigWig (visual inspection with IGV)

Picard CollectRnaseqMetrics: Alignment Summary

Picard CollectRnaseqMetrics: Normalized Gene Coverage (3’ Bias)

II. Quality-control: Aggregation
MultiQC
HTML report that aggregates information across all samples
- Plots, filtering, and highlighting
Highly customizable, with great documentation
- Add text and embed custom figures
- Create your own module to extend missing functionality
Supports over 73 commonly used open-source bioinformatics tools

QC Metric Guidelines

Metric                            mRNA                          total RNA
RNA Type(s)                       Coding                        Coding and non-coding
RIN                               8 [low RIN: 3’ bias]          8
Single-end vs Paired-end          Paired-end                    Paired-end
Recommended Sequencing Depth      10-20M PE reads               25-60M PE reads
FastQC                            Q30 70%                       Q30 70%
Percent Aligned to Reference      70%                           65%
Million Reads Aligned Reference   7M PE reads (or 14M reads)    16.5M PE reads (or 33M reads)
Percent Aligned to rRNA           5%                            15%
Picard RNAseqMetrics              Coding 50%                    Coding 35%
Picard RNAseqMetrics              Intronic and intergenic 25%   Intronic and intergenic 40%

III. Pipeline

III. Processing Pipeline: Conceptual Diagram
Raw data: FastQ files
Adapter Trimming: Adapters are composed of synthetic sequences and should be removed prior to alignment
Alignment: Adding biological context to your data; find where reads align to the reference genome
Quantification: Counting the number of reads that align to a particular feature of interest (genes, isoforms, etc.)
Differential Expression: Summarizing differences between two groups or conditions (KO vs. WT)

III. Processing Pipeline: Practical Example
FastQ files to raw counts matrix
- FastQC: Pre- and post-trimming
- Cutadapt: Removes adapters
- FastQ Screen: Run twice on different sets of references
- STAR: Splice-aware aligner
- RSEM: Generates gene and isoform counts
- MultiQC: Aggregates everything into an HTML report
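The tool chain listed on this slide could be strung together roughly as follows. This is a sketch under stated assumptions rather than the authors' pipeline: the file names, output directories, index paths, and adapter sequence are hypothetical, FastQ Screen is omitted, and the `--alignments` flag assumes a recent RSEM release. The function only builds the command lines and does not run anything.

```python
import shlex

def build_commands(sample, r1, r2, adapter, star_index, rsem_ref, threads=4):
    """Return per-sample command lines in pipeline order, without running them."""
    trimmed_r1 = f"{sample}.R1.trim.fastq.gz"
    trimmed_r2 = f"{sample}.R2.trim.fastq.gz"
    return [
        # QC on raw reads (the output directory must already exist).
        ["fastqc", "-o", "qc_raw", r1, r2],
        # Remove the adapter from both mates (-a for R1, -A for R2).
        ["cutadapt", "-a", adapter, "-A", adapter,
         "-o", trimmed_r1, "-p", trimmed_r2, r1, r2],
        # QC again on trimmed reads.
        ["fastqc", "-o", "qc_trimmed", trimmed_r1, trimmed_r2],
        # Splice-aware alignment; also write transcriptome-space alignments for RSEM.
        ["STAR", "--runThreadN", str(threads), "--genomeDir", star_index,
         "--readFilesIn", trimmed_r1, trimmed_r2, "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--quantMode", "TranscriptomeSAM",
         "--outFileNamePrefix", f"{sample}."],
        # Gene- and isoform-level quantification from STAR's transcriptome BAM.
        ["rsem-calculate-expression", "--paired-end", "--alignments",
         "-p", str(threads),
         f"{sample}.Aligned.toTranscriptome.out.bam", rsem_ref, sample],
        # Aggregate all tool outputs found under the current directory.
        ["multiqc", "."],
    ]

# Hypothetical inputs; the adapter shown is an assumed Illumina adapter prefix.
commands = build_commands(
    "sampleA", "sampleA.R1.fastq.gz", "sampleA.R2.fastq.gz",
    adapter="AGATCGGAAGAGC", star_index="star_index", rsem_ref="rsem_ref/genome",
)
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping each step as an argument list makes it easy to hand the same commands to a workflow manager or to `subprocess.run` later, and avoids shell-quoting problems with file names.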

III. Processing Pipeline: Reproducibility
Workflow management systems
- Snakemake, Nextflow
Package management
- No active management: a rat’s nest of interdependencies that is prone to break
- Python: virtual environments
- Conda: Python, R, Scala, Java, C/C++, FORTRAN
- Docker or Singularity: portability and high reproducibility

IV. Downstream Analysis

IV. Downstream Analysis
Step 1: Think
Step 2: Analyze
Step 3: QC?
Step 4: Nobel Prize!
Answer Biological Questions
Workflow: Raw data (FastQ files), Adapter Trimming, Alignment, Quantification, Differential Expression

IV. Downstream Analysis
Principal Components Analysis (PCA)
- Data summarization, visualization, and QC tool
Differential Expression
- Find genes that are different between groups of interest
Pathway Enrichment
- Analyze for broader biological patterns

IV. Downstream Analysis: PCA
Principal Components Analysis (PCA)
- Dimensionality reduction technique
- Captures patterns of variance into singular values
- Visualizes global transcriptomic patterns
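As a rough illustration of what PCA does with an expression matrix, the sketch below finds the first principal component of a tiny, made-up log-expression table using power iteration on the sample-by-sample Gram matrix. It is dependency-free for readability; a real analysis would normally use an SVD routine from a numerical library on normalized, log-transformed counts. The function name and data are hypothetical.

```python
import math

def first_principal_component(matrix, iterations=200):
    """PC1 score of each sample, via power iteration on the Gram matrix
    of gene-centered data. `matrix` is a list of samples, each a list of
    (log-scale) expression values, one per gene."""
    n_samples, n_genes = len(matrix), len(matrix[0])
    # Center each gene (column) at zero mean across samples.
    means = [sum(row[g] for row in matrix) / n_samples for g in range(n_genes)]
    centered = [[row[g] - means[g] for g in range(n_genes)] for row in matrix]
    # Gram matrix G = X X^T; its top eigenvector holds PC1 scores up to scale.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered]
            for ri in centered]
    vec = [float(i + 1) for i in range(n_samples)]  # deterministic start vector
    for _ in range(iterations):
        vec = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # Rayleigh quotient gives the top eigenvalue, i.e. the squared singular value.
    eigenvalue = sum(
        v * sum(g * w for g, w in zip(row, vec)) for v, row in zip(vec, gram)
    )
    return [v * math.sqrt(eigenvalue) for v in vec]

# Made-up log-expression values for 3 genes in 2 KO and 2 WT samples.
log_expr = [
    [5.0, 1.0, 3.0],  # KO r1
    [5.2, 0.8, 3.1],  # KO r2
    [1.0, 5.0, 3.0],  # WT r1
    [0.9, 5.1, 2.9],  # WT r2
]
scores = first_principal_component(log_expr)
print(scores)
```

The sign of a principal component is arbitrary, so what matters is that the KO samples land on one side of PC1 and the WT samples on the other.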

IV. Downstream Analysis: PCA
PCA can help drive biological insights...

IV. Downstream Analysis: PCA
...or be used as a QC tool

IV. Downstream Analysis: Differential Expression
Goal: Identify genes or transcripts that vary due to biological effects
Question: Can’t I just use a t-test to do that?
Answer: Sure, but the data are noisy, so it is a bad idea. Instead, we apply normalization and/or employ specialized statistical tests.
Law, C. W., et al. (2014). "voom: Precision weights unlock linear model analysis tools for RNA-seq read counts." Genome Biol 15(2): R29.
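To see why normalization matters before comparing groups, consider the simplest depth correction, counts per million (CPM). The sketch below uses made-up counts and hypothetical function names; CPM is chosen only for illustration, and dedicated differential expression packages use more robust scaling and variance modelling than this.

```python
import math

def counts_per_million(counts):
    """Scale one sample's raw gene counts by its library size."""
    library_size = sum(counts.values())
    return {gene: 1e6 * n / library_size for gene, n in counts.items()}

def log2_fold_change(cpm_a, cpm_b, gene, pseudocount=1.0):
    """log2(A / B) for one gene, with a pseudocount to avoid log(0)."""
    return math.log2((cpm_a[gene] + pseudocount) / (cpm_b[gene] + pseudocount))

# Made-up counts: the KO library is sequenced twice as deeply as the WT library.
ko = {"geneA": 200, "geneB": 1800}
wt = {"geneA": 100, "geneB": 900}

raw_ratio = ko["geneA"] / wt["geneA"]  # looks like a 2-fold change
lfc = log2_fold_change(counts_per_million(ko), counts_per_million(wt), "geneA")
print(raw_ratio, lfc)  # after depth normalization the apparent change vanishes
```

The raw counts suggest geneA doubled, but the difference is entirely explained by sequencing depth, which is the kind of artifact a naive test on raw counts would report as a finding.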

IV. Downstream Analysis: Differential Expression
Seyednasrollah, F., et al. (2015). "Comparison of software packages for detecting differential expression in RNA-seq studies." Brief Bioinform 16(1): 59-70.

IV. Downstream Analysis: Differential Expression
Practical Rules of Thumb
- Limma, DESeq2, and EdgeR perform very similarly in most cases; a consensus or intersection of the three is sometimes used
- Limma works better with larger cohorts (7 or more samples per group)
- DESeq2 works better with small cohorts (3 or fewer per group) and may also be more sensitive for low-depth data
- EdgeR provides convenience functions for converting to various normalized values
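The consensus approach mentioned in the first rule of thumb amounts to a set intersection over each tool's list of significant genes. A minimal sketch with made-up gene lists and a hypothetical helper name:

```python
def consensus_genes(*gene_sets):
    """Genes called significant by every tool (set intersection)."""
    return set.intersection(*map(set, gene_sets))

# Made-up significant-gene lists, one per tool.
limma_hits = {"geneA", "geneB", "geneC"}
deseq2_hits = {"geneA", "geneB", "geneD"}
edger_hits = {"geneA", "geneB", "geneC", "geneD"}

consensus = sorted(consensus_genes(limma_hits, deseq2_hits, edger_hits))
print(consensus)
```

An intersection trades sensitivity for confidence: genes found by only one or two tools are dropped, so the result is a conservative list.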

IV. Downstream Analysis: Differential Expression Output

IV. Downstream Analysis: Pathway Enrichment
Gene annotation and network databases capture biological meaning
- Manual curation, text mining
- Gene function and/or interactions
Dozens of databases and hundreds of tools
- Depends on how you want to look at gene-pathway relationships

IV. Downstream Analysis: Pathway Enrichment
Types of pathway analysis
Simple enrichment test: Qualitative
- Fisher’s Exact Test
- Hypergeometric test
Enrichment algorithms: Quantitative
- GSEA (Broad Institute)
Network Analysis
Commercial vs. open source
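The simple enrichment test can be written out in a few lines. The sketch below computes the upper-tail hypergeometric probability, which is equivalent to a one-sided Fisher's exact test on the corresponding 2x2 table. The function name and the example numbers are made up for illustration.

```python
from math import comb

def hypergeom_enrichment_p(universe, pathway, selected, overlap):
    """P(X >= overlap) when drawing `selected` genes without replacement from
    `universe` genes, of which `pathway` belong to the pathway."""
    upper = min(pathway, selected)
    numerator = sum(
        comb(pathway, k) * comb(universe - pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / comb(universe, selected)

# Made-up numbers: 20 genes tested, 5 in the pathway,
# 4 differentially expressed genes, 3 of which are in the pathway.
p_value = hypergeom_enrichment_p(universe=20, pathway=5, selected=4, overlap=3)
print(p_value)
```

The choice of `universe` matters in practice: it should be the set of genes that could have been detected in the experiment, not every gene in the genome, and p-values across many pathways need multiple-testing correction.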

V. Visualizations

V. Visualizations of RNA-Seq Data
- Group comparisons of pathway enrichment
- Heatmaps
- Visualizing Set Overlap
- Dotplots
- Sashimi plots
- Alternative Splicing

V. Visualizations: Group Enrichment
Group comparison of pathway enrichment: Simple Enrichment Test

V. Visualizations: Expression Heatmap

V. Visualizations: Set Intersection

V. Visualizations: Pathway enrichment

V. Visualizations: Sashimi Plot

Conclusions
Think BEFORE you sequence!
This is a three-way partnership: bench, sequencing, and analysis
- Everyone should agree on experimental design, platform, and approach
QC is extremely important!
There is no need to reinvent the wheel, but there are a lot of wheels
Garbage in, garbage out!
- Only some problems can be fixed bioinformatically
There will always be significant changes detected
Interpretation must be cautious and deliberate

THANKS!
Acknowledgements: CCBR, NCBR, and GAU members
Any questions?

Cost-Benefit Considerations

                                   MiSeq        NextSeq      HiSeq 4000   Novaseq
Run Time                           4–55 hours   12–30 hours  1–3.5 days   13-44 hours
Max Output                         15 Gb        120 Gb       1500 Gb      6000 Gb
Max Reads Per Run                  25 million   400 million  5 billion    20 billion
Lanes                              1            1            8            4
Maximum Read Length                2 x 300 bp   2 x 150 bp   2 x 150 bp   2 x 250
Cost from SF                       623          1956         1007/lane    4382/lane
Max Coverage (reads, 12 samples)   2 million    33 million   52 million   416 million
Cost per sample (12 samples)       51.91        163.92       83.92        365.16

Caveats:
- Expected reads/sample are based on maximum possible yield
- Typical runs likely yield 80% of max
- Different platforms may have different turnaround times depending on queue length and popularity
- Library prep cost is not included here: 50-84 depending on type of kit
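The Max Coverage figures on this slide appear to follow from dividing each platform's maximum reads per run by its number of lanes and then by 12 samples per lane. The sketch below reproduces that arithmetic. It assumes the lane counts read as 1, 1, 8 and 4 (MiSeq, NextSeq, HiSeq 4000, NovaSeq) and that 12 samples share a lane; the function name is hypothetical.

```python
# (max reads per run, lanes) for each platform, as read from the slide.
PLATFORMS = {
    "MiSeq": (25e6, 1),
    "NextSeq": (400e6, 1),
    "HiSeq 4000": (5e9, 8),
    "NovaSeq": (20e9, 4),
}
SAMPLES_PER_LANE = 12  # assumption taken from the "(12 samples)" label

def max_reads_per_sample(max_reads, lanes, samples=SAMPLES_PER_LANE):
    """Upper bound on reads per sample when `samples` libraries share one lane."""
    return max_reads / lanes / samples

coverage_millions = {
    name: int(max_reads_per_sample(reads, lanes) // 1e6)
    for name, (reads, lanes) in PLATFORMS.items()
}
for name, millions in coverage_millions.items():
    print(f"{name}: about {millions} million reads per sample at maximum yield")
```

Because these are ceilings based on maximum yield, the caveat about typical runs yielding less than the maximum applies: scaling the result down before comparing it with a target sequencing depth gives a more realistic plan.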
