Statistical Analysis Of RNA-Seq Data - University Of Lille

Statistical analysis of RNA-Seq data
G. Marot (Univ. Lille Droit et Santé, Inria)
Sources: J. Aubert and C. Hennequet-Antier (INRA), M.A. Dillies (Institut Pasteur Paris)
6 April 2017

Outline: Experimental design | Exploration | Normalization | Differential analysis | Multiple testing

Introduction

Differential analysis
- Comparison of treatments, states, conditions, ...
- Example: ill vs healthy
- Statistical analysis based on tests

Particularities of NGS data:
- Very few individuals
- Many tests (one per variable)
- Count data (statistical distributions differ from those used for continuous microarray data)

Statistical test

- State the null and the alternative hypotheses:
  H0 = {the mean expression (or proportion) of the gene is identical between the two conditions}
  H1 = {the mean expression (or proportion) of the gene is different between the two conditions}
- Consider the statistical assumptions (e.g. independence) and distributions (e.g. normal, negative binomial, ...).
- Calculate the appropriate test statistic T.
- Derive the distribution of the test statistic under the null hypothesis from the assumptions.
- Select a significance level (α), a probability threshold below which the null hypothesis will be rejected.

Remark: H0 is always preferred. Without sufficient evidence, H0 is not rejected. Failing to reject H0 does not mean that H0 is true.

Differential analysis

A gene is declared differentially expressed if the observed difference between two conditions is statistically significant, that is to say, greater than what natural random variation would produce.

Key steps for statisticians:
- experimental design
- normalization
- differential analysis
- multiple testing

Plan

1. Experimental design
2. Exploratory data analysis
3. Normalization
4. Differential analysis
5. Multiple testing

Not a recent idea!

"To consult a statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of." (Ronald A. Fisher, Indian Statistical Congress, 1938, vol. 4, p. 17)

"While a good design does not guarantee a successful experiment, a suitably bad design guarantees a failed experiment." (Kathleen Kerr, Inserm workshop 145, 2003)

Make an experimental design

Context of an RNA-seq experiment
Rule 0: Share a common language in biology, bioinformatics and statistics.

Experimental design
All skills are needed in the discussions, right from project construction.
Rule 1: Define the biological question well, get together and collect a priori knowledge (e.g. reference genome, splicing, ...).
Rule 2: Anticipate and identify all factors of variation, adapt Fisher's principles (1935), and collect metadata from the experiment and the sequencing.
Rule 3: Choose the tools/methods for bioinformatics and statistical analyses a priori.
Rule 4: Draw conclusions on results.

Experimental design

A good design is a list of experiments to conduct in order to answer the question asked, which maximizes the information collected and minimizes the cost of the experiments, subject to constraints.

Rule 1: Define the biological question well: make a choice
- Identify differentially expressed genes,
- Detect and estimate isoforms,
- Construct a de novo transcriptome.

Rule 2: Adapt Fisher's principles: randomization and blocking
AVOID CONFOUNDING the biological variability of interest with a biological or technical source of variation.

Experimental design: biological vs technical replicates

Biological replicate: repetition of the same experimental protocol with independent data acquisition (several samples).
Technical replicate: same biological material but independent replications of the technical steps (several extracts from the same sample).

Sequencing technology does not eliminate biological variability. (Nature Biotechnology Correspondence, 2011)

Effects to keep in mind: lane effect, run effect, library prep effect, biological effect [Marioni et al., 2008], [Bullard et al., 2010]

Include at least three biological replicates in your experiments! Technical replicates are not necessary.

Experimental design

AVOID CONFOUNDING the biological variability of interest with a biological or technical source of variation.

Problem: lane and condition are confounded.
Solution: distribute the conditions evenly across both lanes.

Problem: lane and condition are partially confounded.
Solution: distribute the conditions "evenly" across both lanes.

Experimental design

Find genes that are differentially expressed between normal skin and damaged skin.

Skin status   RNA extraction date
control       July 12th, 2016
control       July 12th, 2016
control       July 12th, 2016
wound         July 20th, 2016
wound         July 20th, 2016
wound         July 20th, 2016

Skin status and RNA extraction date are confounded: comparing healthy and damaged skin amounts to comparing RNAs extracted on July 12th and on July 20th.

Experimental design

Find genes that are differentially expressed between normal skin and damaged skin.

Skin status   RNA extraction date
control       July 12th, 2016
control       July 20th, 2016
control       July 25th, 2016
wound         July 12th, 2016
wound         July 20th, 2016
wound         July 25th, 2016

One solution: the day effect is evenly distributed across conditions.

Experimental design

Find genes that are differentially expressed between normal skin and damaged skin.

Skin status   RNA extraction date   Mouse
control       July 12th, 2016       m1
control       July 20th, 2016       m2
control       July 25th, 2016       m3
wound         July 12th, 2016       m1
wound         July 20th, 2016       m2
wound         July 25th, 2016       m3

One solution: the day effect is evenly distributed across conditions.
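The skin example can be checked mechanically before any sample is processed. The sketch below is only an illustration: the function name `confounding_status` and the three labels it returns are hypothetical, and it only cross-tabulates one nuisance factor (here the extraction date) against the condition.

```python
from collections import defaultdict

def confounding_status(conditions, batches):
    """Classify how a batch factor (e.g. extraction date) overlaps with the condition."""
    if len(conditions) != len(batches):
        raise ValueError("one batch label is needed per sample")
    # Count samples in each (batch, condition) cell.
    table = defaultdict(lambda: defaultdict(int))
    for cond, batch in zip(conditions, batches):
        table[batch][cond] += 1
    all_conditions = set(conditions)
    # Full confounding: every batch holds a single condition.
    if all(len(cells) == 1 for cells in table.values()):
        return "confounded"
    # Balanced: every batch holds every condition, in equal numbers.
    balanced = all(
        set(cells) == all_conditions and len(set(cells.values())) == 1
        for cells in table.values()
    )
    return "balanced" if balanced else "partially confounded"

status = ["control"] * 3 + ["wound"] * 3
dates_design_1 = ["Jul 12"] * 3 + ["Jul 20"] * 3      # all controls on one day
dates_design_2 = ["Jul 12", "Jul 20", "Jul 25"] * 2   # each day holds both conditions

print(confounding_status(status, dates_design_1))
print(confounding_status(status, dates_design_2))
```

A cross-tabulation like this only detects the problem; the remedy remains the one given in the slides, namely spreading each nuisance factor evenly across conditions.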

Experimental design

Why increase the number of biological replicates?
- To generalize to the population level
- To estimate the variation in individual transcripts with a higher degree of accuracy [Hart et al., 2013]
- To improve detection of DE transcripts and control of the false positive rate [Soneson and Delorenzi, 2013]
- To focus on detection of low-abundance mRNAs and on inconsistent detection of exons at low levels ( 5 reads) of coverage [McIntyre et al., 2011]

More biological replicates or increasing sequencing depth?

It depends! [Haas et al., 2012], [Liu et al., 2014]
- DE transcript detection: biological replicates
- Construction and annotation of a transcriptome: depth and sampling conditions
- Transcriptomic variants search: biological replicates and depth

Support
- An experimental design using multiplexing
- Tools for experimental design decisions: Scotty [Busby et al., 2013], RNAseqPower [Hart et al., 2013], PROPER [Wu et al., 2015]

And do not forget: the budget also includes the cost of biological data acquisition, sequencing data backup, and bioinformatics and statistical analysis.

For a good (nice) experimental design...

Before the experiment
- Ask a precise and well-defined biological question
- List all possible biological confounding effects (sex, age, ...)
- Collect samples while taking care of the distribution of unwanted sources of variation across samples
- Include at least three biological replicates per condition. Technical replicates are not necessary
- Distribute samples on lanes and flow cells...
  - according to the comparisons to be made
  - without confounding technical effects with the biological effects of interest
  - applying the same multiplexing rate to all samples

Poster: How to design a good RNA-Seq experiment in an interdisciplinary context?
Pôle Planification Expérimentale, PEPI IBIS, INRA, France. European Conference on Computational Biology (ECCB14), 7 - 10 September 2014, Strasbourg, France.

RNA-seq technology is a powerful tool for characterizing and quantifying the transcriptome. Careful upstream experimental planning is necessary to extract the maximum of relevant information and to make the best use of these experiments.

RNA-seq experiment analysis: from A to Z. An RNA-seq experimental design using Fisher's principles. [Figure adapted from Mutz, 2013]

Rule 1: Share a minimal common language
Some definitions:
- Biological and technical replicates
- Sequencing depth: average number of times a given position in a genome or a transcriptome is covered by reads in a sequencing run
- Multiplexing: tags or barcodes with specific sequences, added during library construction, that allow multiple samples to be included in the same sequencing reaction (lane)
- Blocking: isolating variation attributable to a nuisance variable (e.g. lane)

Rule 2: Define the biological question well
From Alon, 2009:
- Choose scientific problems on feasibility and interest
- Order your objectives (primary and secondary)
- Ask yourself if RNA-seq is better than microarray regarding the biological question
Make a choice:
- Identify differentially expressed (DE) genes?
- Detect and estimate isoforms?
- Construct a de novo transcriptome?

Rule 3: Anticipate difficulties with a well-designed experiment
1. Prepare a checklist with all the needed elements to be collected,
2. Collect data and determine all factors of variation,
3. Choose bioinformatics and statistical models,
4. Draw conclusions on results.

Be aware of different types of bias
Keep in mind the influence of effects on results: lane, run, RNA library preparation, biological (Marioni, 2008), (Bullard, 2010)

Rule 4: Make good choices
Why increase the number of biological replicates?
- To generalize to the population level
- To estimate the variation in individual transcripts with a higher degree of accuracy (Hart, 2013)
- To improve detection of DE transcripts and control of the false positive rate: TRUE with at least 3 (Soneson, 2013; Robles, 2012)
More biological replicates or increasing sequencing depth? It depends! (Haas, 2012), (Liu, 2014)
- DE transcript detection: biological replicates
- Construction and annotation of a transcriptome: depth and sampling conditions
- Transcriptomic variants search: biological replicates and depth
A solution: multiplexing. Decision tools available: Scotty (Busby, 2013), RNAseqPower (Hart, 2013)

How many reads?
- 100M to detect 90% of the transcripts of 81% of human genes (Toung, 2011)
- 20M reads of 75bp can detect transcripts of medium and low abundance in chicken (Wand, 2011)
- 10M to cover by at least 10 reads 90% of all (human and zebrafish) genes (Hart, 2013)

Conclusions
- Clarify the biological question
- All skills are needed in the discussions, right from project construction
- Prefer biological replicates to technical replicates
- Use multiplexing
- The optimum compromise between replication number and sequencing depth depends on the question
- Wherever possible, apply Fisher's three principles of randomization, replication and local control (blocking)
And do not forget: the budget also includes the cost of biological data acquisition, sequencing data backup, and bioinformatics and statistical analysis.

Who are we? julie.aubert@agroparistech.fr, anne.delafoye@clermont.inra.fr, cyprien.guerin@jouy.inra.fr, iou@sophia.inra.fr, fabrice.legeai@rennes.inra.fr, delphine.labourdette@insa-toulouse.fr, nmarsaud@insa-toulouse.fr, brigitte.schaeffer@jouy.inra.fr

(Pôle animation planification expérimentale PEPI IBIS, ECCB 2014, 7 - 10 Sep 2014)

SARTools

SARTools: Statistical Analysis of RNA-Seq Tools [Varet et al., 2016]
- exports the results into easily readable tab-delimited files
- generates an HTML report which displays all the figures produced, explains the statistical methods and gives the results of the differential analysis

Covers:
- Exploratory data analysis
- Differential analysis, including normalization and multiple testing

Available on R and Galaxy

Exploratory data analysis

Sample comparison for RNA-Seq [Schulze et al., 2012]

Pearson's correlation coefficient
- widely used...
- ...but highly dependent on sequencing depth and on the range of expression levels inherent to the sample

SERE: Simple Error Ratio Estimate
- ratio of the observed variation to what would be expected from an ideal Poisson experiment
- interpretation is unambiguous regardless of the total read count or the range of expression
- score of 1: faithful replication
- score of 0: data duplication
- scores > 1: true global differences between RNA-Seq libraries

Exploratory data analysis

- scores between 0 and 1: underdispersion (variance smaller than the mean)
- scores greater than 1: overdispersion, the situation expected with biological replicates
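To make the SERE scale concrete, here is a minimal sketch. The formula is an assumption based on a reading of [Schulze et al., 2012], not something the slides spell out: for each gene, the expected count in a sample is the gene total split in proportion to library size, the per-gene variation is the Poisson-scaled squared deviation divided by (number of samples - 1), and SERE is the square root of the average over observed genes. The function name `sere` is hypothetical.

```python
import math

def sere(counts):
    """counts: list of genes, each a list of read counts (one per sample)."""
    n_samples = len(counts[0])
    lib_sizes = [sum(gene[j] for gene in counts) for j in range(n_samples)]
    total = sum(lib_sizes)
    per_gene = []
    for gene in counts:
        gene_total = sum(gene)
        if gene_total == 0:
            continue  # genes never observed carry no information
        # Expected counts if reads were split in proportion to library size.
        expected = [gene_total * size / total for size in lib_sizes]
        s2 = sum((obs - exp) ** 2 / exp for obs, exp in zip(gene, expected))
        per_gene.append(s2 / (n_samples - 1))
    return math.sqrt(sum(per_gene) / len(per_gene))

duplicated = [[10, 10], [200, 200], [35, 35]]   # same library counted twice
rescaled = [[10, 20], [200, 400], [35, 70]]     # same proportions, double depth
different = [[10, 30], [30, 10]]                # genes move in opposite directions

print(sere(duplicated), sere(rescaled), sere(different))
```

Under these assumptions, a duplicated library scores 0 whatever its depth, while libraries whose genes move in opposite directions score well above 1, in line with the interpretation given in the slides.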

Sample comparison for RNA-Seq

[Figure: total read count dependence and sensitivity to contamination. Source: [Schulze et al., 2012]]

Exploratory data analysis

Multivariate exploratory data analysis
Main goal: explore the structure of the dataset to better understand the proximity between samples and detect possible problems. This is a quality control step.

Two main tools
- Principal Component Analysis (PCA) or MultiDimensional Scaling (MDS)
- Clustering

Prerequisite
To apply these methods, make the data homoscedastic: the variance must be independent of the intensity.

Exploratory data analysis

Transformations proposed:
- DESeq2: VST (Variance Stabilizing Transformation) or rlog (Regularized Log Transformation)
- edgeR: transformation of the count data as moderated log-counts-per-million

Illustration: without transformation, the variance increases with the mean.

Exploratory data analysis - VST transformation

SARTools: PRACTICE

- Build a design/target file (.txt format) containing the names of the quantification files and the associated biological information: target.txt
- Build a .zip file containing the quantification files: MyCounts.zip

Fold Change approach and ideal cut-off values

FC_i = x_i. / y_i.

[Table: four example genes, each with FC = 3.00, with p-values 0.06, 0.03, 0.10 and 0.33]

FC does not take the variance of the samples into account. This is problematic since variability in gene expression is partially gene-specific.

Normalization

Definition
Normalization is a process designed to identify and correct technical biases while removing as little biological signal as possible. This step is technology- and platform-dependent.

Within-sample normalization
Normalization enabling comparisons of fragments (genes) within the same sample. Not needed in a differential analysis context.

Between-sample normalization
Normalization enabling comparisons of fragments (genes) from different samples.

Sources of variability

Read counts are proportional to expression level, gene length and sequencing depth (same RNAs in equal proportions).

Within-sample
- Gene length
- Sequence composition (GC content)

Between-sample
- Depth (total number of sequenced and mapped reads)
- Sampling bias in library construction?
- Presence of majority fragments
- Sequence composition due to the PCR-amplification step in library preparation [Pickrell et al., 2010], [Risso et al., 2011]

Comparison of normalization methods

- Many different normalization methods exist.
- Some are part of models for DE, others are 'stand-alone'.
- They do not rely on similar hypotheses.
- But all of them claim to remove the technical bias associated with RNA-seq data.

Which one is the best?
[Dillies et al., 2013], on behalf of the StatOmique Group: evaluation of normalization methods for RNA-Seq differential analysis at the gene level

Comparison of normalization methods

Focus on methods which aim at making read counts comparable across samples.

Two main types
1. Methods that make read count distributions similar (if not equal)
2. Methods assuming that most genes are not differentially expressed

Note that:
- These methods apply to raw (integer) count data, to RNA-seq data (metagenomics), for differential expression analysis
- Other, more complex methods have been proposed recently [Risso et al., 2014]

Library size: number of reads that have been sequenced, mapped and counted for a given sample (column sums of the count table)

Total Count normalization (TC) [Dudoit et al., 2010]

Corrects for differences in the total number of reads.
Hypothesis: read count is proportional to gene expression level and sequencing depth (same RNAs with the same concentration).
But: very sensitive to the presence of high-count genes.
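As a minimal sketch of Total Count normalization, assuming the factor used in the deck's normalization summary (library size of sample j divided by the mean library size, then counts divided by that factor). The function names and the toy count table are hypothetical.

```python
def total_count_factors(counts):
    """counts[i][j]: reads for gene i in sample j. Returns one factor per sample."""
    n_samples = len(counts[0])
    lib_sizes = [sum(gene[j] for gene in counts) for j in range(n_samples)]
    mean_size = sum(lib_sizes) / n_samples
    return [size / mean_size for size in lib_sizes]

def normalize(counts, factors):
    """Divide every count by the factor of its sample."""
    return [[x / s for x, s in zip(gene, factors)] for gene in counts]

# Sample 2 was sequenced twice as deeply as sample 1, with identical proportions.
counts = [[10, 20], [30, 60], [60, 120]]
factors = total_count_factors(counts)      # library sizes are 100 and 200
normalized = normalize(counts, factors)
print(factors)
print(normalized)
```

Because the factor depends on the column sum, a single gene with a very large count shifts it for every other gene of that sample, which is the sensitivity noted on the slide.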

Two variants of Total Count normalization

- Q3 normalization: the third quartile (Q3) is equal across samples
- Median normalization: the median is equal across samples

RPKM normalization [Mortazavi et al., 2008]

Reads Per Kilobase per Million mapped reads
- Hypothesis: read counts are proportional to gene expression level, gene length and sequencing depth (same RNAs in equal proportions)
- Method: divide the gene count by the total count (in millions of reads) and by the gene length (in kilobases)
- Allows comparisons of gene expression levels within samples

Variations (not compared in [Dillies et al., 2013]): FPKM, early-explained/
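The RPKM recipe translates directly into arithmetic. The sketch below follows the slide's description; the function name `rpkm` and the example numbers are hypothetical.

```python
def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    kilobases = gene_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return count / kilobases / millions

# 500 reads on a 2 kb gene, in a library of 10 million mapped reads.
value = rpkm(500, 2_000, 10_000_000)
print(value)  # 500 / 2 / 10 = 25.0
```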

(Full) Quantile normalization (FQ)

Hypothesis: read counts have similar distributions across samples.

[Figure: raw data vs normalized data]
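One common way to make the distributions identical is to replace each value by the mean, across samples, of the values sharing its rank. The sketch below assumes that construction; the slide only states the hypothesis. The function name is hypothetical and ties are simply broken by gene order.

```python
def quantile_normalize(counts):
    """counts[i][j]: value for gene i in sample j. Ties are broken by gene order."""
    n_genes, n_samples = len(counts), len(counts[0])
    columns = [[counts[i][j] for i in range(n_genes)] for j in range(n_samples)]
    # Reference distribution: mean of the sorted columns, rank by rank.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[r] for col in sorted_cols) / n_samples for r in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        # Genes of sample j ordered from lowest to highest value.
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized

counts = [[5, 4], [2, 1], [3, 6]]
result = quantile_normalize(counts)
print(result)  # [[5.5, 3.5], [1.5, 1.5], [3.5, 5.5]]
```

After the transformation every sample contains exactly the same set of values (1.5, 3.5, 5.5 here); only the assignment of values to genes differs.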

"Effective Library Size" [Robinson and Oshlack, 2010]

Motivation
Different biological conditions may express different RNA repertoires, associated with different quantities of total RNA.

Hypothesis
Most genes are constant across biological conditions.

Trimmed Mean of M values (TMM) [Robinson and Oshlack, 2010]

- Log ratios should be distributed around 0.
- Filter out transcripts with null counts, then the 30% and 5% most extreme M_i and A_i values, respectively.
- Calculate scaling factors to normalize library sizes.

$$M_i = \log_2 \frac{x_{ik_1}/N_{k_1}}{x_{ik_2}/N_{k_2}}, \qquad A_i = 0.5\,\left[\log_2\!\left(x_{ik_1}/N_{k_1}\right) + \log_2\!\left(x_{ik_2}/N_{k_2}\right)\right]$$
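The following is a simplified, unweighted sketch of the trimming idea, not the published procedure: the method of [Robinson and Oshlack, 2010] additionally weights each gene, which is left out here. The function names, the choice to trim the stated fraction from each tail, and the toy data are all assumptions made for illustration.

```python
import math

def m_a_values(test, ref):
    """Return (M, A) pairs for genes with non-zero counts in both samples."""
    n_test, n_ref = sum(test), sum(ref)
    pairs = []
    for x, y in zip(test, ref):
        if x == 0 or y == 0:
            continue  # null counts are filtered out
        log_t, log_r = math.log2(x / n_test), math.log2(y / n_ref)
        pairs.append((log_t - log_r, 0.5 * (log_t + log_r)))
    return pairs

def trimmed_mean_factor(test, ref, trim_m=0.30, trim_a=0.05):
    """Unweighted trimmed mean of M, returned on the linear scale."""
    pairs = m_a_values(test, ref)
    n = len(pairs)

    def kept(values, trim):
        # Indices that survive trimming a fraction `trim` from each tail.
        order = sorted(range(n), key=lambda i: values[i])
        cut = int(n * trim)
        return set(order[cut:n - cut])

    keep = kept([m for m, _ in pairs], trim_m) & kept([a for _, a in pairs], trim_a)
    mean_m = sum(pairs[i][0] for i in keep) / len(keep)
    return 2 ** mean_m

ref = [10] * 10
test = [10] * 9 + [1000]          # one gene takes up most of the test library
factor = trimmed_mean_factor(test, ref)
effective_size = sum(test) * factor
print(factor, effective_size)
```

In this toy case the raw library size of the test sample is inflated by the single dominant gene; multiplying it by the factor brings the effective library size back to about that of the reference, because the trimmed mean ignores the extreme log ratio.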

DESeq normalization [Anders and Huber, 2010]

Normalization factor (for read counts) computed on genes with a non-zero read count in at least one condition:

$$\hat{s}_j = \operatorname{median}_i \frac{x_{ij}}{\left(\prod_{v=1}^{n} x_{iv}\right)^{1/n}}$$

- x_ij: number of reads in sample j assigned to gene i
- n: number of samples in the experiment
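A minimal sketch of the median-of-ratios factor follows. One assumption is made loudly: genes with a zero count in any sample are skipped, because their geometric mean is zero and the ratio is undefined. The function name `size_factors` and the toy table are hypothetical.

```python
import math
from statistics import median

def size_factors(counts):
    """counts[i][j]: reads for gene i in sample j. Returns one factor per sample."""
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) == 0:
            continue  # assumption: skip genes whose geometric mean would be zero
        # Geometric mean of the gene across samples, computed on the log scale.
        geo_mean = math.exp(sum(math.log(x) for x in gene) / n_samples)
        for j, x in enumerate(gene):
            ratios[j].append(x / geo_mean)
    return [median(r) for r in ratios]

# Sample 2 has twice the depth of sample 1; the third gene is skipped.
counts = [[10, 20], [30, 60], [0, 5], [100, 200], [7, 14]]
s = size_factors(counts)
print(s)
```

Because the median is taken over genes, a handful of very highly expressed or truly differential genes has little influence on the factor, unlike the column sum used by Total Count.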

Normalization summary

Methods that compute a normalization factor per sample

Notation
- x_ij: number of reads for gene i in sample j
- N_j: number of reads in sample j (library size of sample j)
- n: total number of samples
- ŝ_j: normalization factor for sample j
- x̂_ij: normalized read count
- f̂_j: scaling factor computed by TMM
- N'_j: library size of sample j normalized with TMM

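Normalization summary

Methods that compute a normalization factor per sample

Total count:
$$\hat{s}_j = \frac{N_j}{\frac{1}{n}\sum_{l} N_l}$$

TMM:
$$N'_j = N_j \hat{f}_j, \qquad \hat{s}_j = \frac{N'_j}{\frac{1}{n}\sum_{l} N'_l}$$

Q3:
$$\hat{s}_j = \frac{Q3_j}{\frac{1}{n}\sum_{l} Q3_l}$$

Median:
$$\hat{s}_j = \frac{med_j}{\frac{1}{n}\sum_{l} med_l}$$

DESeq:
$$\hat{s}_j = \operatorname{median}_i \frac{x_{ij}}{\left(\prod_{\nu=1}^{n} x_{i\nu}\right)^{1/n}}$$

Computing normalized reads:
$$\hat{x}_{ij} = \frac{x_{ij}}{\hat{s}_j}$$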

Which method should I use? [Dillies et al., 2013]

In most cases
All methods provide comparable results.

However...
Clear differences appear in the presence of high-count genes or when the expressed RNA repertoire varies notably across samples.

Which method should I use? [Dillies et al., 2013]

[Figure: power and false positive rate]

Conclusions

- Hypothesis: the majority of genes are invariant between two samples.
- Methods differ in the presence of majority sequences or of very different library depths.
- TMM and DESeq: effective and robust methods in a DE analysis context at the gene level.
- Normalization is necessary and not trivial.
- Detection of differential expression in RNA-seq data is inherently biased (more power to detect DE of longer genes).
- Do not normalize by gene length in a differential analysis context.

Differential analysis

Aim: detect differentially expressed genes between two conditions
- Discrete quantitative data
- Few replicates
- Overdispersion problem

Challenge: a method which takes into account overdispersion and a small number of replicates

- Proposed methods: edgeR and DESeq are the most widely used and best known [Anders et al., 2013]
- An abundant literature
- Comparison of methods: [Pachter, 2011], [Kvam and Liu, 2012], [Soneson and Delorenzi, 2013], [Rapaport et al., 2013]

Hypothesis testing

Definition
A general method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample.

Four ingredients
1. Experimental data x1, x2, ..., xn
2. Statistical model: assumptions about the independence or distributions of the observations, with parameter θ
3. Hypothesis to test: assumption about one parameter of the distribution
4. Region of rejection (or critical region): the set of values of the test statistic T for which the null hypothesis H0 is rejected. T = f(X1, X2, ..., Xn) is a function which summarizes the data without any loss of information about θ. The distribution of T under H0 is known.

Critical region and p-value

p-value p(t)
For a realization t of the test statistic T, p(t) is the probability (calculated under H0) of obtaining a test statistic at least as extreme as the one actually observed.
In the two-sided case:
$$p(t) = P_{H_0}\{|T| \ge |t|\}$$
The p-value measures the agreement between H0 and the observed result.

Link with the critical region
$$P_{H_0}\{T \in R\} = P\{p(t) \le \alpha\}$$
with α the significance level.
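As a small worked example of the two-sided p-value, assume (for this sketch only) that T follows a standard normal distribution under H0. The function names are hypothetical, and the 0.05 level is just a common default.

```python
import math

def two_sided_p_value(t):
    """P(|T| >= |t|) when T is standard normal under H0 (assumption of this sketch)."""
    return math.erfc(abs(t) / math.sqrt(2))

def reject(t, alpha=0.05):
    """Reject H0 when the p-value is at most the significance level."""
    return two_sided_p_value(t) <= alpha

for t in (0.0, 1.0, 1.96, 2.5):
    print(t, round(two_sided_p_value(t), 4), reject(t))
```

Rejecting when p(t) is at most α is the same decision as checking whether t falls in the critical region, which is the link stated on the slide.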

Differential analysis gene-by-gene, with replicates

For each gene i: is there a significant difference in expression between conditions A and B?
- Statistical model (definition and parameter estimation): generalized linear framework
- Hypothesis to test: H0i, equality of the relative abundance of gene i in conditions A and B, vs H1i, non-equality
- Critical region: Wald test or likelihood ratio test

The Poisson distribution to model counts
- Discrete probability distribution used to describe the number of occurrences of rare events during a given time interval
- Property: mean = variance

Mean-variance relationship

[Figure from D. Robinson and D. McCarthy]

Overdispersion in RNA-seq data

Counts from biological replicates tend to have a variance exceeding the mean (= overdispersion). Poisson describes only technical variation.

What causes this overdispersion?
- Correlated gene counts
- Clustering of subjects
- Within-group heterogeneity
- Within-group variation in transcription levels
- Different types of noise present

Overdispersion increases the type I error rate (the probability of incorrectly declaring a gene DE).

Types of noise in data

1. Shot noise: unavoidable noise inherent in the counting process (dominant for weakly expressed genes), well modeled by a Poisson distribution
2. Technical noise: from sample preparation and sequencing, hopefully negligible
3. Biological noise: unaccounted-for differences between samples (dominant for strongly expressed genes)

An extra parameter is needed to model the variance.

The Negative Binomial model
Let X_ijk be the count for replicate j in condition k from gene i. X_ijk follows a Negative Binomial distribution, with M_j the library size and λ_ik the relative abundance of gene i:

$$X_{ijk} \sim \mathrm{NB}\left(\mu_{ijk} = M_j \lambda_{ik},\ \sigma_{ijk}\right), \qquad \sigma_{ijk} = \mu_{ijk}\,(1 + \phi_i\,\mu_{ijk})$$
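The mean and variance of the model can be evaluated directly from the two formulas on the slide. In the sketch below the function names and the numerical values are hypothetical; `phi = 0` recovers the Poisson case where the variance equals the mean.

```python
def expected_count(library_size, relative_abundance):
    """mu_ijk = M_j * lambda_ik."""
    return library_size * relative_abundance

def nb_variance(mu, phi):
    """sigma_ijk = mu_ijk * (1 + phi_i * mu_ijk)."""
    return mu * (1 + phi * mu)

mu = expected_count(1_000_000, 1e-4)   # 100 expected reads
print(mu, nb_variance(mu, 0.0), nb_variance(mu, 0.1))
```

With an expected count of 100 and a dispersion of 0.1, the variance is about eleven times the Poisson variance, which illustrates why a Poisson test would be far too optimistic on such a gene.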

The Negative Binomial distribution

Bernoulli trial
Random experiment with exactly two possible outcomes: success (S) or failure (F). p: probability of success.

Negative Binomial distribution
Repeat Bernoulli trials with probability p of success. NB describes the distribution of the number of failures k before getting n successes.

From Poisson to NB
A Negative Binomial distribution is a mixture of Poisson distributions with a variable parameter. It is a robust alternative to Poisson in the case of over-dispersed data (the variance is higher than the mean).
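The mixture statement can be made explicit with a short derivation. The Gamma mixing distribution and its parametrization below are assumptions chosen so that the result matches the variance formula of the Negative Binomial model used in this deck; the slide itself only says "variable parameter".

```latex
% Assumption: the Poisson rate is Gamma distributed with mean \mu and variance \phi\mu^2
\begin{align*}
X \mid \Lambda &\sim \mathrm{Poisson}(\Lambda), \qquad
\Lambda \sim \mathrm{Gamma}\!\left(\text{shape} = \tfrac{1}{\phi},\ \text{scale} = \phi\mu\right) \\
\mathbb{E}[X] &= \mathbb{E}\big[\mathbb{E}[X \mid \Lambda]\big] = \mathbb{E}[\Lambda] = \mu \\
\mathrm{Var}(X) &= \mathbb{E}\big[\mathrm{Var}(X \mid \Lambda)\big] + \mathrm{Var}\big(\mathbb{E}[X \mid \Lambda]\big)
                = \mu + \phi\mu^{2} = \mu\,(1 + \phi\mu)
\end{align*}
% Link with the Bernoulli description (k failures before n successes):
% n = 1/\phi and p = 1/(1 + \phi\mu) give mean n(1-p)/p = \mu and variance n(1-p)/p^2 = \mu(1 + \phi\mu).
```

Setting φ = 0 removes the variability of the rate and recovers the Poisson property mean = variance.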

Negative Binomial models

A supplementary dispersion parameter φ to model the variance.

[Figure: Poisson vs Negative Binomial models]

Available tests

Models of count data
- Data transformation and Gaussian-based model: limma voom
- Poisson: TSPM
- Negative Binomial: edgeR, DESeq(2), NBPSeq, baySeq, ShrinkSeq, ...

Statistical approaches
- Frequentist approach: edgeR, DESeq(2), NBPSeq, TSPM, ...
- Bayesian approach: baySeq, ShrinkSeq, EBSeq, ...
- Non-parametric approach: SAMSeq, NOISeq, ...

Comparison of two conditions [Soneson and Delorenzi, 2013]

A comparison of methods for differential analysis of RNA-Seq data [Soneson and Delorenzi, 2013]
- 11 statistical tests included in the study
- R packages
- input data are raw counts (gene-level analysis)
- TMM or DESeq normalization

Main results

- With only two biological replicates, all the methods perform poorly: they either lack power or poorly control the false positive rate.
- No method outperforms the others in all circumstances: the method should be chosen according to the dataset.

How to choose?
- Number of replicates in the experiment
- Presence / absence of outliers
- Constant / variable within-group dispersion
- Balanced / unbalanced differential expression (results are more accurate and less variable between methods if DE genes are regulated in both directions)
- Simple / complex experimental design

edgeR and DESeq(2)

DESeq2 and edgeR: similarities...
- Easy-to-use and well-documented R packages
- A 3-step analysis process: normalization, dispersion estimation, statistical test
- Negative Binomial distribution of counts and Generalized Linear Models (GLM): allows analysis of simple and complex designs

...and differences
- outlier detection and processing
- low counts filtering
- dispersion estimation

In both cases, the version matters.

Estimating the dispersion: the key question

Coefficient of variation (CV)
Normalized measure of dispersion: ratio of the standard deviation to the mean.

In the negative binomial model
$$CV^2 = CV^2_{\text{technical}} + CV^2_{\text{biological}} = \frac{1}{\mu_{ijk}} + \phi_i$$

Consequence
Technical variability is the main source of variability in low counts, whereas biological variability is dominant in high counts.
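The consequence can be read off numerically from the decomposition CV² = 1/μ + φ. In this sketch the function name and the dispersion value 0.04 are hypothetical; the two terms are equal when μ = 1/φ, that is 25 reads here.

```python
def cv2_components(mu, phi):
    """Return (technical, biological) parts of the squared coefficient of variation."""
    technical = 1.0 / mu    # Poisson (shot noise) part, shrinks as counts grow
    biological = phi        # dispersion part, independent of the count level
    return technical, biological

phi = 0.04
for mu in (5, 25, 1000):
    technical, biological = cv2_components(mu, phi)
    dominant = "technical" if technical > biological else "biological"
    print(mu, technical, biological, dominant)
```

Below about 25 reads the technical term dominates, and above it the biological term does, which is the statement made on the slide.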
