Stacks: An Analysis Tool Set For Population Genomics


Molecular Ecology (2013) 22, 3124–3140
doi: 10.1111/mec.12354

Stacks: an analysis tool set for population genomics

JULIAN CATCHEN,* PAUL A. HOHENLOHE,*† SUSAN BASSHAM,* ANGEL AMORES‡ and WILLIAM A. CRESKO*

*Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA, †Biological Sciences, University of Idaho, Moscow, ID 83844-3051, USA, ‡Institute of Neuroscience, University of Oregon, Eugene, OR 97403-1254, USA

Correspondence: William A. Cresko, Fax: 541-346-2364; E-mail: wcresko@uoregon.edu

Abstract

Massively parallel short-read sequencing technologies, coupled with powerful software platforms, are enabling investigators to analyse tens of thousands of genetic markers. This wealth of data is rapidly expanding and allowing biological questions to be addressed with unprecedented scope and precision. The sizes of the data sets are now posing significant data processing and analysis challenges. Here we describe an extension of the Stacks software package to efficiently use genotype-by-sequencing data for studies of populations of organisms. Stacks now produces core population genomic summary statistics and SNP-by-SNP statistical tests. These statistics can be analysed across a reference genome using a smoothed sliding window. Stacks also now provides several output formats for several commonly used downstream analysis packages. The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics.

Keywords: GBS, genetics, next-generation sequencing, population genomics, RAD-seq

Received 19 November 2012; revision received 16 April 2013; accepted 16 April 2013

Introduction

The study of nearly complete genetic information in numerous individuals drawn from scores of populations is now rapidly becoming a reality (Storz 2005; Bonin 2008; Hohenlohe et al. 2010a, 2012a; Stapley et al. 2010). New molecular genetic techniques (Mardis 2008), enabled by massively parallel short-read sequencing technologies coupled with powerful software, have been critical to advances in this nascent field of population genomics. Investigators have employed these methods to move from painstakingly developing dozens of microsatellite markers to rapidly producing tens of thousands of single nucleotide polymorphism (SNP) markers (Davey et al. 2011; McCormack et al. 2013).

Several molecular approaches have been developed to focus the large number of short reads provided by modern sequencing platforms on specific, restriction enzyme–anchored positions in the genome (e.g. CRoPS, Van Orsouw et al. 2007; RAD-seq, Baird et al. 2008; Etter et al. 2011b; GBS, Elshire et al. 2011; double-digest RAD-seq, Peterson et al. 2012; and 2bRAD, Wang et al. 2012b). This family of reduced representation genotyping approaches, generically called genotype-by-sequencing (GBS) or restriction site–associated DNA sequencing (RAD-seq; Davey et al. 2011), subsamples the genome at homologous locations to identify and type SNPs evenly throughout the genome. Population genomics using GBS allows classic problems in ecological and evolutionary genetics, such as identification of parentage and relatedness, migration and gene flow, population structure and phylogeography, and phylogenetic reconstruction, to be addressed with unprecedented power and precision (Mitchell-Olds et al. 2008; Hohenlohe et al. 2010a; Stapley et al. 2010). More importantly, population genomic studies allow the simultaneous identification of a genome-wide average and outliers for any given statistic to help identify genomic regions contributing to local adaptation or even speciation (Lewontin & Krakauer 1973; Maynard Smith & Haigh 1974; Luikart et al. 2003; Beaumont & Balding 2004; Nielsen 2005; Storz 2005; Nielsen et al. 2007; Foll & Gaggiotti 2008; Gaggiotti et al. 2009; Hohenlohe et al. 2010b, 2012b; Strasburg et al. 2012).

© 2013 John Wiley & Sons Ltd

The wealth of genetic data provided by massively parallel short-read sequencing brings serious challenges in data processing and analysis (Shendure & Ji 2008; Glenn 2011). Studies now commonly comprise billions of raw sequences used to genotype tens of thousands to millions of SNPs. The key to making such studies feasible is software that can efficiently assemble reads together, identify alleles and genotypes, and track those genotypes in hundreds of individuals in scores of populations using a statistically rigorous framework (Lynch 2009; Gompert et al. 2010; Hohenlohe et al. 2010b). To help minimize the challenges of using GBS methods for genetic studies, we developed Stacks (http://creskolab.uoregon.edu/stacks/), a computational pipeline designed to work with any restriction enzyme–based GBS data. Stacks is computationally robust, efficient and flexible and can assemble short reads de novo or use data aligned to a reference genome. The Stacks software can handle data from thousands of individuals and incorporates a MySQL database and web front end for efficient data visualization, management and modification.

Stacks was initially designed for genetic mapping crosses (Catchen et al. 2011), and we have added significant functionality for ecological and evolutionary genomic analyses. Here, we describe and evaluate these new features of Stacks using RAD-seq data from Oregon threespine stickleback (Gasterosteus aculeatus) populations. A complete manual for Stacks is ks manual.pdf), as are additional tutorials and other resources.

Experimental space and the central concept of Stacks

Analysing GBS data requires several steps such as acquiring raw sequence data, filtering out low-quality reads, assembling or aligning reads, and finally inferring SNPs and genotypes. Each step has its own associated challenges and uncertainties. These arise from genomic attributes such as the number of loci identified, the degree of repetitive sequences throughout the genome, and the level of polymorphism and divergence among populations. These biological factors also interact with sequencing characteristics such as the quality of DNA and degree of sample multiplexing, the total number and length of reads, and the sequencing error rate. Key decisions therefore need to be made at each step about such items as the required depth of coverage or allowable nucleotide distance between reads for assembly. Finally, because of biological and sequencing sampling variation, the use of statistical models will often be necessary.

We have built the Stacks software platform to be modular and tunable to facilitate iterative exploration of the biological and sequencing parameter space for a particular study and to easily acquire and incorporate additional data. At the core of Stacks is the catalogue, a collection of all the loci and alleles identified in a population of individuals. In a mapping cross, the catalogue is simple and contains only loci found in the parents, enabling the identification of parental alleles present in the progeny. In the more general case of a set of individuals from one or more populations, the catalogue grows more complex and can often contain many more loci and segregating alleles. If a reference genome is available, those loci can be ordered, allowing them to be compared along the genome. Stacks uses a relational database and a web-based user interface. This interface allows for data visualization and user-directed modifications and corrections to the genetic hypotheses. Below we describe some of the major steps, decision points, statistical considerations and ways to specify the major parameters for Stacks.

Major steps of a Stacks analysis

The raw input data to Stacks are sequenced DNA fragments from any restriction enzyme–based GBS protocol. These protocols provide reads that will be anchored to homologous locations in the genome, which then appear as well arranged 'stacks' when visualized (see Davey et al. 2011 for details). Stacks can handle raw sequencing data in FASTA or FASTQ format to identify loci de novo and reads aligned against a reference genome in SAM (Li et al. 2009) format. Aligned reads may be gapped to allow for indels. Regardless of whether the data are assembled de novo, or aligned against a reference genome, many subsequent steps in Stacks are shared.

Stacks is a collection of several original C++ programs and Perl scripts. The components of Stacks can be run individually by hand or using one of two provided wrapper programs that will execute the entire pipeline (denovo_map.pl or ref_map.pl).

The pipeline is outlined in Fig. 1 and can be described as follows:

1. Raw sequence reads are demultiplexed and cleaned (process_radtags).
2. Data from each individual are grouped into loci, and polymorphic nucleotide sites are identified (ustacks or pstacks for unaligned or aligned data, respectively).
3. Loci are grouped together across individuals and a catalogue is written (cstacks).
4. Loci from each individual are matched against the catalogue to determine the allelic state at each locus in each individual (sstacks).

5. Allelic states are either converted into a set of mappable genotypes (for a genetic map) using genotypes or subjected to population genetic statistics via populations, with the results being written in one or several useful output files.

Fig. 1 The Stacks pipeline. Stacks proceeds in five major stages. First, reads are demultiplexed and cleaned by the process_radtags program. The next three stages comprise the main Stacks pipeline: building loci (ustacks/pstacks), creating the catalogue of loci (cstacks) and matching against the catalogue (sstacks). In the fifth stage, either the populations or genotypes program is executed, depending on the type of input data. The populations program tabulates the state of loci within and among populations, calculates population genetics statistics and exports to a number of additional, useful formats. The genotypes program is further described in Catchen et al. 2011.

As described previously in Catchen et al. (2011), a web-based front end, backed by a MySQL database, is available to visualize the data. Both denovo_map.pl and ref_map.pl will automatically populate a MySQL database during execution.

De novo stack formation

Stacks will, through the program ustacks, use a k-mer search algorithm to merge alleles into loci. First, exactly matching reads are formed into stacks using a hashing algorithm. Stacks are subsequently decomposed into k-mers (subsequences of length k) that are compared among stacks to find matching alleles (see Catchen et al. 2011 for more detail). In the previous version of Stacks, this process was controlled by two parameters. The stack depth parameter (-m) controls the number of raw reads required to form a stack, and the mismatch parameter (-M) specifies the number of allowed nucleotide mismatches between two stacks to merge them into a locus.

We here add a third parameter. The maximum stacks allowed per locus can also now be modulated (--max_locus_stacks). The expectation for nonrepetitive genomic regions is that a monomorphic locus will produce a single stack because the two sequences on the two homologous chromosomes are identical and thus indistinguishable. In contrast, a polymorphic locus will produce two stacks representing alternative alleles (Fig. 2A). More complex cases abound, however, from short, sequencing error-based stacks in addition to the true alleles, to repetitive sequences, where hundreds of loci in the genome may collapse to a single putative locus. Stacks can be used to identify and remove these confounding cases. For example, the maximum stacks per locus parameter allows the user to limit the number of stacks at any single locus (default 3). If the limit is exceeded, the locus is blacklisted, meaning it will not be available for insertion into, or matching against, the catalogue. These confusing loci can be ignored for all subsequent analyses. However, Stacks also contains a deleveraging algorithm in ustacks to help deconvolute some of these confounded loci. In previous versions of Stacks, if too many stacks were present at a single locus, the locus would be broken down using a hierarchical clustering algorithm. We have replaced this algorithm with a more sensitive heuristic that is based upon a minimum-spanning tree [see Appendix S1, 1.1, Supporting information for details of the algorithm].

Reference-guided stack formation

When a reference genome is available, Stacks relies on a set of aligned reads to assemble loci. Through the program pstacks, Stacks is able to use data from any alignment program that can produce SAM or BAM output files and has been extensively tested with Bowtie (Langmead et al. 2009), BWA (Li & Durbin 2009) and GSNAP (Wu & Nacu 2010). The pstacks program will read the CIGAR string (Li et al. 2009) from each alignment in the SAM file to determine whether the read contained an insertion, deletion or soft-masking [see Appendix S1, 1.2, Supporting information for information on CIGAR strings]. When a deletion has occurred in the read relative to the reference, pstacks will insert Ns to regain phase with the reference, and trim the end of the read to keep the length constant. Conversely, if an insertion has occurred in the read relative to the reference, pstacks will trim out the inserted bases and pad the end of the read with Ns. Both of these operations will allow bi-allelic loci

Fig. 2 The ustacks deleveraging algorithm. (A) The simples
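
The read adjustment described for pstacks in the reference-guided section can be illustrated with a small Python sketch. This is not the pstacks source code: the function name apply_cigar is hypothetical, only match (M, =, X), insertion (I) and deletion (D) operations are handled, and soft-masked reads are rejected because the text does not say how pstacks treats them.

```python
import re

# One CIGAR element is a run length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def apply_cigar(read: str, cigar: str) -> str:
    """Re-phase a read against the reference, keeping its length constant.

    Hypothetical helper for illustration; not part of Stacks.
    """
    pieces = []
    pos = 0  # current position in the original read
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned bases are copied through unchanged.
            pieces.append(read[pos:pos + n])
            pos += n
        elif op == "D":
            # Deletion in the read: insert Ns to regain phase with the reference.
            pieces.append("N" * n)
        elif op == "I":
            # Insertion in the read: drop the inserted bases.
            pos += n
        else:
            # Soft-masking and other operations are outside this sketch.
            raise ValueError(f"operation {op!r} is not handled in this sketch")
    adjusted = "".join(pieces)
    # Trim the end (after a deletion) or pad it with Ns (after an insertion).
    return adjusted[:len(read)].ljust(len(read), "N")
```

For instance, under these assumptions an 8-base read aligned as 4M2D4M gains two Ns after the fourth base and loses its last two bases, while a 10-base read aligned as 4M2I4M loses its two inserted bases and is padded with two Ns at the end.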

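
The de novo stack formation step and its three parameters can be sketched in Python as follows. This is a simplified illustration, not the ustacks implementation: it compares stacks by direct pairwise mismatch counting rather than the k-mer search the paper describes, it merges stacks by single linkage (an assumption), it assumes all reads have equal length, and the function name build_loci and the default values for m and M are hypothetical. Only the default of 3 for the maximum stacks per locus comes from the text.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def build_loci(reads, m=2, M=1, max_locus_stacks=3):
    """Group reads into stacks, merge stacks into loci, blacklist crowded loci.

    m: minimum number of identical reads needed to form a stack (-m)
    M: maximum mismatches allowed between two stacks in one locus (-M)
    max_locus_stacks: loci with more stacks than this are blacklisted
    """
    # Step 1: exactly matching reads form stacks (Counter is a hash table).
    depth = Counter(reads)
    stacks = sorted(seq for seq, count in depth.items() if count >= m)

    # Step 2: merge stacks within M mismatches using a small union-find.
    parent = {s: s for s in stacks}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path compression
            s = parent[s]
        return s

    for a, b in combinations(stacks, 2):
        if hamming(a, b) <= M:
            parent[find(a)] = find(b)

    groups = {}
    for s in stacks:
        groups.setdefault(find(s), []).append(s)

    # Step 3: loci exceeding the stack limit are set aside (blacklisted).
    loci, blacklisted = [], []
    for members in groups.values():
        if len(members) <= max_locus_stacks:
            loci.append(members)
        else:
            blacklisted.append(members)
    return loci, blacklisted
```

In this sketch a monomorphic locus appears as a one-stack group and a heterozygous locus as a two-stack group, matching the expectation stated in the text; a read seen fewer than m times never forms a stack at all.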
