Research Paper Effect Of Population Size And Mutation Rate .


Int. J. Biol. Sci. 2017, Vol. 13
Ivyspring International Publisher
International Journal of Biological Sciences
Research Paper
2017; 13(9): 1138-1151. doi: 10.7150/ijbs.19436

Effect of Population Size and Mutation Rate on the Evolution of RNA Sequences on an Adaptive Landscape Determined by RNA Folding

Ali R. Vahdati 1,2, Kathleen Sprouffske 1,2 and Andreas Wagner 1,2,3

1. Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland;
2. The Swiss Institute of Bioinformatics, Lausanne, Switzerland;
3. The Santa Fe Institute, Santa Fe, USA.

Corresponding author:
Ivyspring International Publisher. This is an open access article distributed under the terms of the Creative Commons Attribution (CC BY-NC 4.0) license.

Received: 2017.02.01; Accepted: 2017.07.05; Published: 2017.09.05

Abstract

The dynamics of populations evolving on an adaptive landscape depends on multiple factors, including the structure of the landscape, the rate of mutations, and effective population size. Existing theoretical work often makes ad hoc and simplifying assumptions about landscape structure, whereas experimental work can vary important parameters only to a limited extent. We here overcome some of these limitations by simulating the adaptive evolution of RNA molecules, whose fitness is determined by the thermodynamics of RNA secondary structure folding. We study the influence of mutation rates and population sizes on final mean population fitness, on the substitution rates of mutations, and on population diversity. We show that evolutionary dynamics cannot be understood as a function of mutation rate µ, population size N, or population mutation rate Nµ alone. For example, at a given mutation rate, clonal interference prevents the fixation of beneficial mutations as population size increases, but larger populations still arrive at a higher mean fitness.
In addition, at the highest population mutation rates we study, mean final fitness increases with population size, because small populations are driven to low fitness by the relatively higher incidence of mutations they experience. Our observations show that mutation rate and population size can interact in complex ways to influence the adaptive dynamics of a population on a biophysically motivated fitness landscape.

Key words: population size; rate of adaptation; fitness landscape; RNA secondary structure.

Introduction

Perhaps the most fundamental process in Darwinian evolution is a population's exploration of an adaptive landscape [1] by mutation and selection. As a population scales ever higher peaks in such a landscape, its mean fitness increases. (A fitness peak refers to one or more sequences with higher fitness than all their neighbors.) Many factors influence this process. Among them is the structure of the landscape itself, including its number of peaks, environmental changes that might influence this structure, the presence and incidence of recombination, the rate of DNA mutations, the kinds of genetic changes that such mutations cause, and population size [2–9]. To understand these factors and how they interact to affect adaptive evolution is not just of academic interest. It may also help predict the outcome of adaptive evolution, for example in pathogens and their arms races with human and non-human hosts [10–14].

Unfortunately, the factors influencing adaptive evolution interact in complex ways. Here we focus on two such factors, mutations and their rate, as well as the effective size of a population Ne [15, 16]. We study how these factors interact in the adaptive evolution of RNA molecules subject to mutation and selection on an unchanging fitness landscape.

Both separately and jointly, the two factors influence adaptive evolution in complex ways. Consider population size. On the one hand, adaptive evolution may be more rapid in large populations. First, larger populations produce more mutant individuals per generation, which helps explore more genotypes and find optimal genotypes faster than smaller populations. Second, natural selection is more effective in larger populations [17]. Specifically, as effective population size Ne increases, natural selection becomes more effective in fixing beneficial mutations and removing deleterious mutations. In other words, the substitution rate of beneficial mutations is an increasing function of Ne, and the substitution rate of deleterious mutations a decreasing function of Ne [18, 19]. Third, if mutation rates and population sizes are large enough, then some individuals in large populations will experience double mutations that can help them cross fitness valleys and explore genotypes that would otherwise be inaccessible [11], a phenomenon also known as stochastic tunneling [20–24].

On the other hand, there are also reasons why adaptive evolution may be more rapid in smaller populations. First, such populations experience little or no clonal interference, a phenomenon that can slow down the adaptation rate in large and polymorphic populations [11, 25]. In clonal interference, multiple beneficial mutations coexist in a population at the same time. In the absence of recombination, individuals harboring different beneficial mutations compete with each other, which can slow down the fixation of beneficial mutations and thus adaptive evolution. Second, small populations experience stronger genetic drift and the stochastic changes in allele frequencies that can help a population cross a fitness valley [7, 8].
A different perspective on the same phenomenon is provided by considering the adaptive peaks in a multi-peaked adaptive landscape. Because only differences in fitness effects that are greater than the reciprocal of the population size (1/Ne) are visible to selection [17], some fitness peaks separated by a valley will merge as population size decreases, thus reducing the number of peaks in the landscape [8, 11, 13]. This will decrease the likelihood that a population becomes trapped on a local peak, and increase its chances of finding the landscape's global fitness peak.

Further complications ensue if one considers the influence of mutations and the distribution of their fitness effects [26, 27]. These effects fall into three broad categories: deleterious, neutral, and beneficial. While the fate of neutral mutations is independent of population size [17, 19], this no longer holds for beneficial or deleterious mutations. To be sure, strongly deleterious (lethal) mutations get eliminated rapidly, and strongly beneficial mutations sweep to fixation rapidly, but the fate of weakly deleterious and weakly beneficial mutations can depend on stochastic events caused by genetic drift and thus on population size. For example, weakly deleterious mutations can persist for substantial amounts of time, or even become fixed in small populations.

As a result of these interactions between mutation rate and population size, the substitution rate of mutations is expected to show a U-shaped relationship with Ne [18]. That is, at small Ne, many slightly deleterious mutations become fixed. At large Ne, many slightly beneficial mutations become fixed, because positive selection is strong. At intermediate Ne, fewer mutations become fixed. The exact form of this relationship, however, depends strongly on the distribution of mutational fitness effects [26–28].

Existing work to elucidate the role of population size and mutation rate on adaptive dynamics falls into two categories.
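The population-size dependence of substitution rates discussed above can be made concrete with a small numerical sketch. It assumes Kimura's textbook diffusion approximation for the fixation probability of a new mutation in a haploid population, which is not the simulation model of this study; the function names and parameter values are hypothetical and chosen only for illustration.

```python
import math

def fixation_probability(s: float, n: int) -> float:
    """Diffusion approximation (assumed, textbook formula) for the fixation
    probability of a new mutation with selection coefficient s in a haploid
    population of size n, starting at frequency 1/n."""
    if abs(s) < 1e-12:
        return 1.0 / n              # neutral limit
    exponent = -2.0 * n * s
    if exponent > 700.0:            # strongly deleterious: avoid overflow
        return 0.0
    return math.expm1(-2.0 * s) / math.expm1(exponent)

def substitution_rate(s: float, n: int, mu: float) -> float:
    """Expected substitutions per generation: n * mu new mutants arise,
    and each fixes with probability fixation_probability(s, n)."""
    return n * mu * fixation_probability(s, n)

if __name__ == "__main__":
    mu = 1e-6                       # hypothetical per-generation mutation rate
    for n in (10, 100, 1000, 10000):
        beneficial = substitution_rate(+0.001, n, mu)
        deleterious = substitution_rate(-0.001, n, mu)
        print(n, beneficial / mu, deleterious / mu)
```

Under these assumptions, the rate for the weakly beneficial mutation rises with population size while the rate for the weakly deleterious mutation falls, and a neutral mutation substitutes at rate mu regardless of population size. The sum of the two selected classes is what can produce a U-shaped curve, depending on how common each class is.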
The first comprises computational and theoretical studies to understand these dynamics [5, 6, 9, 13, 29]. Because they do not use data from empirical adaptive landscapes, such studies usually make ad hoc assumptions about the structure of fitness landscapes, the fitness effects of individual mutations, non-additive (epistatic) interactions of mutations [30, 31], and so on. Violations of these assumptions may affect the evolutionary dynamics [18]. For example, the effective population size Ne and the substitution rate of beneficial mutations are expected to show a positive association if beneficial mutations are rare [18]. However, the incidence of beneficial mutations may change when the environment changes, or while a population explores a fitness landscape. Such change can affect the substitution rate of beneficial mutations, and thus also the rate of adaptive evolution.

Other studies use experimental approaches. Unlike theoretical studies, they examine fitness landscapes of realistic complexity. However, because such landscapes are very large and may involve astronomically many genotypes, we usually have very limited knowledge about the structure of these landscapes and about a population's evolutionary trajectories on them [32, 33]. Moreover, experimental studies are subject to limited replication, and can thus vary mutation rates, population sizes, and other relevant parameters only to a limited extent.

Here we overcome some of these limitations by simulating adaptive evolution on a biophysically motivated adaptive landscape that does not require ad hoc assumptions about landscape structure. It is a landscape whose structure is determined by the thermodynamics of RNA folding [34–36]. RNA

molecules fold into secondary structures by internal pairing of complementary bases (G-C, A-U). Driven by thermal motions, an RNA molecule can fold and re-fold incessantly and thus adopt a spectrum of different secondary structures that differ in their free energy. The structure in which a molecule spends most of its time is the minimum free energy (MFE) structure [35, 37]. In our simulations, we use the fraction of time a molecule spends in a given fold (the stability of this fold) as a measure of fitness. This stability may itself be subject to selection [38]. A potential example is the stability of yeast mRNA secondary structures, which increases with gene expression levels [39]. For reasons of tractability, and considering existing precedents in modeling RNA evolution [34, 36, 40, 41], we assume that selection acts only on the stability of a single structure, but note that in nature a balance between multiple secondary structures may be important [42–44].

Aside from using a biophysically motivated adaptive landscape, our simulation model also has the advantage that it does not require us to make ad hoc assumptions about fitness effects of mutations or about epistatic interactions of mutations, because these follow from the thermodynamics of folding. And with a simulation model, we can explore a wider range of mutation rates and population sizes than in experimental work. Although one might naively assume that evolutionary dynamics can be understood as a function of mutation rate µ or population mutation rate (Nµ) alone, our observations show otherwise.

Results

Short RNA sequences folding into any secondary structure are highly connected

Our evolution simulations build on two different kinds of RNA sequences. The first comprises those of the 4^10 = 1,048,576 ten-nucleotide-long sequences that fold into some secondary structure in their minimum free energy (MFE) state. Before studying the evolutionary dynamics of these molecules, we first characterized how they are organized in RNA genotype space.
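The fitness measure used in these simulations, the fraction of time a molecule spends in a given structure, corresponds to the Boltzmann probability of that structure in the thermodynamic ensemble. The following is a minimal sketch of that relationship. It assumes that the free energy of the structure and the free energy of the whole ensemble have already been computed by an RNA folding program; the function name and the numerical values are hypothetical and are not taken from the study.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal / (mol K)

def fraction_of_time_in_structure(e_structure: float,
                                  e_ensemble: float,
                                  temp_celsius: float = 37.0) -> float:
    """Boltzmann probability of one structure.

    e_structure: free energy of the structure (kcal/mol)
    e_ensemble:  ensemble free energy, G = -RT ln Z (kcal/mol)
    The probability is exp(-E/RT) / Z = exp((G - E) / RT).
    """
    if e_ensemble > e_structure + 1e-9:
        raise ValueError("ensemble free energy cannot exceed a structure's energy")
    rt = GAS_CONSTANT * (temp_celsius + 273.15)
    return math.exp((e_ensemble - e_structure) / rt)

if __name__ == "__main__":
    # Hypothetical energies for illustration only.
    print(fraction_of_time_in_structure(-2.0, -2.3))
```

The closer the structure's free energy lies to the ensemble free energy, the closer this fraction is to 1, which in the model described here means higher fitness.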
To this end, we first determined by exhaustive enumeration that there are 39,410 sequences (3.76% of sequence space) with some MFE secondary structure, and that they form nine distinct secondary structures. Each of these structures has a single stem-loop but with different nucleotides involved in the stem (Table 1). Although these sequences comprise a small fraction of the whole genotype space, they are highly accessible from one another through single mutations. This can be shown by constructing a genotype network, i.e., a graph whose nodes are sequences that form some secondary structure (regardless of the identity of that structure), and whose edges connect two sequences that differ by a single point mutation. This graph has five connected components. (A component is a set of nodes that are accessible from each other through a path of one or more edges.) However, one of these components contains the vast majority (99.24%, 39,109) of sequences (Figure 1).

One can subdivide the nodes (sequences) in this graph into subsets of sequences associated with each one of the nine MFE secondary structures. Each such subset itself forms a genotype network with multiple connected components. Specifically, depending on the structure, these networks comprise between 943 and 8,513 nodes, and have between 3 and 21 connected components each. All of them are positively assortative, with assortativity values between 0.13 and 0.82 (see Methods), meaning that highly connected sequences tend to be connected to other highly connected sequences. It takes 5 to 10 mutations to travel between the two most distant nodes while staying within the largest component of each network (see column "Diameter" in Table 1).

Table 1.
Properties of genotype networks of RNA molecules of length 10 that fold into the nine possible secondary structures.

[Table body not recoverable. Surviving fragment of the 'Min-Max time in MFE structure' column: ...0.98, 0.40-0.64, 0.38-0.98, 0.39-0.64, 0.39-0.95]

Columns from left to right: 'ID': an identifier for the secondary structure; 'Vertices': number of sequences folding into the structure; 'GC vertices': number of edges in the giant component of the genotype network formed by the sequences; 'Components': number of connected components within each network (a connected component is a set of sequences which are all accessible from each other through a series of single point mutations that preserve the structure); 'Assortativity': assortativity coefficient of the largest connected component. The assortativity coefficient indicates to what extent sequences have neighbors with degrees (numbers of neighbors) similar to themselves [82]; 'Diameter': the diameter of the largest connected component. The diameter of a network is the largest minimal distance between any pair of nodes in a connected component; 'Structure': MFE structure of the sequences in the network; 'Min-Max time in MFE structure': range of the fraction of time that sequences folding into the MFE structure spend in this structure. More time spent in a structure corresponds to higher fitness in our model.
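The genotype network construction and the component counts reported in this section can be sketched in a few lines. This is a minimal illustration that assumes the set of sequences folding into some structure is already known. No folding is performed here, and the toy sequences and function names are hypothetical rather than data or code from the study.

```python
from collections import deque

ALPHABET = "ACGU"

def single_mutant_neighbors(seq):
    """Yield every sequence that differs from seq by one point mutation."""
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]

def connected_components(sequences):
    """Connected components of the genotype network whose nodes are
    `sequences` and whose edges join sequences one mutation apart.
    Components are returned largest first."""
    nodes = set(sequences)
    seen = set()
    components = []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:                      # breadth-first search
            current = queue.popleft()
            for neighbor in single_mutant_neighbors(current):
                if neighbor in nodes and neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)

if __name__ == "__main__":
    # Toy set of length-4 sequences, for illustration only.
    toy = ["AAAA", "AAAC", "AACC", "GGGG", "GGGU", "UUUU"]
    parts = connected_components(toy)
    print([len(p) for p in parts])        # sizes of the components
    print(len(parts[0]) / len(toy))       # fraction in the largest component
```

For sequences of length 10, each sequence has 3 x 10 = 30 single-mutant neighbors, so looking up neighbors in a set, as done here, stays cheap even when the node set contains tens of thousands of sequences.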

Figure 1. The genotype network of RNA sequences of length 10. Each circle (node) corresponds to a sequence. Two nodes are connected if they differ by a single point mutation. Nodes with the same color have the same minimum free energy secondary structure (Legend). The inset enlarges a part of the largest component. Nodes are clustered based on their number of shared connections (based on ForceAtlas2 embedding in Gephi [73]). For clarity of representation, our display allows for overlapping nodes, such that the actual number of nodes may be more than the number of nodes that are visible. The graph in the figure illustrates the intertwined organization of different genotype networks and genotype sets. Because of its large number of nodes (39,410) and edges (311,000), not all nodes and edges are visible, and accurate accounting of component numbers is thus not possible.

Our simulations of evolving populations use the fraction of time that sequences spend in their MFE structure as a measure of fitness. This fraction varies, depending on structure, between 0.27 and 0.97 among the nine structures. Here, a value of 0.27 (0.97) means that a sequence spends 27 (97) percent of the time in its MFE structure, and the remaining 73 (3) percent in some other structures with higher free energy. (The MFE structure can be viewed as the structure in which a sequence spends more time than in any other structure, even though it may not spend the majority of its time in this structure.) Within the genotype network of each structure, it varies between values ranging from 0.27 to 0.96 for structure .((.)). to values ranging from 0.51 to 0.71 for structure .((.)).

How an evolving population explores a fitness landscape depends in part on the fraction of its sequences' neighbors that are neutral.
If a population has a larger neutral neighborhood, it may be able to access larger regions of the landscape through non-deleterious mutations, and may have a higher chance of finding beneficial mutations and new phenotypes. We computed the size of neutral neighborhoods, because it may be important for our evolutionary analysis. This size is a function of effective population size Ne [45], which in our case is identical to the census population size N, because the populations we simulate are unstructured, do not experience migration, and do not fluctuate in size. Following standard population genetic theory [46, 47], we consider two neighboring sequences neutral if their fitness differs by less than 1/N. Figure S1a shows neutral neighborhood size as an average over 1,000 randomly sampled RNA molecules of length 10 that fold into one of the nine structures we consider (Table 1). Unsurprisingly, neutral neighborhood size decreases with increasing population size, where neutral evolution and crossing of fitness valleys become more difficult.

To ensure that any observations we obtain from our simulations are not artefacts of using very short and non-biological sequences, we also simulated the evolution of four longer biological RNA molecules (30-43 nt) that originate from different organisms, have different functions, and fold into different predicted secondary structures (Table 2). Specifically, these sequences include a ribozyme, a noncoding transcript, a small non-messenger RNA (snmRNA), and a small nucleolar RNA (snoRNA). (We note that even though the secondary structures of these sequences occur in nature, most of the sequences that we analyze and that fold into these structures may not occur in nature.) While the large number of sequences

folding into such longer structures [34] precludes an exhaustive analysis of their genotype networks, we find that the neutral neighborhoods of these genotype networks also decrease in size with increasing population size (Figure S1b).

We quantified the ruggedness of the fitness landscapes of our RNA molecules in two ways. First, we counted the number of fitness peaks in each landscape of sequences of length 10, where we define a fitness peak as one or more sequences whose neighbors all have lower fitness. With the exception of structure 2 (Str2) and structure 3 (Str3), which have 10 and 23 peaks, respectively, all structures have fewer than 10 peaks (Figure S2). This analysis was not possible for the biological sequences, where too many sequences fold into any one structure. Second, we estimated the incidence of reciprocal sign epistasis, which causes fitness valleys to exist between a sequence and its two-mutant neighbor. In epistasis, the fitness effect of an allele depends on other alleles. Sign epistasis occurs when the sign of the fitness effect of an allele changes (e.g. from beneficial to deleterious) due to epistatic interactions. When a sequence and its two-mutant neighbor both show higher fitness than the two single-mutants connecting them in sequence space, one speaks of reciprocal sign epistasis [48]. We find that fewer than 10 percent of such sequence quadruplets show reciprocal sign epistasis. This holds regardless of whether we consider sequences of length 10 or longer sequences (Figure S3). Overall, these analyses show that the landscapes we examine are not highly rugged.

We simulated the adaptive evolution of sequences forming each one of the nine seco
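The two ruggedness measures defined above, fitness peaks and reciprocal sign epistasis, can be illustrated with a short sketch. It is a simplified illustration and not the analysis code of the study: it only detects peaks that consist of a single sequence (the definition above also allows peaks formed by several sequences), and the toy fitness values and function names are hypothetical.

```python
ALPHABET = "ACGU"

def single_mutant_neighbors(seq):
    """Yield every sequence that differs from seq by one point mutation."""
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]

def count_single_sequence_peaks(fitness):
    """Count sequences whose neighbors in the landscape all have lower
    fitness. `fitness` maps each sequence in the landscape to its fitness;
    neighbors absent from the mapping are ignored."""
    peaks = 0
    for seq, value in fitness.items():
        neighbors = [n for n in single_mutant_neighbors(seq) if n in fitness]
        if all(fitness[n] < value for n in neighbors):
            peaks += 1
    return peaks

def is_reciprocal_sign_epistasis(f_start, f_single_a, f_single_b, f_double):
    """True if the starting sequence and its two-mutant neighbor both have
    higher fitness than both single mutants that connect them."""
    return max(f_single_a, f_single_b) < min(f_start, f_double)

if __name__ == "__main__":
    # Toy landscape of four length-2 sequences, for illustration only.
    toy = {"AA": 0.5, "AC": 0.3, "CA": 0.2, "CC": 0.6}
    print(count_single_sequence_peaks(toy))
    print(is_reciprocal_sign_epistasis(toy["AA"], toy["AC"], toy["CA"], toy["CC"]))
```

In this toy landscape, AA and CC are separated by a fitness valley, so both count as peaks and the quadruplet shows reciprocal sign epistasis.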
