

Lee et al. BMC Genetics (2016) 17:142
DOI 10.1186/s12863-016-0452-8

RESEARCH ARTICLE   Open Access

Genetic diversity and population structure analysis to construct a core collection from a large Capsicum germplasm

Hea-Young Lee¹, Na-Young Ro², Hee-Jin Jeong¹, Jin-Kyung Kwon¹, Jinkwan Jo¹, Yeaseong Ha¹, Ayoung Jung¹, Ji-Woong Han¹, Jelli Venkatesh¹ and Byoung-Cheorl Kang¹*

Abstract

Background: Conservation of genetic diversity is an essential prerequisite for developing new cultivars with desirable agronomic traits. Although a large number of germplasm collections have been established worldwide, many of them face major difficulties due to large size and a lack of adequate information about population structure and genetic diversity. Core collection with a minimum number of accessions and maximum genetic diversity of pepper species and its wild relatives will facilitate easy access to genetic material as well as the use of hidden genetic diversity in Capsicum.

Results: To explore genetic diversity and population structure, we investigated patterns of molecular diversity using a transcriptome-based 48 single nucleotide polymorphisms (SNPs) in a large germplasm collection comprising 3,821 accessions. Among the 11 species examined, Capsicum annuum showed the highest genetic diversity (HE = 0.44, I = 0.69), whereas the wild species C. galapagoense showed the lowest genetic diversity (HE = 0.06, I = 0.07). The Capsicum germplasm collection was divided into 10 clusters (cluster 1 to 10) based on population structure analysis, and five groups (group A to E) based on phylogenetic analysis. Capsicum accessions from the five distinct groups in an unrooted phylogenetic tree showed taxonomic distinctness and reflected their geographic origins. Most of the accessions from European countries are distributed in the A and B groups, whereas the accessions from Asian countries are mainly distributed in C and D groups. Five different sampling strategies with diverse genetic clustering methods were used to select the optimal method for constructing the core collection. Using a number of allelic variations based on 48 SNP markers and 32 different phenotypic/morphological traits, a core collection ‘CC240’ with a total of 240 accessions (5.2 %) was selected from within the entire Capsicum germplasm. Compared to the other core collections, CC240 displayed higher genetic diversity (I = 0.95) and genetic evenness (J’ = 0.80), and represented a wider range of phenotypic variation (MD = 9.45 %, CR = 98.40 %).

Conclusions: A total of 240 accessions were selected from 3,821 Capsicum accessions based on transcriptome-based 48 SNP markers with genome-wide distribution and 32 traits using a systematic approach. This core collection will be a primary resource for pepper breeders and researchers for further genetic association and functional analyses.

Keywords: Capsicum spp., Core collection, Genetic diversity, Germplasm, Population structure

* Correspondence: bk54@snu.ac.kr
¹Department of Plant Science and Vegetable Breeding Research Center, Seoul National University, Seoul 151-921, Korea
Full list of author information is available at the end of the article

© The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

Background

Pepper (Capsicum spp.) is one of the major vegetable and spice crops grown worldwide, and is rich in bioactive compounds, such as capsaicinoids and carotenoids, which contribute to the improvement of human health [1, 2]. Because of its economic and nutritional importance, breeders have improved agronomic traits of pepper, such as pungency, fruit shape, abiotic stress tolerance, and disease resistance. Meanwhile, genetic diversity of breeding lines has become smaller and some useful genes in the landraces are lost due to the breeding activities [3, 4]. Therefore, conservation and sustainable utilization of genetic resources are keys to continuous improvement of peppers [5].

During the last several decades, there has been remarkable progress in germplasm collection and conservation of various plants. Although a large number of germplasms have been collected, their management has become more and more complicated due to their huge sizes. Furthermore, little is known about the genetic diversity and structure of such collections at the interspecific and intraspecific levels [6]. To make efficient use of large germplasm collections, the concept of core collections has been proposed. A core collection is a subset of a germplasm collection of a species that represents the genetic diversity of the entire collection [7]. A good core collection is one that has no redundant accessions, is small enough to be easily managed, and represents the total genetic diversity [8].

Various types of data including passport data, geographic origin [9, 10], agronomic traits [11–13], and molecular markers [14] can be used for selecting a core set. Although the major reason for establishing a core set is to reduce the number of representative accessions up to 10 % while maintaining the diversity of the entire collection, there are a number of possible methods for selection of a core set depending on the research goals.
In the early 2000s, most researchers performed random sampling using various assignment methods [9, 11]. Later, the M (maximization) strategy was proposed as a more effective method to select a core set representing the maximum genetic diversity without redundancy [12, 15].

Several research institutions have collected and conserved thousands of Capsicum accessions, ranging from 1,000 in the Centre for Genetic Resources (CGN), the Netherlands [16] to almost 8,000 in the Asian Vegetable Research and Development Center (AVRDC), Taiwan [17]. Researchers and institutions have attempted to construct core collections of Capsicum spp. for various purposes. Fan et al. [13], Nicolai et al. [14], and Zewdie et al. [12] established core collections to reveal phenotypic and genetic variation. Thies and Fery [9] and Quenouille et al. [10] constructed core collections for disease resistance against northern root-knot nematode and Potato virus Y (PVY), respectively. Hanson et al. [11] developed a core collection to analyze antioxidant activities. However, most studies involved a relatively small number of accessions, using fewer than 1,000 accessions with limited numbers of morphological traits and molecular markers [11, 12, 14]. Such a limited number of morphological traits and markers allows only a small portion of the genetic diversity of the entire germplasm to be surveyed, and the resulting data cannot be used for genome-wide variation studies.

In this study, we performed population structure analysis in a large Capsicum germplasm collection consisting of 3,821 accessions by applying 48 genome-wide SNPs, and selected a core set using the SNP data together with data for 32 morphological traits.
This allowed us to 1) examine the level of genetic diversity and the population structure within the worldwide Capsicum germplasm collection; 2) optimize selection methods by comparing different core sets, which were selected using a stepwise selection strategy based on various combinations of data and clustering methods; and 3) ultimately construct a Capsicum core collection that represents the entire germplasm collection without redundancy. Finally, we validated the core collection by evaluating the diversity of a range of traits and genotyping additional molecular markers. This core collection will be a valuable data set for both pepper breeding and genome-wide association studies.

Methods

Plant materials

A total of 4,652 Capsicum accessions used in this study originated from 97 countries and included 11 species: C. annuum, C. baccatum, C. cardenasii, C. chacoense, C. chinense, C. eximium, C. frutescens, C. galapagoense, C. praetermissum, C. pubescens, and C. tovarii. The geographic origin and passport data of the germplasm accessions were obtained from the Rural Development Administration (RDA, Jeonju, Korea) and Seoul National University (SNU, Seoul, Korea). Among the germplasm accessions, 3,599 were obtained from the RDA, and 1,053 were obtained from SNU. Most of the accessions were C. annuum, accounting for 4,163 accessions. Four other domesticated species, C. baccatum, C. chinense, C. frutescens, and C. pubescens, accounted for 163, 122, 152, and 11 accessions, respectively. Among the wild Capsicum species, C. cardenasii, C. chacoense, C. eximium, C. galapagoense, C. praetermissum and C. tovarii accounted for 1, 28, 4, 2, 5, and 1 accessions, respectively.

DNA extraction and SNP genotyping

Two young leaves from each accession were used for DNA extraction. DNA was extracted using the cetyl trimethylammonium bromide (CTAB) method as described previously [18]. The concentration and purity of DNA samples were determined with a NanoDrop 1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). DNA samples showing absorbance ratios above 1.8 at 260/280 nm were used for marker analysis.

A set of 48 SNP markers evenly distributed across the 12 pepper chromosomes was used in this study [19] (Additional file 1: Table S1). In a preliminary study, a total of 282 accessions were randomly selected from the entire germplasm collection for a genetic diversity study with 412 SNP markers developed by Kang et al. [19]. Based on this analysis, highly polymorphic SNP markers (PIC 0.45) were selected. Genotyping was performed using the BioMark HD system (Fluidigm, San Francisco, CA, USA), EP1 system (Fluidigm, San Francisco, CA, USA), and 48 × 48 Dynamic Array IFCs (Fluidigm, San Francisco, CA, USA) according to the manufacturer’s protocol [20]. Specific target amplification (STA) was performed prior to SNP genotyping analysis. PCR was performed in a 5-μL reaction containing 60 ng of the DNA sample according to the manufacturer’s protocol. Thermal cycling conditions were 15 min at 95 °C, followed by 14 cycles of a 2-step amplification profile of 15 s at 95 °C and 2 min at 60 °C. For genotyping, SNPtype assays were performed using STA products following the manufacturer’s protocol. Thermal cycling was carried out at 95 °C for 15 s, 64 °C for 45 s and 72 °C for 15 s with a touchdown of 1 °C per cycle from 64 to 61 °C, followed by 34 cycles of 95 °C for 15 s, 60 °C for 45 s and 72 °C for 15 s. For the species verification and/or identification of pepper accessions with missing species information, SNP markers C2_At5g04590, C2_At1g50020, and C2_At2g19560 were used based on high resolution melting (HRM) analysis [21].
Genotyping analysis was performed using a Rotor Gene 6000 (Qiagen, Valencia, CA, USA).

Population structure analysis

To analyze the population structure of the entire germplasm collection used in this study, we used a model-based genetic clustering algorithm [22] as implemented in the STRUCTURE program ver. 2.3.4 [23]. The number of sub-populations (ΔK) was determined using the ad-hoc statistical method, based on the rate of change in the log probability of data between successive K values [24]. Fifty independent runs for K values ranging from 1 to 20 were performed with a burn-in length of 50,000 followed by 1,000,000 iterations.

Phylogenetic and principal coordinate analyses

Phylogenetic trees were produced from the genotyping data for the 48 SNP markers using both the unweighted neighbor-joining method and the hierarchical clustering method based on the dissimilarity matrix calculated with the Manhattan index, as implemented in the DARwin software (version 6.0.9). Principal coordinate analyses were also performed with DARwin 6.0.9 [25].

Statistical analysis of genetic diversity

Different indices were used for analysis and comparison of diversity among the Capsicum collections. These include levels of observed heterozygosity (HO), expected heterozygosity (HE), polymorphic information content (PIC), genetic differentiation (FST), Shannon’s information index of diversity (I), and genetic evenness (J’). Indices HO, HE, PIC, and FST were calculated using PowerMarker 3.25 [26]. For analysis of genetic diversity of core collections, I and J’ were calculated following Hennink and Zeven [27] and Pielou [28], respectively. Analysis of molecular variance (AMOVA) was conducted to detect the genetic variance within and among populations using GenAlEx ver 6.502 [29].

Establishment of the core collection

To establish a core collection, five different methods were used.
Specifically, core sets were selected based on 1) genotype analysis of the entire collection, 2) genotype analysis of each cluster after grouping based on genotype dissimilarity, 3) phenotype analysis of the entire collection, 4) a combination of genotype and phenotype analysis of the entire collection, and 5) a combination of phenotype and genotype analysis of each cluster after grouping based on genotype dissimilarity.

Representative accessions were selected based on the advanced M strategy using a modified heuristic algorithm implemented in PowerCore software [30]. Categorical variables, such as genotype and qualitative phenotype, were applied in several classes (3 to 12 classes) based on distinct characters. Continuous variables (quantitative phenotypes, 7 to 12 classes) were automatically classified into different categories in the software based on Sturges’ rule [31]. Therefore, a total of 264 phenotypic alleles were used to select the core entries (Additional file 1: Table S2).

Evaluation of the core collections

To evaluate each core collection, diverse statistical indicators were calculated for two types of variables, continuous and categorical. For continuous variables, the percentage of significant difference between core collections and the entire germplasm collection was calculated based on the mean difference (MD) percentage, the coincidence rate (CR) of range, the variance difference (VD) percentage, and the variable rate (VR) of the coefficient of variation. Among the candidate core sets selected from each different data set, a core set with MD less than 20 % and CR more than 80 % was considered as a representative collection. In addition, a lower value in VD and a higher value in VR were considered to indicate a more effective core collection [32]. For categorical variables, the I and J’ values were calculated and compared between the five core collections and the entire germplasm collection. The maximum value of I (Imax) is calculated based on the log of the number of classes used in the entire collection; the value for a core collection should be comparable to that of the entire collection [8].

Three additional markers having multiple alleles, COS643, COS111, and L4RP-3F, which were selected from the Sol Genomics Network [33] and Yang et al. [34], were used for validation of the core set. Melting curve patterns were identified by HRM analysis using a Rotor Gene 6000 (Qiagen, Valencia, CA, USA). Thermal cycling conditions were 10 min at 95 °C, 50 cycles of a 3-step amplification profile of 20 s at 94 °C, 20 s at 55 °C, and 40 s at 72 °C, followed by a final extension of 60 s at 95 °C and 60 s at 40 °C. HRM analysis was performed by increasing the temperature by 0.1 °C every two seconds from 70 to 90 °C.

Finally, the core collection (CC240) with the highest genetic diversity and evenness was planted in 2014 in a research farm (Suwon, Korea) to monitor the variation of the diverse traits. Morphological data were obtained for the same accessions that were genotyped. Thirty-two different traits related to plant habit (9), leaf (4), flower (6), fruit (10), and seed (3) were analyzed. Phenotype data were presented as the mean ± SE. The differences between the mean values of individual clusters were assessed using one-way ANOVA and Duncan’s multiple range tests. P < 0.05 was considered to indicate a statistically significant difference.
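
Some of the evaluation indicators described above can be illustrated with a short Python sketch. It is not the authors' PowerCore or SPSS workflow, and the source does not spell out the formulas, so the definitions used here are assumptions: Shannon's index is taken as I = -sum(p_i ln p_i) with natural logarithms, evenness as J' = I / Imax with Imax = ln(number of classes in the entire collection), and the coincidence rate as the mean ratio of the core range to the entire-collection range across traits, expressed as a percentage. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def shannon_index(classes):
    """Shannon's index I = -sum(p_i * ln p_i) over the observed classes (assumed natural log)."""
    counts = Counter(classes)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def evenness(classes, n_classes_entire):
    """Evenness J' = I / Imax, where Imax = ln(number of classes in the entire collection)."""
    if n_classes_entire < 2:
        return 0.0  # evenness is undefined for a single class; 0.0 is a convention chosen here
    return shannon_index(classes) / math.log(n_classes_entire)


def coincidence_rate(core_traits, entire_traits):
    """CR (%) = mean over traits of range(core) / range(entire) * 100 (assumed formulation)."""
    ratios = []
    for core, entire in zip(core_traits, entire_traits):
        entire_range = max(entire) - min(entire)
        if entire_range == 0:
            continue  # a trait with no variation carries no range information
        ratios.append((max(core) - min(core)) / entire_range)
    if not ratios:
        raise ValueError("no trait with a non-zero range in the entire collection")
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Toy data: one categorical trait with four classes, two continuous traits.
    entire_cat = ["A"] * 40 + ["B"] * 30 + ["C"] * 20 + ["D"] * 10
    core_cat = ["A", "A", "B", "B", "C", "C", "D", "D"]
    print("I (entire):", round(shannon_index(entire_cat), 3))
    print("J' (core):", round(evenness(core_cat, 4), 3))

    entire_cont = [[1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0]]
    core_cont = [[1.0, 5.0], [10.0, 20.0]]
    print("CR (%):", coincidence_rate(core_cont, entire_cont))
```

Under these assumptions, a core set that retains every class in equal proportions reaches J' = 1, and a core set that spans the full range of every trait reaches CR = 100 %, which is consistent with the thresholds quoted above being read as "higher CR is better".
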
The IBM SPSS Statistics v23 software (IBM Corp., Armonk, NY, USA) was used for analysis.

Results

Genetic diversity of the Capsicum germplasm

In our preliminary studies, a total of 4,652 non-redundant accessions from 11 species were screened using SNP markers to reveal the genetic diversity (Additional file 1: Table S3). Based on the HO values, 673 accessions, mostly from C. annuum, with HO values greater than 0.3 were considered to be F1 hybrids (Additional file 2) and excluded from analysis. In addition, 158 accessions with more than seven missing genotype data points were also excluded. Ultimately, a total of 3,821 accessions were used for further experiments (Table 1).

Using the SNP genotyping results, the HE, HO, and I were calculated for 3,821 pepper accessions (Table 1). The HE values ranged from 0.10 to 0.44, and I values ranged from 0.07 to a maximum of 0.69. The highest diversity values in C. annuum accessions (HE = 0.44, I = 0.69) suggest that there is extensive genetic variation within this species. With the exceptions of C. baccatum and C. pubescens, the other domesticated species showed relatively high HE values, above 0.37. The HO value of C. annuum was 0.12, whereas those of the other species varied from 0.09 to 0.21. Four domesticated species, C. annuum, C. baccatum, C. chinense, and C. frutescens, and two wild species, C. chacoense and C. eximium, had lower values for HO compared to HE (Table 1), whereas C. cardenasii, C. galapagoense, C. praetermissum, C. pubescens, and C. tovarii had relatively higher values of HO compared to HE. This pattern suggests that the first six species have experienced inbreeding for a long time, which could be attributed to the interplay of many factors such as artificial selection, non-random mating between individuals, population structure and size, and the Wahlund effect (mixing of individuals from different genetic sources) [35, 36]. By contrast, accessions of the latter five species were collected in different isolated locations where each accession had evolved independently.

Table 1 Genetic diversity analysis of the 3,821 pepper accessions

Species            Number   HO     HE     I
C. annuum          3,383    0.12   0.44   0.69
C. baccatum        150      0.12   0.26   0.51
C. cardenasii      1        0.21   0.10   0.14
C. chacoense       24       0.17   0.28   0.54
C. chinense        105      0.11   0.38   0.56
C. eximium         3        0.14   0.23   0.45
C. frutescens      137      0.09   0.37   0.55
C. galapagoense    1        0.13   0.06   0.07
C. praetermissum   5        0.21   0.18   0.31
C. pubescens       11       0.16   0.12   0.29
C. tovarii         1        0.15   0.07   0.12
Total              3,821    0.15   0.23   0.38

HO: observed heterozygosity; HE: expected heterozygosity; I: Shannon’s information index of diversity

Population structure of the germplasm collection

The SNP genotyping results were used to perform population structure analysis for the 3,821 accessions under an admixed model using the STRUCTURE program [23]. Estimated likelihood (LnP(D)) was found to be greatest when K = 10, suggesting that the population used in this study can be divided into ten clusters (Fig. 1). Clusters 3, 8, 9, and 10 were rather well separated from the others, whereas clusters 1, 2, 4, 5, 6, and 7 were admixtures. Each of the 10 clusters included different numbers of accessions, ranging from 85 to 806 (Table 2). The average distance (HE) between individuals in each cluster was 0.32. The highest HE value of 0.43 was observed in cluster 5, indicating greater genetic diversity within this cluster, whereas cluster 9 showed the lowest HE value of 0.11. Genetic differentiation (FST) values varied from 0.08 to 0.78 with an average of 0.33. The smallest FST value (0.08) wa

Fig. 1 Population structure of the Capsicum germplasm collection. a ΔK reached its maximum value when K = 10 following the ad-hoc method. b Ten subpopulation clusters inferred by STRUCTURE are represented by different colors
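
The ad-hoc ΔK criterion used to choose K can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it takes the LnP(D) values from replicate STRUCTURE runs as a plain dictionary (a hypothetical input format), and it uses the formulation in which the second-order rate of change is computed from the run means, ΔK = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)). The function name `delta_k` and the toy numbers are invented for the example.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each K to a list of LnP(D) values, one per independent run
    (at least two runs per K are needed for the standard deviation).
    """
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # delta K is undefined at the smallest and largest K tested
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # identical runs give no usable denominator
        second_order = abs(
            mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        result[k] = second_order / spread
    return result


if __name__ == "__main__":
    # Toy LnP(D) values for K = 1..4 with three runs each (not data from the study).
    toy = {
        1: [-1000, -1002, -998],
        2: [-800, -801, -799],
        3: [-790, -791, -789],
        4: [-785, -786, -784],
    }
    dk = delta_k(toy)
    best_k = max(dk, key=dk.get)
    print(dk, "best K:", best_k)
```

With runs for K from 1 to 20, as in the Methods, ΔK would be defined only for K from 2 to 19, because each value needs a tested neighbour on both sides.
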

