
Annu. Rev. Genom. Human Genet. 2007.8:241-259. Downloaded from arjournals.annualreviews.org by Stanford University Robert Crown Law Lib. on 11/17/07. For personal use only.

ANRV321-GG08-11 ARI 25 July 2007 18:8

Repetitive Sequences in Complex Genomes: Structure and Evolution

Jerzy Jurka, Vladimir V. Kapitonov, Oleksiy Kohany, and Michael V. Jurka

Genetic Information Research Institute, Mountain View, California 94043; email: jurka@girinst.org, vladimir@girinst.org, kohany@girinst.org, michael@girinst.org

Annu. Rev. Genomics Hum. Genet. 2007. 8:241–59
First published online as a Review in Advance on May 21, 2007.
The Annual Review of Genomics and Human Genetics is online at genom.annualreviews.org
This article's doi: 10.1146/annurev.genom.8.080706.092416
Copyright © 2007 by Annual Reviews. All rights reserved
1527-8204/07/0922-0241 $20.00

Key Words
transposable elements, repetitive DNA, regulation, speciation

Abstract
Eukaryotic genomes contain vast amounts of repetitive DNA derived from transposable elements (TEs). Large-scale sequencing of these genomes has produced an unprecedented wealth of information about the origin, diversity, and genomic impact of what was once thought to be “junk DNA.” This has also led to the identification of two new classes of DNA transposons, Helitrons and Polintons, as well as several new superfamilies and thousands of new families. TEs are evolutionary precursors of many genes, including RAG1, which plays a role in the vertebrate immune system. They are also the driving force in the evolution of epigenetic regulation and have a long-term impact on genomic stability and evolution. Remnants of TEs appear to be overrepresented in transcription regulatory modules and other regions conserved among distantly related species, which may have implications for our understanding of their impact on speciation.

INTRODUCTION

The term “repetitive sequences” (repeats, DNA repeats, repetitive DNA) refers to homologous DNA fragments that are present in multiple copies in the genome. Repetitive DNA was originally discovered based on reassociation kinetics and classified into “highly” and “middle” repetitive sequences (14), roughly corresponding to the tandem and interspersed repeats discussed below. This review is centered primarily on repeat research based on DNA sequence analysis and does not cover the so-called low copy repeats (LCRs), also known as segmental duplications, which represent a separate category of duplicated diverse chromosomal segments (105).

Repeats can be clustered into distinct families, each traceable to a single ancestral sequence or a closely related group of ancestral sequences. In contrast to multigene families, which are defined based on their biological role, repetitive families are usually defined based on their active ancestors, called master or source genes, and on their generation mechanisms. Over time, individual elements from repetitive families may acquire diverse biological roles.

There are two basic types of repetitive sequences: interspersed repeats and tandem repeats. Interspersed repeats are DNA fragments with an upper size limit of 20–30 kb, inserted more or less at random into host DNA. In contrast, tandem repeats represent arrays of DNA fragments immediately adjacent to each other in head-to-tail orientation. This review focuses on interspersed repetitive DNA from eukaryotic genomes. Interspersed repeats are mostly inactive and often incomplete copies of transposable elements (TEs) inserted into genomic DNA. TEs are segments of DNA or RNA capable of being reproduced and inserted in the host genome.
At the same time, genomes are essentially conservative structures that have evolved mechanisms to counteract such insertions. Therefore, TEs and host genomes are locked in a permanent antagonistic relationship resembling an “arms race.” Eukaryotic hosts continuously suppress activities of TEs, but TE proliferation persists in virtually all known eukaryotic species. Of all eukaryotic genomes sequenced to date, only the genome of Plasmodium falciparum appears not to host any active TEs (35).

Why do complex, conservative genomes tolerate the activities of inherently antagonistic elements? TEs cannot be easily eliminated, and their endurance in the host can be compared to that of parasites. Furthermore, if TEs can provide evolutionary advantages to the host, their chances of survival increase. The view that TEs are beneficial to the host is not new (16, 44, 68, 87), but recent progress in the field puts it squarely at the center of the ongoing debate on eukaryotic evolution.

STRUCTURE AND SYSTEMATICS OF TRANSPOSABLE ELEMENTS

General Characteristics

Figure 1 presents a schematic structure of TEs. All types of TEs are represented by autonomous and nonautonomous variants. Whereas an autonomous element encodes a complete set of enzymes characteristic of its family and is self-sufficient in terms of transposition, a nonautonomous element transposes by borrowing the protein machinery encoded by its autonomous relatives. Despite their dazzling diversity, all eukaryotic TEs fall into two basic types: retrotransposons and DNA transposons. Retrotransposons are transposed through an RNA intermediate. Their messenger RNA (mRNA) is expressed in the host cell and reverse transcribed, and the resulting complementary DNA (cDNA) copy is integrated back into the host genome. Reverse transcription and integration are catalyzed by reverse transcriptase (RT) and endonuclease/integrase (EN/INT), which are encoded by autonomous elements. Unlike retrotransposons, DNA transposons are transposed by moving their genomic DNA copies from one chromosomal location to another without any RNA intermediate. Most retrotransposons and DNA transposons are flanked by target site duplications (TSDs) resulting from fill-in repair of staggered nicks generated at the DNA target site upon insertion of TEs (42).

All currently known eukaryotic retrotransposons can be divided into four classes: non-long terminal repeat (LTR) retrotransposons, LTR retrotransposons, Penelope, and DIRS retrotransposons. Although the first two classes (Figure 1a,b) are relatively well established and studied (29), the Penelope and DIRS classes were only recently introduced (2, 30, 81, 98). Members of all four classes of retrotransposons are present in the genomes of virtually all eukaryotic kingdoms: Protista, Plantae, Fungi, and Animalia. The only exception is Penelope, which, so far, has not been identified in plants.

Eukaryotic DNA transposons belong to three classes: “cut-and-paste” transposons, Helitrons, and Polintons (Figure 1c,d,e). The corresponding mechanisms of transposition are cut-and-paste (23), rolling-circle replicative (60), and self-synthesizing (65), respectively. The cut-and-paste transposons and Helitrons cannot synthesize their own DNA; instead, they multiply using host replication machinery.

Figure 1
A schematic representation of major classes of transposable elements, including nonautonomous elements. Panels: (a) non-LTR retro(trans)posons, LINEs and SINEs; (b) LTR retrotransposons and retrovirus-like elements; (c) cut-and-paste transposons (DNA transposons); (d) rolling-circle transposons (Helitrons); (e) self-synthesizing transposons (Polintons).

Non-Long Terminal Repeat and Long Terminal Repeat Retrotransposons

A typical autonomous non-LTR retrotransposon, commonly referred to as a long interspersed element (LINE), contains one or two open reading frames (ORFs). It includes an internal promoter in the 5′ terminal region that governs transcription of the retrotransposon inserted in the host genome.
The mechanism of LINE retrotransposition and integration into the genome is well studied and is viewed as a coupled process called target-primed reverse transcription (TPRT). According to the TPRT model, reverse transcription is primed by the free 3′-hydroxyl group at the target DNA nick introduced by EN (29). The model was recently enhanced by the finding that initiation of L1 reverse transcription does not require base pairing between the primer and template (72). Moreover, as expected from the model, EN is not necessary for L1 retrotransposition when free 3′-hydroxyl groups become available in dysfunctional telomeres (91). Both RT and EN domains in L1 are encoded by the same ORF. An mRNA expressed during transcription of a genomic copy of a LINE retrotransposon serves as a template for reverse transcription, and the resulting cDNA is inserted in the genome.

Based on structural features of non-LTR retrotransposons and phylogeny of RTs, LINEs can be assigned to five groups, called R2, L1, RTE, I, and Jockey, which can be subdivided into 15 clades (29, 70). It is believed that the R2 group is composed of the most ancient non-LTR retrotransposons, the CRE, NeSL, R2, and R4 clades, which are characterized by a single ORF coding for RT and an EN C terminal to the RT domain. The R2 EN is similar to different restriction enzymes, and all TEs from the R2 group retrotranspose into highly specific target sites. Members of the remaining four groups encode the apurinic-apyrimidinic endonuclease (APE), which is always N terminal to the RT domain. In addition to RT and EN, members of the I group code for RNase H (29), including the Ingi, I, LOA, R1, and Tad1 clades.

Nonautonomous non-LTR retrotransposons are usually referred to as short interspersed element (SINE) retrotransposons. Typically, they are mosaic structures derived from transfer RNA (tRNA) or 7SL or 5S ribosomal RNA, and contain 5′ internal pol III promoters involved in transcription. The 3′ ends of SINEs are either derived from LINE elements or contain poly(A) tails recognizable by L1 elements. They may share common structural constraints (111). Their retrotransposition is catalyzed by RT/EN encoded by the autonomous non-LTR retrotransposons.
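
The origin of target site duplications (TSDs), described earlier in this section as the result of fill-in repair of staggered nicks, can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the functions `insert_with_tsd` and `flanking_tsd`, the target sequence, and the element sequence are all hypothetical and chosen only for illustration, and DNA is treated as a single-strand string.

```python
def insert_with_tsd(target: str, element: str, position: int, stagger: int) -> str:
    """Insert `element` into `target`, duplicating `stagger` target bases on both flanks.

    Toy model of a staggered cut: the bases target[position:position + stagger]
    lie between the two nicks, and fill-in repair copies them on each side.
    """
    if not 0 <= position <= len(target) - stagger:
        raise ValueError("staggered cut must lie inside the target")
    left = target[:position + stagger]   # left flank ends with the duplicated bases
    right = target[position:]            # right flank starts with the same bases
    return left + element + right


def flanking_tsd(sequence: str, start: int, end: int, max_len: int = 20) -> str:
    """Return the longest direct repeat (up to max_len) flanking sequence[start:end]."""
    for k in range(max_len, 0, -1):
        if start - k < 0 or end + k > len(sequence):
            continue
        if sequence[start - k:start] == sequence[end:end + k]:
            return sequence[start - k:start]
    return ""


# Hypothetical sequences, for illustration only.
target = "GGATCCTTAAGCATGC"
element = "CAGTGAACGT"
inserted = insert_with_tsd(target, element, position=6, stagger=4)
tsd = flanking_tsd(inserted, 10, 10 + len(element))
print(inserted, tsd)
```

Setting `stagger` to 0 models an insertion that leaves no duplication, which is how the text characterizes Helitrons.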

Non-LTR retrotransposons are transmitted vertically (i.e., from parents to offspring), with some notable exceptions (71).

An LTR retrotransposon (Figure 1b) may carry three ORFs coding for the gag, env, and pol proteins. The pol protein is composed of the RT, EN, and aspartyl protease domains. The EN domain in LTR retrotransposons is usually called INT and is distantly related to the DDE transposase (named after two aspartate and one glutamate residues forming a catalytic triad), encoded by Mariner DNA transposons (20, 29). LTR retrotransposons can be transferred horizontally (49), although the extent of the process is not clear.

Penelope. The Penelope retrotransposons encode a single ORF composed of the RT and EN domains. The latter is similar to GIY-YIG intron-encoded ENs (2, 30, 84, 121), named after the conserved amino acid motif Gly-Ile-Tyr-Xn-Tyr-Ile-Gly. It appears that the Penelope RT is closer to telomerases and bacterial RTs than to RTs encoded by non-LTR retrotransposons (2). Like many families of non-LTR retrotransposons, Penelope elements generate 10–15 base pair (bp) TSDs and probably follow the TPRT model of retrotransposition (29, 30). However, Penelope elements are characterized by unusual LTR-like or inverted terminal repeats not typical for standard non-LTR retrotransposons. Also, some Penelope elements from different species retain introns after their retrotransposition (2). Based on their structural and phylogenetic features, Penelopes are viewed as a separate class of retrotransposons. However, given the low resolution of phylogenetic trees built for extremely divergent and ancient RTs, an alternative view of Penelopes as the most ancient/basal group of non-LTR retrotransposons cannot yet be ruled out.

DIRS.
The Dictyostelium intermediate repeat sequence (DIRS) retrotransposons encode an RT that is phylogenetically closer to that encoded by LTR retrotransposons than to the RT in non-LTR retrotransposons (29). Until recently, DIRS elements were viewed as an enigmatic class of INT-free retrotransposons characterized by an unusual structure of terminal repeats (19, 29). However, it turns out that DIRS elements encode a protein belonging to the INT family of tyrosine recombinases (tyrosine INT) (40). This observation and the unusual structure of termini led to the classification of DIRS elements as a separate class of LTR retrotransposons (28, 41). Given a wide distribution of highly diverse DIRS retrotransposons in different eukaryotic kingdoms, it appears that they are as ancient as LTR retrotransposons. It also appears that the DIRS RT is grouped phylogenetically with the Gypsy RT (29, 98) and separately from the BEL and Copia LTR retrotransposons, which are viewed as the most ancient LTR retrotransposons (29). Therefore, the most parsimonious scenario of DIRS origin is that they evolved from a Gypsy-like ancestral LTR retrotransposon after recruiting the tyrosine INT, which replaced the standard DDE INT. This scenario is consistent with the observation of tyrosine recombinase/INT-encoding DNA transposon-like elements in some fungi (Crypton transposons) (39) and in ciliates (Tec transposons) (27, 48). The suggested recruitment might have occurred following insertion of an ancient tyrosine recombinase-encoding DNA transposon into the Gypsy-like predecessor of DIRS elements. Analogously to Penelope retrotransposons, some DIRS elements retain introns in ORFs (41). Such intron retention could be important for the retrotransposition of Penelope and DIRS.
For instance, non-spliced DIRS/Penelope mRNA retained in the nucleus can be a better substrate for retrotransposition than the spliced one (30, 41).

Cut-and-Paste DNA Transposons

During its transposition, a cut-and-paste DNA transposon is excised (cut) from its original genomic location and inserted (pasted) into a new site (23). Both reactions are catalyzed by a transposase that binds the termini of a transposon and its target site and introduces DNA nicks. Most DNA transposons contain 10–400 bp long terminal inverted repeats (TIRs) at both ends. However, in some active transposons, TIRs are imperfect or absent (e.g., some MuDR transposons in Arabidopsis thaliana) (59). Eukaryotic “cut-and-paste” DNA transposons can be assigned to 13 superfamilies (Table 1). Each superfamily includes diverse families composed of autonomous and nonautonomous elements, whose transposition is catalyzed by superfamily-specific transposases. Transposases from different superfamilies are not similar to each other [i.e., position-specific iterative basic local alignment search tool (PSI-BLAST) expect values are higher than 0.05] (1). In addition to the superfamily-specific transposases, each superfamily is characterized by a specific length of TSDs (Table 1). However, some superfamilies, such as En/Spm and Harbinger, have the same length of TSDs. Autonomous DNA transposons from most superfamilies encode only one protein (transposase). They include Mariner (97), hAT (73), P (4), piggyBac (85), Transib (61, 64), Merlin (31), Mirage, IS4EU, Novosib, and Rehavkus (36). Transposons from the En/Spm (73), Harbinger (59), and MuDR (122) superfamilies code for DNA-binding proteins in addition to transposases.

Table 1
Superfamilies of “cut-and-paste” DNA transposons. Columns: Superfamily name; Related IS; Size of target

Harbinger was the first superfamily of DNA transposons discovered based on computational studies (59). The autonomous Harbingers encode two proteins: a 400-amino acid (aa) Harbinger transposase and a 200-aa DNA-binding protein that includes the conserved SANT/myb/trihelix motif.
The Harbinger transposase is distantly related to transposases encoded by the IS5 group of bacterial transposons, including IS5, IS112, and ISL2. Harbingers are typically flanked by 3-bp TSDs, frequently TAA or TTA trinucleotides, but some Harbingers from the zebrafish genome show a striking preference for a 17-bp target site (AAAACACCWGGTCTTTT), longer than the target for any other DNA transposon family (62).

Helitrons. Helitron DNA transposons transpose via replicative rolling-circle transposition (60). Helitrons are present in the genomes of plants, fungi, insects, nematodes, and vertebrates. In some species, including A. thaliana and Caenorhabditis elegans, they constitute 2% of the genome. Autonomous Helitrons encode the 1500-aa, so-called Rep/Hel protein, composed of the replication initiator (Rep) and helicase (Hel) conserved domains. The Rep domain spans a 160-aa region composed of the “two-His” (E-FYW-QK-R-G-LAV-PVH-X-H) and “KYK” (YgLVW-FAT-Kq-Y-X-X-K) motifs separated by 130 aa. These motifs are conserved in Reps, which are encoded by plasmids and single-stranded DNA viruses replicating by the rolling-circle mechanism. The Rep proteins perform both cleavage and ligation of DNA during rolling-circle replication, the same as transposases. The 500-aa Hel domain is a helicase that belongs to the SF1 superfamily of DNA helicases. Helitron is the only known class of transposons in eukaryotes that integrates into the genome without introducing TSDs. Usually, the Helitron integration occurs precisely between A and T nucleotides in the host. Helitrons do not have TIRs, which are typically present in other DNA transposons. Instead, Helitrons have conserved 5′-TC and CTRR-3′ termini. They also contain an 18-bp hairpin separated by 10–12 nucleotides from the 3′ end. Presumably, the hairpin serves as the terminator of rolling-circle replication, which is believed to be the mechanism for Helitron transposition. So far, only Helitrons found in the Aspergillus nidulans genome do not contain the 3′ hairpin (33).

Although no active Helitrons have been isolated and studied experimentally so far, the main features of Helitron transposition can be predicted a priori based on the structural invariants detected in different Helitrons and known properties of bacterial rolling-circle replicons (60). Helitron transposition starts from a site-specific Rep-encoded nicking of the transposon-plus strand. Next, the free 3′-OH end of the nicked-plus strand serves as a primer for leading-strand DNA synthesis facilitated by the Helitron helicase and some host replication proteins, including DNA polymerase and replication protein A (RPA)-like single-stranded DNA-binding proteins. The newly synthesized leading-plus strand remains covalently linked to the 3′-OH end of the parent-plus strand during the continuous displacement of its 5′-OH end. When the leading strand makes a complete turn, Rep catalyzes a strand-transfer reaction followed by the release of a single-stranded DNA intermediate, the parent-minus strand, and a double-stranded DNA Helitron composed of both the parental-plus and a newly synthesized strand (60).

Another interesting feature of Helitron is its ability to intercept host genes. For example, plant Helitrons encode RPA-like proteins, clearly derived from RPA encoded originally by the host genome (60). Given the conservation of RPA in Helitrons, this protein is almost certainly involved in Helitron transposition, presumably as a single-stranded DNA-binding protein.
Helitrons present in sea anemone, sea urchin, fish, and frog carry EN derived from CR1-like non-LTR retrotransposons (63, 99). Again, the conservation of the EN in different Helitrons from different species shows that it must be necessary for the life cycle of Helitrons. Finally, numerous nonautonomous Helitrons in the corn genome harbor exon-/intron-coding portions from many different host genes (75, 90). Therefore, Helitrons may function as a powerful tool of evolution, by mediating duplication, shuffling, and recruitment of host genes.

Polintons. Like Helitrons, the third class of DNA transposons, Polintons, was discovered and characterized based on computational studies (65). Polintons are 15–20 kb long, with 6-bp TSDs and 100–1000 bp TIRs at both ends. They are the most complex eukaryotic transposons known to date. Polintons code for up to 10 proteins, including a family B DNA polymerase (POLB), a retroviral-like INT, an ATPase, and an adenoviral-like cysteine protease. The first three are universal for all autonomous Polintons identified in protists, fungi, and animals (65).

Polinton POLB belongs to a group of protein-primed DNA polymerases encoded by genomes of bacteriophages, adenoviruses, and linear plasmids from fungi and plants. POLB and its functional motifs are well defined (10, 24, 115), and their conservation in all extremely diverged Polinton POLBs indicates that the DNA-DNA polymerase and proofreading activities are necessary for Polinton transposition. The termini of Polintons are composed of short 1–3-bp tandem repeats, which are necessary for the slide-back mechanism in protein-primed DNA synthesis studied in bacteriophages (88). Based on these observations, it was proposed that Polintons propagate through protein-primed self-synthesis by POLB (65).
First, during host genome replication, the INT-catalyzed excision of Polinton from the host DNA leads to an extrachromosomal single-stranded Polinton that forms a racket-like structure. Second, the Polinton POLB replicates the extrachromosomal Polinton. Finally, after the double-stranded Polinton is synthesized, the INT molecules bind to its termini and catalyze its integration into the host genome.
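
Two of the sequence signatures quoted in this section, the 17-bp Harbinger target site (AAAACACCWGGTCTTTT) and the Helitron termini (5′-TC and CTRR-3′), are written with IUPAC ambiguity codes (W = A or T, R = A or G). The following Python sketch shows one way such signatures could be searched for. The helper names `iupac_to_regex`, `find_sites`, and `has_helitron_termini` and the test sequences are hypothetical; the check covers only the terminal bases and is not a Helitron detector.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC-coded motif into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


# Signatures quoted in the text.
HARBINGER_TARGET = iupac_to_regex("AAAACACCWGGTCTTTT")
HELITRON_3PRIME = iupac_to_regex("CTRR")


def find_sites(pattern: re.Pattern, sequence: str) -> list:
    """Return 0-based start positions of non-overlapping matches."""
    return [m.start() for m in pattern.finditer(sequence.upper())]


def has_helitron_termini(candidate: str) -> bool:
    """Check only the terminal signature: starts with TC and ends with CTRR."""
    candidate = candidate.upper()
    return (candidate.startswith("TC")
            and HELITRON_3PRIME.fullmatch(candidate[-4:]) is not None)


# Hypothetical test sequence containing both variants of the W position.
seq = "GG" + "AAAACACCAGGTCTTTT" + "CC" + "AAAACACCTGGTCTTTT"
print(find_sites(HARBINGER_TARGET, seq))
print(has_helitron_termini("TCTAGGGCTAG"))
```

Only one strand is scanned here; a fuller search would also scan the reverse complement.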

FROM TRANSPOSABLE ELEMENTS TO GENES

The first clear example of a functional protein-coding gene that evolved from a former TE was centromere protein B (CENP-B) (110, 116). CENP-B is conserved in mammals and binds a specific 17-bp site in the human α-satellite, but its exact function is still not clear. Based on gene knockout studies, it appears that CENP-B is involved in reproduction rather than in centromere-related activities, as was originally predicted (32, 112). Three yeast CENP-B homologs were reported (47), but given their low (30%) identity to mammalian CENP-B and a similar range of identities between different transposases in mammals, plants, and fungi, they probably evolved in the yeast genome from the Mariner/Pogo transposase independently of the mammalian CENP-B.

Approximately 50–100 protein-coding genes in the mammalian genome evolved from coding sequences of DNA transposons and retrotransposons (12, 18, 62, 64, 66, 76, 120; V.V. Kapitonov & J. Jurka, unpublished work). Most of these genes are derived from transposases (Mariner/Pogo, hAT, piggyBac, P, Harbinger, and Transib). The RAG1 protein, which is a key player in V(D)J recombination (103, 113), is probably the most ancient known host protein derived from a transposable element (62, 64). RAG1 evolved some 500 million years ago (mya) in a common ancestor of jawed vertebrates from a Transib DNA transposase (64). It is also the only transposase-derived host gene with demonstrated nuclease-/transposase-like activities. Biological properties of the remaining transposase-derived genes are either not known or linked to DNA/RNA binding (52, 78, 114). The RAG1-based immune system is also the only example of a complex host machinery that evolved from transposase and TIRs from the same family of transposons (64).
There are other genes that may be involved in DNA rearrangements, as they encode transposase-derived proteins sharing conserved catalytic amino acids with corresponding transposases. An example of such a conserved gene is HARBI1, which evolved from the Harbinger transposase in a common ancestor of fish, birds, frogs, and mammals (62).

Other potential sources of novel protein-coding genes are LTR retrotransposons. For instance, 50 protein-encoding genes syntenic between the human and mouse genomes evolved from the gag protein encoded by Gypsy LTR retrotransposons, which were active in ancestral genomes (12, 18, 66, 79, 94, 119). One of the Gypsy-derived genes, called PEG10 or KIAA1051, includes Gypsy gag and protease domains, which are fused together through the –1 ribosomal frame-shift mechanism typical for Gypsy elements (82, 94, 119). Although its exact function is still unknown, PEG10 is important for mouse parthenogenetic development based on observed embryonic lethality due to placental defects in PEG10 knockout mice (95). Although there are 30 examples of host genes evolved from DNA transposases, there is only one example of a recruited RT: the mammalian Rtl1 or PEG11 gene, which evolved from the Gypsy gag and RT (104). Interestingly, both PEG10 and PEG11 are paternally expressed genes, and more than 50% of all gag-derived genes reside on the X chromosome.

Finally, many microRNA genes appear to have evolved from TEs, and their involvement in gene regulation appears to be an outcome of the antagonistic relationship between TEs and the host genome. Expression of TEs and generation of repetitive DNA, including tandem repeats, are countered by RNA degradation and DNA methylation (17, 22, 100, 118) mediated by small RNAs (sRNAs) (20–26 bp) generated from the targeted repetitive DNA. Analogous processes are involved in modulating chromatin structure and regulating gene expression (7, 8, 15, 22, 101). Many such processes are mediated by sRNAs derived from evolutionarily conserved precursors (21).

Most of the epigenetic regulation of endogenous genes in A. thaliana appears to have evolved from mechanisms to silence TEs (128). Furthermore, some mammalian precursors of microRNAs (miRNAs) appear to be derived from ancient MIR (SINE) and L2 (LINE) elements (108), or even younger Alu (SINE) elements and processed pseudogenes (25, 109). Recent evidence that 5′ Alus can function as RNA polymerase promoters for miRNAs (11) further supports the contributions of TEs to the origin and expression of miRNAs involved in mammalian gene regulation.

OTHER HIGHLY CONSERVED TRANSPOSABLE ELEMENTS

Recent systematic comparisons of complete genomic sequences revealed the existence of noncoding sequences that are highly conserved across multiple species (6). They include LF-SINE (5), MER121 (58), AmnSINE1, and SINE3-1 (92, 127), which are SINE, or SINE-like, elements preserved in highly diverse vertebrates from Latimeria and reptiles to mammals. An additional 83 families of low and moderately repeated elements were reported recently and deposited in Repbase (38, 53). The list includes 20 Eulor families, 15 newly analyzed MER families, 31 UCON families, 14 LINE-like families (X*_LINEs), and 3 MARE families. Eulor families are relatively small, with self-complementary regions suggesting that they might have been derived from DNA transposons. Likewise, many MER elements also resemble nonautonomous DNA transposons. Furthermore, mammalian-specific MARE3 is a tRNA-derived SINE (38).
X*_LINEs, where the asterisk stands for specification of one of the 14 families, were directly or indirectly derived from autonomous non-LTR retrotransposons, a fact supported by significant similarities between their translatable regions and diverse LINE elements (38).

Table 2 shows densities of the above-described families of repeats, including a moderately repetitive L4 family (36), in five vertebrate genomes. Columns 2 and 3 show densities of the same families in human conserved sequences (106) and cis-regulatory modules (CRMs) (9). For some families, the densities of TEs in CRMs can be as much as an order of magnitude higher than the average human genomic density. Similar

Table 2
Densities of selected repetitive families per 1 Mb of DNA sequence

Family         H.s.   Cons.  CRMs   E.t.   M.d.   G.g.   X.t.
AmnSINE1_GG    0.28   0.81   0.94   0.17   0.15   0.64   0.24
AmnSINE1_HS    0.17   1.72   1.65   0.13   0.27   1.10   0.02
Eulor 2-1361.228.699.520.631.531.850.07
SINE3-1 .510.652.550.15
X*_LINE        0.63   2.98   3.18   0.28   1.19   1.06   0.57

DNA origin: human (H.s., Homo sapiens), tenrec (E.t., Echinops telfairi), Brazilian gray short-tailed opossum (M.d., Monodelphis domestica), chicken (G.g., Gallus gallus), and the pipid frog (X.t., Xenopus tropicalis). Colu
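
The comparison the text draws from Table 2, between the density of a family in CRMs or conserved sequences and its average density in the human genome, amounts to a simple ratio. The Python sketch below computes it for the three rows of Table 2 whose values are fully legible here. The function name `fold_enrichment` is hypothetical, and the row labels written with underscores (and `X*_LINE` for the pooled LINE-like families) are an assumption about the original naming.

```python
# Densities per 1 Mb of DNA, copied from the fully legible rows of Table 2.
# Row labels with underscores are an assumption about the original family names.
COLUMNS = ("H.s.", "Cons.", "CRMs", "E.t.", "M.d.", "G.g.", "X.t.")
DENSITIES = {
    "AmnSINE1_GG": (0.28, 0.81, 0.94, 0.17, 0.15, 0.64, 0.24),
    "AmnSINE1_HS": (0.17, 1.72, 1.65, 0.13, 0.27, 1.10, 0.02),
    "X*_LINE": (0.63, 2.98, 3.18, 0.28, 1.19, 1.06, 0.57),
}


def fold_enrichment(family: str, column: str = "CRMs", baseline: str = "H.s.") -> float:
    """Ratio of a family's density in `column` to its density in `baseline`."""
    row = dict(zip(COLUMNS, DENSITIES[family]))
    return row[column] / row[baseline]


for family in DENSITIES:
    crm = fold_enrichment(family)
    cons = fold_enrichment(family, column="Cons.")
    print(f"{family}: CRMs {crm:.1f}x, conserved {cons:.1f}x of the genomic average")
```

For these three rows the CRM ratios fall between roughly 3 and 10, in line with the statement that the enrichment can approach an order of magnitude for some families.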

