Lecture 18 Molecular Evolution And Phylogenetics

2y ago

4 Views

3 Downloads

2.10 MB

78 Pages

Last View : 2m ago

Last Download : 3m ago

Upload by : Julius Prosser

Report this link

Download PDF

Transcription

6.047/6.878 - Computational Biology: Genomes, Networks, EvolutionPatrick Winston’s 6.034Lecture 18Molecular Evolution and PhylogeneticsSomewhere, something went wrong 1

Challenges in Computational Biology4 Genome Assembly5 Regulatory motif discovery1 Gene FindingDNA2 Sequence alignment6 Comparative Genomics7 Evolutionary TT3 Database lookupGene expression analysisRNA transcript9 Cluster discovery 10 Gibbs sampling11 Protein network analysis12 Metabolic modelling13 Emerging network properties2

Concepts of Darwinian EvolutionSelectionImage in the public domain.Courtesy of Yuri Wolf; slide in the public domain.Taken from Yuri Wolf, Lecture Slides, Feb. 20143

Concepts of Darwinian EvolutionImage in the public domain.Charles Darwin 1859. Origin of Species [one and onlyillustration]: "descent with modification"Courtesy of Yuri Wolf; slide in the public domain.Taken from Yuri Wolf, Lecture Slides, Feb. 20144

Goals for today: Phylogenetics Basics of phylogeny: Introduction and definitions– Characters, traits, nodes, branches, lineages, topology, lengths– Gene trees, species trees, cladograms, chronograms, phylograms1. From alignments to distances: Modeling sequence evolution– Turning pairwise sequence alignment data into pairwise distances– Probabilistic models of divergence: Jukes Cantor/Kimura/hierarchy2. From distances to trees: Tree-building algorithms– Tree types: Ultrametric, Additive, General Distances– Algorithms: UPGMA, Neighbor Joining, guarantees and limitations– Optimality: Least-squared error, minimum evolution (require search)3. From alignments to trees: Alignment scoring given a tree– Parsimony: greedy (union/intersection) vs. DP (summing cost)– ML/MAP (includes back-mutations, lengths): peeling algorithm (DP)4. Tree of Life in Genomic Era– The prokaryotic problem (no real taxa and HGT)– Interpreting the forest of life6

Introduction: Basics and DefinitionsCharacters, traits, gene/species trees7

Common Phylogenetic Tree TerminologyBranches orLineagesTerminal NodesABCRepresent theTAXA (genes,populations,species, etc.)used to inferthe phylogenyDAncestral Nodeor ROOT ofthe TreeInternal Nodes orDivergence Points(represent hypotheticalancestors of the taxa)E8

Extinctions part of lifePhylogenetic tree showing archosaurs, dinosaurs, birds, etc. through geologic time removed due to copyright restrictions.9

PhylogeneticsGeneral Problem:Infer complete ancestry ofa set of ‘objects’ based onknowledge of their ‘traits’Mammal family tree removed due to copyright restrictions.‘Objects’ can be: Species,Genes, Cell types, Diseases,Cancers, Languages, Faiths,Cars, Architectural Styles‘Traits’ can be: Morphological, molecular,gene expression, TF binding, motifs, words Historical record varies: Fossils, imprints,timing of geological events, ‘living fossils’,sequencing of extinct species, paintings, stories.Today: Phylogenies using only extant species data gene trees (paralog / ortholog / homolog trees)10

Inferring Phylogenies: Traits and CharactersTrees can be inferred by several criteria:– Traditional traits: Morphology data Source unknown. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.– Modern traits: Molecular AGCAAACGACCTGTGACGTAGCAAACGA11

From physiological traits to DNA characters Traditional phylogenetics– Building species trees– Small number of traits Hoofs, nails, teeth, horns– Well-behaved traits, each arose once Parsimony principle, Occam’s razor Modern phylogenetics– Building gene trees and species trees– Very large number of traits Every DNA base and every protein residue– Frequently ill-behaved traits Back-mutations are frequent (convergent evolution) Small number of letters, arise many times independently12

Three types of treesCladogramChronogramTaxon BTaxon BTaxon CTaxon CTaxon ATaxon ATaxon DTaxon DTopology onlyt1 t2 t3Topology Divergence timesPhylogram131561Taxon BTaxon CTaxon ATaxon DTopology Divergence times Divergence rates13

Inferring a tree from quence data:-Nucleotide alignments-Peptide alignmentsEvolutionary historyrepresented as abinary tree Source unknown. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.14

Two basic approaches for phylogenetic inferenceDistance based12From SequencesTo DistancesSequence alignmentTree buildingalgorithmsPair-wise distance matrixOutput treeCharacter based3From alignmentsTo phylogeniesSequence alignmentCouple totree proposaland scoring4Output tree Source unknown. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.15

Goals for today: Phylogenetics Basics of phylogeny: Introduction and definitions– Characters, traits, nodes, branches, lineages, topology, lengths– Gene trees, species trees, cladograms, chronograms, phylograms1. From alignments to distances: Modeling sequence evolution1– Turning pairwise sequence alignment data into pairwise distances– Probabilistic models of divergence: Jukes Cantor/Kimura/hierarchy2. From distances to trees: Tree-building algorithms2– Tree types: Ultrametric, Additive, General Distances– Algorithms: UPGMA, Neighbor Joining, guarantees and limitations– Optimality: Least-squared error, minimum evolution (require search)3. From alignments to trees: Alignment scoring given a tree3 – Parsimony: greedy (union/intersection) vs. DP (summing cost)– ML/MAP (includes back-mutations, lengths): peeling algorithm (DP)4. Tree of Life in Genomic Era4– The prokaryotic problem (no real taxa and HGT)– Interpreting the forest of life16

1. From alignments to distancesModeling evolutionary ratesDistance estimation Source unknown. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.17

Measuring evolutionary rates Nucleotide divergence– Uniform rate. Overall percent identity. Transitions and transversions– Two-parameter model. A-G, C-T more frequent. Synonymous and non-synonymous substitutions– Ka/Ks rates. Amino-acid changing substitutions Nactual mutations N observed substitutions– Some fraction of “conserved” positions mutated twice.6.1A.1.1C.2GTAC.2T.6.1G18

‘Evolving’ a nucleotide under random model At time step 0, start with letter A At time step 1:.7– Remain A with probability 0.7– Change to C,G,T with prob. 0.1 each.1A.1 At time step 2:G.1.1.1– In state A with probability 0.52 Remain A with probability 0.7 * 0.7 Go back to A from C,G,T with 0.1*0.1 each.7C.1.7T.7– In states C,G,T with prob. 0.16 eacht 1t 2t 3t 4t 0.1960.2176T00.10.160.1960.217619

Modeling Nucleotide EvolutionDuring infinitesimal time t, there is not enough time for twosubstitutions to happen on the same nucleotideSo we can estimate P(x y, t), for x, y {A, C, G, T}Then letS( t) P(A A, t) P(A T, t) P(T A, t) P(T T, t)20

Modeling Nucleotide EvolutionReasonable assumption: multiplicative(implying a stationary Markov process)S(t t’) S(t)S(t’)That is, P(x y, t t’) z P(x z, t) P(z y, t’)Jukes-Cantor: constant rate of evolutionFor short time , S( ) 1 - 3 1 - 3 1 - 3 1 - 3 21

Modeling Nucleotide Evolution Jukes-Cantor:AFor longer times,r(t)S(t) s(t)s(t)s(t)s(t)r(t)s(t)s(t) s(t)s(t)r(t)s(t)s(t)s(t)s(t)r(t)Where we can derive:G C1-3 A 3 T1- otherr(t) ¼ (1 3 e-4 t)s(t) ¼ (1 – e-4 t)Geometric asymptote to 1/422

Modeling Nucleotide EvolutionKimura:Transitions: A/G, C/TTransversions: A/T, A/C, G/T, C/GTransitions (rate ) are much more likely than transversions (rate )AS(t) )r(t)s(t)Tu(t)u(t)s(t)r(t)s(t) ¼ (1 – e-4 t)u(t) ¼ (1 e-4 t – e-2( )t)r(t) 1 – 2s(t) – u(t)23

Distance between two sequencesGiven (well-aligned portion of) sequences xi, xj,Definedij distance between the two sequencesOne possible definition:dij fraction f of sites u where xi[u] xj[u]Better model (Jukes-Cantor):dij - ¾ log(1 – 4f / 3)r(t) ¼ (1 3 e-4 t)s(t) ¼ (1 – e-4 t)Observed F [ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])ActualD [0.11, 0.23, 0.38, 0.57, 0.82, 1.21, 2.03]24

Many nucleotide models have been developedVarying levels of complexity (parameters) Source unknown. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.Models also exist for peptides and codons25

2. Distance-based tree-building algorithmsMapping a distance matrix to a treeUPGMA, NJ,LSE, ME Source unknown. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.27

Distance matrix Phylogenetic .z.x.cr.y.z.x.cd.c0Tree impliesa distance matrixMijdxzyDogcCathHumanmMouserRatMap distances Dijto a treemin ij (Dij-Mij)2Goal:Minimize discrepancy between observed distances and tree-based distances28

Ultrametric distances & 3 Point Condition For all points i, j, k– two distances are equal and third is smallerd(i,j) d(i,k) d(j,k)a a a b a biabjakwhere a b Result:– All paths from leaves are equidistant to the root– Rooted tree with uniform rates of evolution29

Ultrametric treesABCA B C0 6 66 0 46 4 0ABCA033B302C320For now imagine that theseSymmetric 0-diagonalare just the number of substitutions matrix of divergencebetween pairs:timesA:GCCCAACTAB:GTTTCCCTCTaken from Ran Libeskind-Hadas, Lecture Slides, Fall, 201330

Ultrametric Trees Given a symmetric n x n 0-diagonal matrix D, an ultrametric tree T forthat matrix is one in which:– There are n leaves, one for each row (column) of D– Each internal node is labeled by a time in D and has exactly twochildren– Along any path from the root to a leaf, the (divergence) times at theinternal nodes strictly decrease– For any two leaves i, j of T, the LCA of i, j is labeled with time D(i, j)3ABCA0332B302C320ABCTaken from Ran Libeskind-Hadas, Lecture Slides, Fall, 201331

Ultrametric Matrix ConstructionAA 0B 5C 2D 5E 7B50537C25057D53507E77770AA 0B 5C 2D 5E 7B50437C24057D53507E77770 Algorithms exist for “ultrametrifying” matrices.Taken from Ran Libeskind-Hadas, Lecture Slides, Fall, 201332

Minimum Spanning Tree (MST) There is a unique path between any two vertices in aspanning tree Adding an edge to a spanning tree creates a cycle Any edge on that cycle can be removed and we’ll stillhave a spanning tree MST is found using Prim’s Algorithm (graph traversal)Taken from Ran Libeskind-Hadas, Lecture Slides, Fall, 201333

The “Ultrametrification” AlgorithmGiven n x n symmetric 0-diagonal matrix D that is notultrametric1.Construct a completely connected graph with nvertices, one per row of A. The edge weight fromvertex i to vertex j is D(i, j).2.Find a minimum spanning tree (MST) of this graph.3.Build a new matrix D’ such that D’(i, j) is the largestweight on the unique path from i to j in the MST.Taken from Ran Libeskind-Hadas, Lecture Slides, Fall, 201334

Distances: (b) Additive distances All distances satisfy the four-point condition– Any quartet can be labeled i,j,k,l such that: d(i,j) d(k,l) d(i,k) d(j,l) d(i,l) d(j,k) (a b) (c d) (a m c) (b m d) (a m d) (b m c)ikamjcbdl Result:– All pairwise distances obtained by traversing a tree35

Distances: (c) General distances In practice, a distance matrix is neither ultrametric nor additive– Noise Measured distances are not exact Evolutionary model is not exact– Fluctuations Regions used to measure distances not representative of the speciestree Gene replacement (gene conversion), lateral transfer Varying rates of mutation can lead to discrepancies In the general case, tree-building algorithms must handle noisydistance matrices– Such a tree can be obtained by Enumeration and scoring of all trees (too expensive) Neighbor-Joining (typically gives a good tree) UPGMA (typically gives a poor tree)36

Algorithms: (a) UPGMA (aka Hierarchical Clustering)(Unweighted Pair Group Method with Arithmetic mean)Initialization:Assign each xi into its own cluster CiDefine one leaf per sequence, height 014Iteration:Find two clusters Ci, Cj s.t. dij is minLet Ck Ci CjDefine node connecting Ci, Cj,& place it at height dij/2Delete Ci, Cj352Termination:When two clusters i, j remain,place root at height dij/21423537

Ultrametric Distances & UPGMA14235UPGMA is guaranteed to build the correct tree if distance isultrametricProof:1.2.The tree topology is unique, given that the tree is binaryUPGMA constructs a tree obeying the pairwise distances38

Weakness of UPGMAMolecular clock assumption:implies time is constant for all speciesHowever, certain species (e.g., mouse, rat) evolve much fasterExample where UPGMA messes up:UPGMACorrect tree2134142339

Algorithms: (b) Neighbor-Joining Guaranteed to produce the correct tree if distance is additive May produce a good tree even when distance is not additiveStep 1: Finding neighboring leaves1Define0.10.10.1Dij dij – (ri rj)Where1ri ––––– k dik L - 230.420.44Claim: The above “magic trick” ensures that Dij is minimal iff i, j are neighborsProof: Beyond the scope of this lecture (Durbin book, p. 189)40

Algorithm: Neighbor-joiningInitialization:Define T to be the set of leaf nodes, one per sequenceLet L TIteration:Pick i, j s.t. Dij is minimalDefine a new node k, and set dkm ½ (dim djm – dij) for all m LAdd k to T, with edges of lengths dik ½ (dij ri – rj)Remove i, j from L;Add k to LTermination:When L consists of two nodes, i, j, and the edge between them of length dij41

Algorithms: (c) Distance-fitting algoriths With distance-based algorithms, we can also aim todirectly minimize discrepancy between originaldistance matrix and tree-based distance matrixCOMPUTATIONAL METHODDistances CharactersDATA TYPEOptimality criterion Clustering algorithmPARSIMONYMAXIMUM LIKELIHOODMINIMUM EVOLUTIONUPGMALEAST SQUARESNEIGHBOR-JOINING42

Aside: Alternative to Molecular clock?Divergence between orthologous sequences is proportional to timeseparating the species.Different genes evolve at specific, roughly constant rates.Zuckerkandl & Pauling 1962ratedistancesamplingerrordivergence timeCourtesy of Yuri Wolf; slide in the public domain.timeTaken from Yuri Wolf, Lecture Slides, Feb. 201444

Molecular ClockUnder MC all individual gene trees are ultrametric (up to a sampling error)and identical to the species tree up to a scaling factor (evolution rate).Are these really ultrametric?speciestreegene 1ABCDEFGHABCDEFGHtimegene 2distanceABCDEFGHdistanceCourtesy of Yuri Wolf; slide in the public domain.Taken from Yuri Wolf, Lecture Slides, Feb. 201445

Molecular ClockMost of the real phylogenetic trees are far from being ultrametric.Molecular clock is substantially overdispersed.rateobservedidealexpected basedon sampling errortime0.2Courtesy of Yuri Wolf; slide in the public domain.Taken from Yuri Wolf, Lecture Slides, Feb. 201446

Relaxed Molecular ClockRelaxed molecular clock models allows for rate variation.Rates are sampled from prior distributions with limited variance,independently or in autocorrelated manner.rateGenes are either analyzed individually, or as concatenated alignments(implying evolution as a single unit).timeCourtesy of Yuri Wolf; slide in the public domain.Taken from Yuri Wolf, Lecture Slides, Feb. 201447

Universal PacemakerUniversal Pacemaker model assumes that evolutionary time runs atdifferent pace in each lineage.Under the UPM, species trees are intrinsically Courtesy of Yuri Wolf; slide in the public domain.Taken from Yuri Wolf, Lecture Slides, Feb. 201448

Pacemaker vs ClockBoth overdispersed MC and UPM models predict that individual genetrees would deviate from ultrametricity.Under MC these deviations are expected to be uncorrelated.Under UPM these deviations are expected to be correlated, so thereexists a non-ultrametric pacemaker tree that can significantly reducevariance of observed rates.A testable hypothesis!2,300 trees of 100 prokaryotic species;7,000 trees of 6 Drosophila species1,000 trees of 9 yeast species5,700 trees of 8 mammalian speciesCourtesy of Yuri Wolf; slide in the public domain.Taken from Yuri Wolf, Lecture Slides, Feb. 201449

Pacemaker vs Clock2,300 trees of 100 prokaryotic species;7,000 trees of 6 Drosophila species1,000 trees of 9 yeast species5,700 trees of 8 mammalian speciesAll show an overwhelming support to UPM model.Snir 2012; work in progress at NCBI (NIH)Courtesy of Yuri Wolf; slide in the public domain.Taken from Yuri Wolf, Lecture Slides, Feb. 201450

3. Character-based tree-scoring algorithms3a: Parsimony (set-based)3b: Parsimony (Dyn. Prog.)3c: Maximum Likelihood Source unknown. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.52

Basic algorithms of phylogenetic methodsDistance based12From SequencesTo DistancesSequence alignmentTree buildingalgorithmsPair-wise distance matrixOutput treeCharacter based3From alignmentsTo phylogeniesSequence alignmentCouple totree proposaland scoring4Output tree Source unknown. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.53

Character-based phylogenetic inference Really about tree scoring techniques, not treefinding techniques– Couple them with tree proposal and update and youhave an algorithm (part 4 of the lecture) Two approaches exist, all use same architecture:– Minimize events: Parsimony (union/intersection)– Probabilistic: Max Likelihood / MAP54

Parsimony scoring (a): Union and intersectionGiven a tree, and an alignment columnLabel internal nodes to minimize thenumber of required substitutions{A, B}C 1Initialization:Set cost C 0; k 2N – 1{A}Iteration:If k is a leaf, set Rk { xk[u] }{A, B}If k is not a leaf,Let i, j be the daughter nodes;Set Rk Ri Rj if intersection isnonemptySet Rk Ri Rj, and C 1, ifintersection is emptyTermination:Minimal cost of tree for column u, CC 1A{A}B{B}A{A}B{B}55

Parsimony traceback to find ancestral nucleotidesTraceback:1. Choose an arbitrary nucleotide from R2N – 1 for the root2. Having chosen nucleotide r for parent k,If r Ri choose r for daughter iElse, choose arbitrary nucleotide from RiEasy to see that this traceback produces some assignment of cost CStill optimal, butnot found by tracebackAccessible to traceback{A, B}x{A}B A{B} {A}AA{A, B}A{A}BAB{B}AAAxBABABxxxBBABAxBBAB56

Parsimony Scoring (b): Dynamic 314 Each cell (N,C) represents themin cost of the subtree rooted atN, if the label at N is C. Update table by walking up thetree from the leaves to the root,remembering max choices. Traceback from root to leaves toconstruct a min cost assignmentB3B2B1AMouseGAGRatHumanDog57

Scoring (c) Maximum Likelihood & Max-a-PosterioriInput: Sequence alignmentOutput: tree with maximum likelihood / max a posteriori prob.Search: Heuristic search for max likelihood tree.Maximum Likelihood (ML)B ,T argmaxB,T P(D B,T)D seq. alignment dataB branch lengthsT topologyMaximum a Posteriori (MAP)B ,T argmaxB,T P(B,T D) argmaxB,T P(B,T,D) / P(D) argmaxB,T P(B,T,D) argmaxB,T P(D B,T)P(B,T)likelihoodlikelihoodP(D B,T) is the likelihood of data given model Computerecursively Use seq evolution model: JC,K2P,HKY.using DPP(B,T) is a prior on trees/branch lengths Use Yule process, Birth-Death process to model59

‘Peeling’ algorithm for P(D B,T) term1. Assume sites j evolve independently. Treat each column of the alignment in isolation2. Assume branch independence, conditioned on parent Expand total joint probability into prod of P(xi xparent,ti) Only P(x2n-1) remains, root prior, background nucl. freq.3. We know how to compute P(xi xparent(i),ti) for fixed pair Defined by our sequence model (JC, K2P, HKY, etc) Easily calculate for any given assignment of internal nodes4. As internal node values are not known marginalize Sum over all possible values of all internal/root nodes Let xn 1, ,x2n-1 represent seqs of n-1 internal nodes60

1. Site evolution over single branchRemember: Jukes-Cantor (JC)JC is a Continuous-TimeMarkov Chain (CTMC) Defines instantaneousrates of transition betweenstates (bases) GA CDiscrete MC version Given time t, we define a discrete MC with transitionmatrix is S(t), also called a substitution probability matrix. Gives the probability of seeing base a given initial base bafter duration time t. T“A”Use JC to define single site evolution:tP(a “C” b “A”, t) S(t)ba“C”61

2. Sequence evolution over single branch Assume site independence– P(xi xk, ti) Πj P(b xij a xkj, ti)Use product to define sequence evolution:xk “AAACTG”tiP(xi xk, ti)xi “CAAGTC”62

3. Sequence evolution over entire tree Assume branch independence– P(x1, xn, , x2n-1 T, t) P(x2n-1)Πi P(xi xparent(i), ti) Assume prior on root sequence, e.g.– P(x2n-1) P(x2n-1,j) (1/4) m for sequence length mUse product and prior to define sequence evolution over tree:x9 “AAACTG”t8x8t6t1x6t2x1x2P(x1, xn, , x2n-1 T, t)t7x7t3t4x3 x4t5x563

4. Integrate (marginalize) over hidden ancestral seqs! Notice, all sequences are needed, both internal nodes and leaves– P(x1, xn, , x2n-1 T, t) But, only leaves are given: x1, xn Therefore, need to marginalize (sum) over unknowns: xn 1, , x2n-1 This looks expensive!x9 “AAACTG”– P(x1, xn T, t) Σxn 1, , Σ x2n-1 P(x1, xn, , x2n-1 T, t)t8 Don’t worry, dynamic programmingcan do it efficiently.x8t6t1x6t2x1t7x2x7t3t4x3 x4t5x564

Basic trick to efficient marginalizationApply factorization trick to every internal node in the tree.x7t5t6x6x5t1x1t2t3x2 x3t4x4P(x1,x2,x3,x4 T, t) Σx5Σx6Σx7 P(x1,x2,x3,x4,x5,x6,x7 T, t) Σx5Σx6Σx7 P(x1 x5,t1) P(x2 x5,t1)P(x3 x6,t3) P(x4 x6,t4)P(x5 x7,t5) P(x6 x7,t6) P(x7) Σx7 P(x7)[Σx5 P(x5 x7,t5) P(x1 x5,t1) P(x2 x5,t1)][Σx6 P(x6 x7,t6) P(x3 x6,t3) P(x4 x6,t4)]Peeling algorithm L(i,j,a) is the DP table. Each entry contains the probabilityof seeing the leaf data below node i,given that node i has base a at site j. The leaves of the table are initializedbased on the observed sequence.Entries populated in post-order traversal. Runtime: O(2n * k 2)65

Use DP to compute argmax P(D B,T) efficientlyLi[a]Char a at node itleftb at jLj[b]trightc at jLk[c] If we know the branch lengths tleft & tright. And we already have the likelihood tablesLj&Lk of left and right subtrees(for each possible ending character at b, c) Fill in likelihood table Li for each char a at iLiLjLkACP(. ‘C’)GTLi[a] Σb {ACGT} Σc {ACGT} ( P(b a,tleft)*Lleft[b] * P(c a,tright)*Lright[c]Prob(a b))Prob(a c)66

Initialization and TerminationRootInternal Nodesn 321A00100C10001G00000T010102n-1 ijk Leavesn 1 Characters at the leaves are already known– Their likelihood is 1 or 0, indicating the known char Fill in internal node likelihood vectors iteratively Once we reach the root, multiply by the base freqs Maximization over Topologies and Lengths Numerical: gradient descent, Newton’s method67

Advantages/disadvantages of ML/MAP methods Advantages:– Inherently statistical and evolutionary model-based.– Usually the most ‘consistent’ of the methods available.– Used for both character and rate analyses– Can be used to infer the sequences of the extinct ancestors.– Account for branch-length effects in unbalanced trees.– Nucleotide or amino acid sequences, other types of data. Disadvantages:– Not as intuitive as parsimony (e.g. may choose more eventsif they’re more likely in our probabilistic model)– Computationally intense (Iimits num taxa, sequence length).– Like parsimony, can be fooled by high levels of homoplasy.– Violations of model assumptions can lead to incorrect trees.68

Tree reliability: Bootstrapping1. Re-sample alignments:––Randomly sample alignment columnswith replacementCreate many alignments of equal size.2. Build a phylogenetic treefor each sample3. Repeat (1) and (2) many times–1000s of times4. Output summary tree–––Tree constructed most frequentlyConsensus tree (even if not most freq)Other options5. Report observation frequencyof each branch–Each branch is a binary split Source unknown. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.69

Goals for today: Phylogenetics Basics of phylogeny: Introduction and definitions– Characters, traits, nodes, branches, lineages, topology, lengths– Gene trees, species trees, cladograms, chronograms, phylograms1. From alignments to distances: Modeling sequence evolution1– Turning pairwise sequence alignment data into pairwise distances– Probabilistic models of divergence: Jukes Cantor/Kimura/hierarchy2. From distances to trees: Tree-building algorithms2– Tree types: Ultrametric, Additive, General Distances– Algorithms: UPGMA, Neighbor Joining, guarantees and limitations– Optimality: Least-squared error, minimum evolution (require search)3. From alignments to trees: Alignment scoring given a tree3 – Parsimony: greedy (union/intersection) vs. DP (summing cost)– ML/MAP (includes back-mutations, lengths): peeling algorithm (DP)4. Tree of Life in Genomic Era4– The prokaryotic problem (no real taxa and HGT)– Interpreting the forest of life70

Tree of Life in Genomic EraGenomic era – growing frustration with discrepancies between the treesreconstructed for individual genes and heroic efforts to overcome thenoise. Role of horizontal gene transfer in the evolution of prokaryoticgenomes is established.Major lines of approach: gene repertoire and gene orderdistribution of distances between orthologsconcatenated alignments of "non-transferable" gene coresconsensus trees and supertreesCiccarelli 2006. Towards automatic reconstruction of a highlyresolved tree of li

Lecture 18 . Molecular Evolution and Phylogenetics . 6.047/6.878 - Computational Biology: Genomes, Networks,

Related Documents:

CHEMICAL REACTION ENGINEERING

Introduction of Chemical Reaction Engineering Introduction about Chemical Engineering 0:31:15 0:31:09. Lecture 14 Lecture 15 Lecture 16 Lecture 17 Lecture 18 Lecture 19 Lecture 20 Lecture 21 Lecture 22 Lecture 23 Lecture 24 Lecture 25 Lecture 26 Lecture 27 Lecture 28 Lecture

94 Views

2y ago

The journal Molecular Biology covers a wide range of ...

The journal Molecular Biology covers a wide range of problems related to molecular, cell, and computational biology, including genomics, proteomics, bioinformatics, molecular virology and immunology, molecular development biology, and molecular evolution. Molecular Biology publishes reviews, mini-reviews, and experimental and theoretical works .

47 Views

3y ago

LECTURE NOTES on PROGRAMMING & DATA STRUCTURE Course Code : BCS101

Lecture 1: A Beginner's Guide Lecture 2: Introduction to Programming Lecture 3: Introduction to C, structure of C programming Lecture 4: Elements of C Lecture 5: Variables, Statements, Expressions Lecture 6: Input-Output in C Lecture 7: Formatted Input-Output Lecture 8: Operators Lecture 9: Operators continued

56 Views

1y ago

Molecular Modeling 1 - Truman State University

Jan 31, 2011 · the molecular geometries for each chemical species using VSEPR. Below the picture of each molecule write the name of the geometry (e. g. linear, trigonal planar, etc.). Although you do not need to name the molecular shape for molecules and ions with more than one "central atom", you should be able to indicate the molecular geometryFile Size: 890KBPage Count: 7Explore furtherLab # 13: Molecular Models Quiz- Answer Key - Mr Palermowww.mrpalermo.comAnswer key - CHEMISTRYsiprogram.weebly.comVirtual Molecular Model Kit - Vmols - CheMagicchemagic.orgMolecular Modeling 1 Chem Labchemlab.truman.eduHow to Use a Molecular Model for Learning . - Chemistry Hallchemistryhall.comRecommended to you b

24 Views

1y ago

Activated Alumina

Xiangrun's Molecular sieve Email:info@xradsorbent.com Tel:86-533-3037068 Website: www.aluminaadsorbents.com Molecular sieve Types 3A Molecular sieve 4A Molecular sieve 5A Molecular sieve 13X Molecular sieve PSA Molecular Sieve Activated zeolite powder 3A Activated zeolite powder 4A Activated zeolite powder 5A

10 Views

10m ago

MSE 460: Electronic Materials, Devices, and Processing

Lecture 1: Introduction and Orientation. Lecture 2: Overview of Electronic Materials . Lecture 3: Free electron Fermi gas . Lecture 4: Energy bands . Lecture 5: Carrier Concentration in Semiconductors . Lecture 6: Shallow dopants and Deep -level traps . Lecture 7: Silicon Materials . Lecture 8: Oxidation. Lecture

150 Views

2y ago

【E-book】Texts & Questions of 50 Lectures for TOEFL ...

TOEFL Listening Lecture 35 184 TOEFL Listening Lecture 36 189 TOEFL Listening Lecture 37 194 TOEFL Listening Lecture 38 199 TOEFL Listening Lecture 39 204 TOEFL Listening Lecture 40 209 TOEFL Listening Lecture 41 214 TOEFL Listening Lecture 42 219 TOEFL Listening Lecture 43 225 COPYRIGHT 2016

142 Views

2y ago

Partial Differential Equations MSO-203-B - IIT Kanpur

Partial Di erential Equations MSO-203-B T. Muthukumar tmk@iitk.ac.in November 14, 2019 T. Muthukumar tmk@iitk.ac.in Partial Di erential EquationsMSO-203-B November 14, 2019 1/193 1 First Week Lecture One Lecture Two Lecture Three Lecture Four 2 Second Week Lecture Five Lecture Six 3 Third Week Lecture Seven Lecture Eight 4 Fourth Week Lecture .

35 Views

11m ago

Recent Views

IN THIS ISSUE CAR WASH INSIGHT Recent, Notable M&A Transactions .

9/8/2022 Club Car Wash Sites of Tidal Wave Express Car Wash 8 8/29/2022 Take 5 Car Wash Soft Touch Car Wash, Auto Oasis Car Wash, Clearwater Car Wash and Birdie's Car Wash 5 8/25/2022 WhiteWater Express Geaux Clean Car Wash 7 8/19/2022 ModWash Home Team Car Wash 3 8/18/2022 Splash In ECO Car Wash (Wills Group) Blue Hen Car Wash 2

8m ago

100 Views

Personal insurance - Car & Business insurance King Price Insurance

The king's insurance options 5 Things you need to know 7 The stuff you need to do 14 How to claim 16 Our commitment to you 20 Car insurance 22 Car warranty 37 Shortfall cover 45 Scratch and dent 46 Tyre and rim 48 Motorbike insurance 53 Trailer and caravan insurance 64 Watercraft insurance 68 Home contents insurance 77 Buildings insurance 89

1y ago

673 Views

ESSENTIAL PLAN - Discovery

Car insurance only Car and home insurance Car insurance only Car and home insurance 12.5% 25% 5% 10% YOUR FUEL CASH BACK PERCENTAGE GET TO THE HIGHEST CASH BACK PERCENTAGE Add at least R250 000 of home insurance (household contents, buildings or both) Take your car to Tiger Wheel & Tyre and pass the Annual MultiPoint check

1y ago

269 Views

CAR INSURANCE EVERYTHING EXPLAINED - RSA Insurance Group

CAR INSURANCE 93013821.indd 1 15/03/2018 10:46. 2 WELCOME TO µ CAR INSURANCE Thank you for choosing µ to protect you and your car. This booklet is intended to help you check your cover and to reassure you that µ will give you the protection you need for the year ahead. First of all, to help you understand your car insurance policy we want to .

1y ago

274 Views

Describe types and purposes of insurance.

D.O. CAPS Consumer Skills: Insurance—10E 3 Your car - The car you drive can also affect your insurance rates. Insurance companies place certain kinds of cars in special risk categories. You should ask your insurance agent before making a car purchase to make sure you aren't getting a car that will cost you extra for your liability insurance.

1y ago

233 Views

Life Insurance Buyer's Guide Life Insurance - National Association of .

Life Insurance uers uide Naional ssociaion of Insurance Commissioners Compare the Different Types of Insurance Policies There are many types of life insurance pol-icies. You should choose a policy with fea-tures that fit your individual needs. Some things to consider are: Term Insurance vs. Cash Value In-surance. Term insurance is intended to

1y ago

520 Views

Contours Options Infant Car Seat Adapter Instruction Sheet

your Infant Car Seat, as described in the instruction manual provided by the Infant Car Seat manufacturer. † WHEN USING ONLY ONE INFANT CAR SEAT ADAPTER OR TWO FOR TWINS, THE FOLLOWING INFANT CAR SEATS CAN BE USED: † If your Infant Car Seat is not one of the models listed above, DO NOT use your infant car seat with this car seat adapter.

2y ago

564 Views

Microsoft Advertising Travel Update

last minute cruise deals -58.50% Car Rental Queries WoW Change car rental -43.80% rental cars -46.30% car rentals -40.60% cheap car rentals -48.00% car rentals cheapest rates -52.20% rent a car- 40.30% cheap rental cars -45.60% rental car -41.80% car rental deals -49.30% rental cars lowest price -53.90% Flight Queries WoW Change cheap flights .

1y ago

337 Views

Design and development of lift for an automatic car parking system

1. Stacker type car parking system 2. Puzzle type car parking system 3. Level type car parking system 4. Chess type car parking system 5. Rotary type car parking system 6. Tower type car parking system But lift is used only in tower type car parking system. Objectives:-

6m ago

172 Views

Gold Tier - MAPFRE Insurance

Foy Insurance of MA, LLC 198 Frank Consolati Insurance Agency, Inc. 198 County Insurance Agency, Inc. 198 Woodrow W Cross Agency 214 Woodland Insurance Agency, Inc. 214 Tegeler Insurance Services of CT, Inc. 214 Pantano/VonKahle Insurance Agency, Inc. 214 . Hanson Insurance Agency, Inc. 287 J.H. Slattery Insurance Agency, Inc. 287

1y ago

565 Views

Money Online Price Comparison - WordPress

you to compare car insurance quotes. You'll notice at the top of the screen is a warning regarding telling the truth when completing any form of car insurance quote as something withheld, which later becomes known, can void an insurance claim. 7 The process of completing a car insurance price comparison is broken down into 4

1y ago

174 Views

Better car deals - Consumer Affairs Victoria

Insurance protects you against costs and liabilities if the car is stolen, vandalised or damaged in a crash. When budgeting, consider taking out at least third party car property insurance. It may be cheaper to arrange your own insurance than taking it out through the trader. Contact insurance companies to compare premiums and policy coverage.

1y ago

153 Views

Car Insurance This booklet covers:Car Rapid Bonus Business

Car Insurance This booklet covers:Car Rapid Bonus Business RAC Direct Insurance is a trading name of London and Edinburgh Insurance Company Limited. Registered in England No 924430. Registered Office: 8 Surrey Street, Norwich NR1 3NG. Member of the Aviva Group. Authorised and regulated by the Financial Services Authority. RAC052(V27)-1971-06.06 .

1y ago

218 Views

Root Insurance (ROOT) - Citron Research

Root Insurance (ROOT) Leveling the Playing Field of Car Insurance What every trader needs to know about one of the mostheavily shorted stocks in the market Traditional Credit-Based Car Insurance PerpetuatesEconomic and Racial Inequalities as one in three American cannot affordessentials because of car insurance premiums

1y ago

209 Views

Life Cycle Analysis: Uber vs. Car Ownership

(LCA) will be performed to compare ridesharing services versus car ownership. We will compare per mile average cost and CO 2 emissions . assumption of 15 years being a car's lifetime and calculated average costs for car maintenance, repairs, insurance, gas and registration. We used Economic Input Output Life Cycle Assessment .

1y ago

122 Views

Lecture 18 Molecular Evolution And Phylogenetics

It looks like you're using an ad-blocker