Slides B-1 to B-4

Mutations

Tree of Life
[Figure: tree of life]

Mutations vs. Substitutions
- Mutations are changes in DNA
- Substitutions are mutations that evolution has tolerated
- Which rate is greater?

Replicative proofreading and DNA repair constrain the mutation rate

Slides B-5 to B-8

Why Are Mutations Important?
- Mutations can be deleterious
- Mutations drive evolution

Selectionist Evolution
- Most mutations are deleterious; removed via negative selection
- Advantageous mutations are positively selected
- Variability arises via selection
[Figure labels: neutral, beneficial, deleterious]

UV Damage to DNA
[Figure labels: UV, thymine dimers]
- What happens if damage is not repaired?

Radiation Resistant
- 10 Gray will kill a human
- 60 Gray will kill an E. coli culture
- Deinococcus can survive 5000 Gray

Slides B-9 to B-12

A Sequence Mutating at Random
[Figure: sequence positions 1 to 12 with substitutions marked by asterisks]
- 9 actual substitutions
- 5 observed substitutions

Simulating Random Mutations
[Figure: substitutions plotted against sequence distance; curves: actual substitutions, observed substitutions]
- Multiple substitutions at one site can cause underestimation of the number of actual substitutions

Measuring Sequence Divergence: Why Do We Care?
- Inferring phylogenetic relationships
- Use in sequence alignments and homology searches of databases*
- Dating divergence, correlating with fossil record*
* Comparative genomics is an important field: it determines not only how many substitutions exist between two sequences but also how similar the two sequences are.

DNA Structure
[Figure: paired DNA strands with 5' and 3' (OH) ends labeled]
- G-C: 3 hydrogen bonds
- A-T: 2 hydrogen bonds
Two base types:
- Purines (A, G)
- Pyrimidines (T, C)
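
The gap between actual and observed substitutions described on the "Simulating Random Mutations" slide can be reproduced with a small simulation. The following is only a sketch under stated assumptions: every site is equally likely to mutate, each hit changes the base to one of the other three with equal probability, and the function name `simulate_substitutions` and its default parameters are made up for illustration.

```python
import random

BASES = "ACGT"

def simulate_substitutions(length=1000, n_substitutions=500, seed=0):
    """Apply random substitutions to a random sequence.

    Returns (actual, observed): the number of substitution events applied,
    and the number of sites that differ from the ancestor afterwards.
    """
    rng = random.Random(seed)
    ancestor = [rng.choice(BASES) for _ in range(length)]
    descendant = list(ancestor)
    for _ in range(n_substitutions):
        pos = rng.randrange(length)
        # A substitution always changes the base currently at this site.
        descendant[pos] = rng.choice([b for b in BASES if b != descendant[pos]])
    # Sites hit more than once count at most once, and back-mutations count zero.
    observed = sum(a != d for a, d in zip(ancestor, descendant))
    return n_substitutions, observed

actual, observed = simulate_substitutions()
print(f"actual = {actual}, observed = {observed}")
```

With these assumptions, the observed count falls further behind the actual count as more substitutions accumulate, which is the underestimation the slide warns about.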

Slides B-13 to B-16

Not All Base Substitutions Are Created Equal
Transitions
- Purine to purine (A → G or G → A)
- Pyrimidine to pyrimidine (C → T or T → C)
Transversions
- Purine to pyrimidine (A → C or T; G → C or T)
- Pyrimidine to purine (C → A or G; T → A or G)
Transition rate is 2x the transversion rate

Substitution Rates Differ Across Genomes
[Figure labels: start of transcription, splice sites, polyadenylation site]
- Alignment of 3,165 human-mouse pairs

The PAM Model of Protein Sequence Evolution
- Empirical data-based substitution matrix
- Global alignments of 71 families of closely related proteins
- Constructed hypothetical evolutionary trees
- Built matrix of 1572 amino acid point accepted mutations

Original PAM Substitution Matrix
[Figure: matrix of accepted point mutation counts; Dayhoff, 1978]
- Count number of times residue i was replaced with residue j
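
The transition/transversion rules above can be written as a short classifier. This is a minimal sketch that assumes two gapless, equal-length DNA strings containing only A, C, G and T; the function names `classify_substitution` and `count_ts_tv` are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(a, b):
    """Return 'match', 'transition' or 'transversion' for a pair of bases."""
    for base in (a, b):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unexpected base: {base!r}")
    if a == b:
        return "match"
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def count_ts_tv(seq1, seq2):
    """Count (transitions, transversions) between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    kinds = [classify_substitution(a, b) for a, b in zip(seq1, seq2)]
    return kinds.count("transition"), kinds.count("transversion")

print(count_ts_tv("AACGT", "GACTA"))  # (1, 2)
```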

Slides B-17 to B-20

Deriving PAM Matrices
For each amino acid, calculate its relative mutability, i.e., the likelihood that the amino acid will mutate:

$m_j = \frac{\text{number of times amino acid } j \text{ mutated}}{\text{total occurrences of amino acid } j}$

Deriving PAM Matrices
Calculate mutation probabilities for each possible substitution:

$M_{i,j}$ = relative mutability of $j$ x proportion of all substitutions of $j$ that change it to $i$

$M_{i,j} = \frac{m_j \, A_{i,j}}{\sum_i A_{i,j}}$

PAM1 Mutation Probability Matrix
[Figure: PAM1 mutation probability matrix; Dayhoff, 1978]

Deriving PAM Matrices
Calculate log odds ratio to convert mutation probability to substitution score:

$S_{i,j} = 10 \log_{10}\left(\frac{M_{i,j}}{f_i}\right)$

- $M_{i,j}$: mutation probability (probability that a substitution from j to i is an accepted mutation)
- $f_i$: frequency of residue i (probability of amino acid i occurring by chance)
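
The three formulas above can be traced with a toy calculation. The sketch below uses a made-up three-residue alphabet with invented counts, not Dayhoff's data. It also assumes that the diagonal entry is $M_{j,j} = 1 - m_j$ and that the sum over $i$ excludes $i = j$; neither detail is stated on the slides, and no scaling to 1% total change is applied here.

```python
import math

# Hypothetical toy data: three residues with invented counts.
residues = ["A", "S", "T"]
# accepted[(i, j)]: number of times residue j was replaced by residue i.
accepted = {
    ("A", "S"): 30, ("S", "A"): 30,
    ("A", "T"): 20, ("T", "A"): 20,
    ("S", "T"): 10, ("T", "S"): 10,
}
occurrences = {"A": 5000, "S": 2000, "T": 3000}

def relative_mutability(j):
    """m_j: times residue j mutated divided by its total occurrences."""
    mutated = sum(accepted[(i, j)] for i in residues if i != j)
    return mutated / occurrences[j]

def mutation_probability(i, j):
    """M_ij: probability that residue j is replaced by residue i."""
    m_j = relative_mutability(j)
    if i == j:
        return 1.0 - m_j  # assumption: probability that j stays unchanged
    total = sum(accepted[(k, j)] for k in residues if k != j)
    return m_j * accepted[(i, j)] / total

def log_odds_score(i, j):
    """S_ij = 10 * log10(M_ij / f_i), with f_i the background frequency of i."""
    f_i = occurrences[i] / sum(occurrences.values())
    return 10.0 * math.log10(mutation_probability(i, j) / f_i)

for i in residues:
    print(i, [round(log_odds_score(i, j), 2) for j in residues])
```

In this toy example each column of the mutation probability matrix sums to 1, conserved positions score positive, and replacements score negative.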

Slides B-21 to B-24

Deriving PAM Matrices
Scoring in log odds ratio:
- Allows addition of scores for residues in alignments
Interpretation of score:
- Positive: non-random (accepted mutation) favored
- Negative: random model favored

PAM Matrix
[Figure: PAM scoring matrix]

Using PAM Scoring Matrices
- PAM1: 1% difference (99% identity)
- Can "evolve" the mutation probability matrix by multiplying it by itself, then take the log odds ratio
  (PAMn = PAM1 mutation probability matrix raised to the n-th power)

BLOSUM: BLOCKS Substitution Matrix
- Like PAM, an empirical protein substitution matrix that uses the log odds ratio to calculate substitution scores
- Large database: local alignments of conserved regions of distantly related proteins
- Gapless alignment blocks
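
"Evolving" the mutation probability matrix is repeated matrix multiplication. The sketch below uses an invented 3x3 matrix as a stand-in for PAM1 (the real matrix is 20x20), with `pam1[i][j]` read as the probability that residue j is replaced by residue i, so every column sums to 1. The helper names `mat_mul` and `mat_pow` are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result

# Hypothetical toy stand-in for PAM1; each column sums to 1.
pam1 = [
    [0.990, 0.015, 0.006],
    [0.006, 0.980, 0.004],
    [0.004, 0.005, 0.990],
]

pam250 = mat_pow(pam1, 250)
for row in pam250:
    print([round(x, 3) for x in row])
```

The log odds step from the earlier slide would then be applied to the evolved matrix rather than to PAM1 itself. The diagonal shrinks as n grows, reflecting the lower expected identity at larger evolutionary distances.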

Slides B-25 to B-28

BLOSUM Uses Clustering To Reduce Sequence Bias
- Cluster the most similar sequences together
- Reduce weight of contribution of clustered sequences
- BLOSUM number refers to clustering threshold used (e.g. 62% for BLOSUM 62 matrix)

BLOSUM and PAM Substitution Matrices
[Figure labels: BLOSUM 90 / PAM 90 (50); BLOSUM 62 / PAM 120 (66); BLOSUM 30 / PAM 250 (80); axes: % identity, % change]
- BLAST algorithm uses BLOSUM 62 matrix

PAM and BLOSUM
PAM
- Smaller set of closely related proteins (short evolutionary period)
- Use global alignment
- More divergent matrices extrapolated
- Errors arise from extrapolation
BLOSUM
- Larger set of more divergent proteins (longer evolutionary period)
- Use local alignment
- Each matrix calculated separately
- Clustering to avoid bias
- Errors arise from alignment errors

Importance of Scoring Matrices
- Scoring matrices appear in all analyses involving sequence comparison
- The choice of matrix can strongly influence the outcome of the analysis
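
The clustering step on the "BLOSUM Uses Clustering To Reduce Sequence Bias" slide can be sketched as follows. This is an illustration, not the BLOCKS pipeline: it assumes a gapless block of equal-length sequences, links a sequence to a cluster when it reaches the identity threshold with any member, and gives each sequence a weight of 1 / (cluster size). The function names and the toy block are made up.

```python
def percent_identity(a, b):
    """Percent of aligned positions that are identical (gapless, equal length)."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_weights(block, threshold=62.0):
    """Cluster sequences at the given % identity and down-weight clustered ones."""
    clusters = []  # each cluster is a list of indices into block
    for idx, seq in enumerate(block):
        # Find every existing cluster with at least one member above the threshold.
        linked = [c for c in clusters
                  if any(percent_identity(seq, block[m]) >= threshold for m in c)]
        merged = [idx]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(merged)
    weights = [0.0] * len(block)
    for c in clusters:
        for m in c:
            weights[m] = 1.0 / len(c)  # each cluster contributes as one sequence
    return weights

block = ["ACDEFGHIKL", "ACDEFGHIKV", "WYWYWYWYWY"]
print(cluster_weights(block))  # [0.5, 0.5, 1.0]
```

Raising the threshold merges fewer sequences, so closely related sequences contribute more to the counts; that corresponds to a higher BLOSUM number.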

