Jun Liu - Harvard University

3y ago

7 Views

3 Downloads

685.60 KB

46 Pages

Last View : 1m ago

Last Download : 3m ago

Upload by : Azalea Piercy

Report this link

Download PDF

Transcription

Biological Sequence Analysisand Motif DiscoveryIntroductory Overview LectureJoint Statistical Meetings 2001, AtlantaJun LiuDepartment of StatisticsHarvard Universityhttp://www.fas.harvard.edu/ junliujliu@stat.harvard.eduBiological Sequence Analysis1

Topics to be covered Basic Biology: DNA, RNA, Protein; genetic code. Biological Sequence Analysis– Pairwise alignment --- dynamic programming Needleman-Wunsch Smith-Waterman Blast – Multiple sequence alignment Heuristic approaches The hidden Markov model– Motif finding in DNA and protein sequenceBiological Sequence Analysis2

Central ParadigmCourtesyof Doug BrutlagBiological Sequence Analysis3

Buliding Blocks of Biological Systems:nucleotides and amino acids DNA (nucleotides, 4 types): information carrier/encoder. RNA: bridge from DNA to protein. Protein (amino acids, 20 types): action molecules. Genetic code: deciphering genetic information.atgaatcgta ggggtttgaa cgctggcaat acgatgactt ctcaagcgaacattgacgac ggcagctgga aggcggtctc cgagggcgga MNRRGLNAGNTMTSQANIDDGSWKAVSEGG Biological Sequence Analysis4

DNA ical Sequence Analysis5

From DNA to ProteinBiological Sequence Analysis6

GeneticCodeBiological Sequence Analysis7

Main Resources for Data NCBI --- national center for ank maintainanceBLAST searching and serversEntrez databaseTaxonomy databaseStructure databaseBankit and SEQUIN submission softwareWeb access http://www.ncbi.nlm.nih.gov EMBL --- European equivalent of NCBI.– Web access http://www.embl-heidelberg.de– EBI --- outstation of EMBL. http://www.ebi.ac.uk/Biological Sequence Analysis8

Finding “Patterns” inBiological SequencesBiological Sequence Analysis9

Everything is related to everythingelse in a logical way .Biological Sequence Analysis10

A MotifBiological Sequence Analysis11

Brute-forceStringSearchBiological Sequence Analysis12

Boyer-Moore String SearchBiological Sequence Analysis13

Consensus in Sequences A database for protein families and patterns:– Prosite http://www.expasy.ch/prosite/Pattern: [LIVM]-[ST]-A-[STAG]-H-CInterpretation: A-x-[ST](2)-x(0,1)-V-{LI}This pattern, which must be in the Nterminal of the sequence ( '), means:Ala-any-[Ser or Thr]-[Ser or Thr](any or none)-Val-(any but Leu & Ile)Biological Sequence Analysis14

Pattern Finding Programs Prosite web program (http://www.expasy.ch/prosite/)– ScanProsite - Scan a sequence against PROSITE or a pattern againstSWISS-PROT– ProfileScan - Scan a sequence against the profile entries in PROSITE Motif web programs ( http://motif.stanford.edu/identify/)– IDENTIFY– SCAN GCG SEQWEB programs (commercial)– Stringsearch– Findpatterns– MotifsBiological Sequence Analysis15

Pairwise SequenceAlignment MethodsBiological Sequence Analysis16

The Sequence Alignment problemCATTGGiven a pair of sequences,how do we view them?TCATGCATTGA better A-TGWhat determines the choice? Gap & mismatch penaltiesBiological Sequence Analysis17

Biological Sequence Analysis18

Global: Needleman-Wunsch Algorithm Finding the optimal alignment via dynamic programming Example alignmentC A T T G TCATGCATT - - GoTC - - ATGoBiological Sequence Analysis19

Alignment Recursion F (i 1, j 1) s( xi , y j ) F (i, j ) max F (i 1, j ) γ F (i, j 1) γ E.g.,s ( x, y ) 2 I { x y } , γ 1F(i-1,j-1)F(i,j-1)F(i-1,j)F(i,j)Biological Sequence AnalysisTCATGC A0 -1 -2-1 0 -1-2 1 0-3 0 3-4 -1 3-5 -2 2T T G-3 -4 -50-12 15 44 5 620

Substitution/Scoring Matrices Pam matrices (Dayhoff et al. 1978) --- phylogeny-based.PAM1: expected number of mutation 1%PAM250 Biologicalmatrix,Sequencelog-oddsAnalysis representation21

BLO(ck)SU(bstitution)M(atrix) (Henikoff & Henikoff 1992) Derived from a set (2000) of aligned and ungapped regions from proteinfamilies; emphasizing more on chemical similarities (versus how easy it isto mutate from one residue to another). BLOSUMx is derived from the setof segments of x% identity.BLOSUM62 Matrix, log-odds representationBiological Sequence Analysis22

Gap Penalties Linear score: (g) -gd– Typically: d 8, in unit of half-bits ( 4log2) Affine score: (g) -d-(g-1)e– d: gap opening penalty; e: gap extension penalty– Typical d 12, e 2 (in unit of half-bits) Gap penalty corresponds to log-probability ofopening a gaps. For example, under the standardlinear score, P(g k) exp(-dk) 2-4kBiological Sequence Analysis23

Local Alignment:Smith-Waterman Algorithm 0 F(i 1, j 1) s(x , y ) ijF (i, j) max F(i 1, j) d F(i, j 1) dBiological Sequence Analysis24

Sequence comparison and data base searchBiological Sequence Analysis25

BLAST(Altschul et al. 1990)QTVGLMIVYDDA Create a word list from the querry;– word length 3 for protein and 12 for DNA. For each listed word, find “neighboring words” ( 50), S (W , W ' ) T For each sequence in the database, search exact matches to eachword in the set. Extend the hits in both directions until score drops below X No gap allowed; use Karlin-Altschul statistics for significance New versions ( 1.4) of BLAST gives gapped alignments. Compute Smith-Waterman for “significant” alignments BLASTP (protein), BLASTN (DNA), BLASTX (pr DNA).Biological Sequence Analysis26

BLAST 2.0 Two word hits must be found within a window of Aresidues in order to trigger extension Gapped extension from the middle of ungapped HSPs Position-specific iterative (PSI-) BLAST.– Profile constructed on the fly and iteratively refined.– Begin with a single query, profile constructed from thosesignificant hits; use the profile to do another search, and iterate theprocedure till “convergence”Biological Sequence Analysis27

A Bayesian Model for Pairwise AlignmentMissing data --- Alignment matrixA i,j 1 if residue i of sequence 1 aligns withresidue j of sequence 2, 0 otherwise.Observed data pair of sequence R(1), R(2)(1)iP( R , RΨR1 , R2( 2)j A, Θ) Θ R (1) Θ R ( 2 ) Ai , j ΨR (1) , R ( 2 )iPAM or Blosumj Ai, jiBiological Sequence Analysis 1i Ai, jj 1j28

Multiple AlignmentBiological Sequence Analysis29

ClustalW Step 2: bBuild the TreeBiological Sequence Analysis30

Biological Sequence Analysis31

However . No explicit model to guide for thealignment --- heuristic driven. Tree construction has problems. Overall, not sensitive enough for remotelyrelated sequences.Biological Sequence Analysis32

The Hidden Markov Model For given zs, ys f(ys zs, θ), and the zs follow aMarkov process with transition ps(zs zs-1, φ).y1z1y2z2y3ysz3 .zsyt .ztπ t ( z t ) p ( z1 , l , zt y1 , l , y t ; φ , θ )“The State Space Model”Biological Sequence Analysis33

What Are Hidden in Sequence Alignment? HMM Architecture: transition diagram for theunderlying Markov chain.D1D2D3D4I0I1I2I3I4m0m1m2m3m4Biological Sequence AnalysismE34

The Pathr0 r1 r2δ i # o f d e le ti o n s 0Gi 1if generated by insertionif generated by a matchr1(δ1,G1)Solid path: (0,1)Dashed path: (0,1)r3 r4 r5 al Sequence Analysisr6(δ6,G6)(2,0)(0,0)35

Pfam alignment view (http://pfam.wustl.edu/browse.shtml)Biological Sequence Analysis36

More Restricted Models(Liu et al. 99, Neuwald et al. 97) Block Motif Propagation Model: limit the totalnumber of gaps but no deletions allowed.Motif 1Motif 2Motif 3Biological Sequence AnalysisMotif 437

Representative AlignmentBiological Sequence Analysis38

Finding Repetitive PatternsBiological Sequence Analysis39

Motif Alignment Modela1 Motifa2width waklength nkThe missing data: Alignment variable: A {a1, a2, , ak} Every non-site positions follows a common multinomialwith p0 (p0,1 , , p0,20) Every position i in the motif element follows probabilitydistribution pi (pi,1 , , pi,20)Biological Sequence Analysis40

The Algorithm Initialized by choosing random starting positionsa 1( 0 ) , a 2( 0 ) ,., a K( 0 ) Iterate the following steps many times:– Randomly or systematically choose a sequence, say,sequence k, to exclude.– Carry out the predictive-updating step to update ak Stop when no more observable changes in likelihood Available Programs:– Gibbs motif sampler (http://www.wadsworth.org/res&res/bioinfo)– Bioprospector (http://bioprospector.stanford.edu)– MACAW (http://www.ncbi.nlm.nih.gov)Biological Sequence Analysis41

The PU-Stepa1a2a3ak ?1. Compute predictive frequencies of each position i in motifcij count of amino acid type j at position i.c0j count of amino acid type j in all non-site positions.qij (cij bj)/(K-1 B), B b1 bK “pseudo-counts”2. Sample from the predictive distriubtion of ak .wqi , R k ( l i )i 1q 0 , Rk ( l i )P ( a k l 1) Biological Sequence Analysis42

Using MACAWBiological Sequence Analysis43

Idea 2: Mixture modeling View the dataset as a long sequence with k motif types: Idea: partition the input sequence into segments thatcorrespond to different (unknown) motif models. It is a mixture model (unsupervised learning). Implement a predictive updating scheme.Biological Sequence Analysis44

Special Case: Bernoulli Sampler Sequence data:R r1 r2 r3 rN Indicator variable: δ1δ2δ3 . . . δΝδi 1 , if it is the start of an element 0 , if not.parameter for the motif model Likelihood: π(R, Θ, ε), ε is the prior prob for δi 1 Predictive Update:π (δk 1 [ k ] , R )ε w p i ,rk i 1 π (δk 0 [ k ] , R ) 1 ε i 1 p 0 ,rk i 1 Biological Sequence Analysis45

References (self) Durbin R. et al. (1997). Biological Sequence Analysis, CambridgeUniversity Press, London. Liu, J.S. (2001). Monte Carlo Strategies in Scientific Computing, SpringerVerlag, New York. Liu, J.S. and Lawrence, C.E. (1999). Bayesian analysis on biopolymersequences. Bioinformatics, 138-143. Liu, J.S., Neuwald, A.F., and Lawrence, C.E. (1999) . Markovianstructures in biological sequence alignments. J. Amer. Statist. Assoc., 1-15. Neuwald, A.F., Liu, J.S., Lipman, D.J., and Lawrence, C.E. (1997) .Extracting protein alignment models from the sequence database. NucleicAcids. Res. 25, 1665-1677. Liu, J.S., Neuwald, A.F., and Lawrence, C.E. (1995) . Bayesian modelsfor multiple local sequence alignment and Gibbs sampling strategies. J.Amer. Statist. Assoc. 90, 1156-1170. Lawrence, et al. (1993). Science 262, 208-214.Biological Sequence Analysis46

Biological Sequence Analysis 2 Topics to be covered Basic Biology: DNA, RNA, Protein; genetic code. Biological Sequence Analysis – Pairwise alignment --- dynamic programming Needleman-Wunsch Smith-Waterman Blast – Multiple sequence alignment Heuristic approaches The hidden Markov model – Motif finding in .

Related Documents:

4 of 9 – 2019 Kenworth T680 Atlanta, GA - rbauction.com.mx

Los Angeles, CA Jun 26 Gonzales, LA Jun 26 Salt Lake City, UT Jun 30 Atlanta, GA Jun 30 Chehalis, WA Jun 30 Denver, CO Aug 5 Tipton, CA Aug 12 Phoenix, AZ Aug 28 Las Vegas, NV Sep 30 Wasilla, AK Oct 3 Canada Thunder Bay, ON Jun 15 Montreal, QC Jun 17–18 Edmonton, AB Jun 24–26 North Battleford, SK Jun 25–26 Truro, NS Jun 29 International

16 Views

2y ago

Dear Members of the Harvard Community,

Life science graduate education at Harvard is comprised of 14 Ph.D. programs of study across four Harvard faculties—Harvard Faculty of Arts and Sciences, Harvard T. H. Chan School of Public Health, Harvard Medical School, and Harvard School of Dental Medicine. These 14 programs make up the Harvard Integrated Life Sciences (HILS).

182 Views

3y ago

What Is the Real Value of Design

nIke proCter & GAmBle StArBUCkS StArwood SteelCASe tArGet wAlt dISney wHIrlpool 5,000 Jun ’ 03 ’ 04 Jun ’ 05 Jun 06 Jun 07 Jun ’ 08 Jun 09 10 Jun ’ 11 Jun 12 Jun 13 D e C ’ 03 C 04 D e C ’ 05 D e C ’ 06 07 D e C ’ 08 D e C 09 10 D e C ’ 11 C 12 C 13 39,922.89 17,5

21 Views

2y ago

SAFE HARBOR

10 12 14 16 18 10 12 14 16 18 Jun-13 Jun-14 Jun-15 Jun-16 Jun-17 Jun-18 Jun-19 Jun-20 Jun-21 Net Effective Rent* Payback Ratio 2

16 Views

2y ago

Xavier Du Maine, Lara Roach, Perspectives - Harvard University

Sciences at Harvard University Richard A. and Susan F. Smith Campus Center 1350 Massachusetts Avenue, Suite 350 Cambridge, MA 02138 617-495-5315 gsas.harvard.edu Office of Diversity and Minority Affairs minrec@fas.harvard.edu gsas.harvard.edu/diversity Office of Admissions and Financial Aid admiss@fas.harvard.edu gsas.harvard.edu/apply

147 Views

1y ago

Blacklisted Agent List as on 30th Jun'2021

bali143144 anuradha 8-jul-1981 16-jun-2014 bali141519 avadhesh kumar 13-mar-1946 16-jun-2014 bali154669 bhagyashree bhide 22-jul-1983 16-jun-2014 bali149154 deepak kumar 18-aug-1994 16-jun-2014 bali150052 dhaval ajaykumar oza 5-dec-1983 16-jun-2014 bali53663 doulath basha katike 11-jun-1981 16-jun-2014

17 Views

2y ago

Harvard Buildings Emergency Phones Harvard University ...

Faculty of Arts and Sciences, Harvard University Class of 2018 LEGEND Harvard Buildings Emergency Phones Harvard University Police Department Designated Pathways Harvard Shuttle Bus Stops l e s R i v e r a C h r YOKE ST YMOR E DRIVE BEACON STREET OXFORD ST VENUE CAMBRIDGE STREET KIRKLAND STREET AUBURN STREET VE MEMORIAL

172 Views

3y ago

THE FIRST CENTURY OF THE AMERICAN ... - Princeton …

Harvard University Press, 1935) and Harvard College in the Seventeenth Century (Cambridge: Harvard University Press, 1936). Quotes, Founding of Harvard, 168, 449. These works are summarized in Three Centuries of Harvard (Cambridge: Harvard U

226 Views

2y ago

Recent Views

Vietnamese Insurance Market Report - Ditp

Insurance agents TOTAL INSURANCE AGENTS IN VIETNAMESE MARKET 6/2016 Until the end of June 2016, total insurance agents increased by 29.5% compared with same period last year to 437,738 agents. Prudential took the lead with 181,808 agents, followed by Bao Viet life with 94,129 agents and Dai-ichi Life with 53,811 agents. e. The number of new .

1y ago

167 Views

Attorney Registration - Certificates of Insurance Upload for Insurance .

Certificate of Insurance ("COI") upload feature Loginfor insurance agents and insurers. Summary: After self-registering for a username and password, agents and insurers will have access to a portal for the upload of Certificates of Insurance. This Guide is for: Insurance agents and insurers who are authorized to upload Certificates of Insurance

1y ago

179 Views

Insurance Act Insurance Agents Regulations - Prince Edward Island

Section 3 Insurance Act Insurance Agents Regulations Page 4 Updated August 1, 2005 t c Restricted life, accident and sickness insurance agents (3) Notwithstanding subsection (2), the Superintendent may, until July 1, 2006, issue a transitional restricted certificate of authority covering life, accident and sickness insurance to

1y ago

131 Views

Insurance Act 1978 - Bermuda Laws

INSURANCE MANAGERS, BROKERS, AGENTS, INSURANCE MARKETPLACE PROVIDERS AND SALESMEN Insurance managers, agents and insurance marketplace providers to maintain lists of insurers for which they act Insurance broker, agent, salesman or insurance marketplace provider deemed agent of insurer in cert

2y ago

280 Views

THE EFFECT OF INSURANCE AGENTS IN INSURANCE PENETRATION IN KENYA By D61 .

Insurance agents sell exclusively the products of a certain insurance company whereas insurance brokers are legally independent from insurance companies. Insurance brokers are often referred to as the insured's agent (Kogi & Maragia, 2011).

1y ago

182 Views

Personal insurance - Car & Business insurance King Price Insurance

The king's insurance options 5 Things you need to know 7 The stuff you need to do 14 How to claim 16 Our commitment to you 20 Car insurance 22 Car warranty 37 Shortfall cover 45 Scratch and dent 46 Tyre and rim 48 Motorbike insurance 53 Trailer and caravan insurance 64 Watercraft insurance 68 Home contents insurance 77 Buildings insurance 89

1y ago

673 Views

CODE OF CONDUCT FOR LICENSED INSURANCE AGENTS - ia

insurance agents when carrying on regulated activities. Secondly, the Code of Conduct supplements the duties and obligations which licensed insurance agents owe their principals (arising from their principal-agent relationship) by providing that agents should comply with the requirements set out by their

1y ago

130 Views

All about auto insurance - Option Consommateurs

of insurance companies with which they have agreements. Insurance agents: agents work for a specific insurance company. Before you decide to do business with either a broker or an agent, check out prices, the products being proposed and the quality of the service. Buying auto insurance 4 All about auto insurance

1y ago

230 Views

Gold Tier - MAPFRE Insurance

Foy Insurance of MA, LLC 198 Frank Consolati Insurance Agency, Inc. 198 County Insurance Agency, Inc. 198 Woodrow W Cross Agency 214 Woodland Insurance Agency, Inc. 214 Tegeler Insurance Services of CT, Inc. 214 Pantano/VonKahle Insurance Agency, Inc. 214 . Hanson Insurance Agency, Inc. 287 J.H. Slattery Insurance Agency, Inc. 287

1y ago

565 Views

Independent Insurance Agents & Brokers of Louisiana - IIABL

Independent Insurance Agents & Brokers of Louisiana Frequently Requested Louisiana Insurance Statutes Independent Insurance Agents & Brokers of Louisiana 9818 Bluebonnet Blvd. Baton Rouge, La 70810 (225) 819-8007 www.IIABL.com

1y ago

109 Views

Insurance and Indemnification Guidelines for State of .

the Contractor's insurance company issues the required insurance policies or endorses existing policies to match the insurance requirements of the contract. As proof of coverage, most insurance agents and brokers will provide a document called a certificate of insurance. While a certificate is evidence that the Contractor has an insurance policy,

1y ago

151 Views

SPECIAL REPORT Young Agents Survey - Insurance Journal

agents and agency owners in particular — better get ready to step up. This is good news for young professionals working in independent agencies today — those 40 years old and younger. According to Insurance Journal's Young Agents Survey 2015, 82.7 percent of young agents feel very optimistic or optimistic

1y ago

119 Views

Brokers and Agents and Health Insurance Exchanges A

National Association of Insurance Commissioners distinguishes their roles as follows: Brokers act on behalf of the consumer. They can be compensated by the consumer or receive compensation from an insurance company. Agents are loyal to an insurance company and sell, solicit, or negotiate insurance on behalf of the insurer.

1y ago

129 Views

Consumer Guide to Auto Insurance - csimt.gov

consumer guide to auto insurance contents introduction to auto insurance 1 understanding your auto insurance policy 2 required auto insurance 3 optional types of auto insurance 4-5 getting the right coverage 6 accidents and violations 7 how to shop for auto insurance 8 shopping tips 9 frequently asked questions 10-11 insurance complaints/when you have a problem 12

2y ago

805 Views

Industry Observations Insurance Industry

Jun 30, 2019 · 6/17/2019 Commercial Insurance Branch of Extraco Banks, N.A. Higginbotham Insurance Group, Inc. Insurance Brokers NA 6/13/2019 Links Insurance Services, LLC World Insurance Associates LLC Property and Casualty Insurance NA 6/13/2019 Abram Interstate Insurance Services, Inc. Risk Placement Services,

2y ago

619 Views

Jun Liu - Harvard University

It looks like you're using an ad-blocker