Sequence Pattern Lecture 7: Sequence Motif Discovery - Otago

1y ago

19 Views

2 Downloads

524.96 KB

5 Pages

Last View : Today

Last Download : 3m ago

Upload by : Julia Hutchens

Report this link

Download PDF

Transcription

Sequence motif: definitionsCOSC 348: Computing for Bioinformatics In Bioinformatics, a sequence motif is a nucleotide oramino-acid sequence pattern that is widespread and hasbeen proven or assumed to have a biological significance.Lecture 7:Sequence Motif Discovery Once we know the sequence pattern of the motif, then wecan use the search methods to find it in the sequences (i.e.Boyer-Moore algorithm, Rabin-Karp, suffix trees, etc.)Lubica Benuskova The problem is to discover the motifs, i.e. what is the orderof letters the particular motif is comprised of.http://www.cs.otago.ac.nz/cosc348/12Sequence motif: notationsExamples of motifs in DNA The TATA promoter sequence is an example of a highlyconserved DNA sequence motif found in eukaryotes. Another example of motifs: binding sites for transcriptionfactors (TF) near promoter regions of genes, etc. Another notation: each ‘.’ signifies any single AA, and each ‘*’indicates one member of a closely-related AA family:Gene 1Gene 2Gene 3Gene 4Gene 5Binding sites for TF An example of a motif in a protein: N, followed by anything but P,followed by either S or T, followed by anything but P One convention is to write N{P}[ST]{P} where {X} meansany amino acid except X; and [XYZ] means either X or Y or Z. WDIND*.*P.*.D.F.*W***.**.IYS**.A.*H*S*WAMRN In the 1st assignment we have motifs like A?CG, where thewildcard ? Stands for any of A,U,C,G.34Sequence motif discovery from conservation Sequence motifs are conserved sequences of similar or identicalpatterns that may occur within nucleic acids (DNA, RNA) orproteins either– within different molecules produced by the same organism or– within molecules from multiple species of organisms In the case of cross-species conservation, conserved motifindicates that a particular sequence pattern may have beenconserved during evolution to perform certain function, thus¾ Motif conservation is the basis of motif discovery by studyingsimilar genes (or proteins) in different species;¾ A motif discovery program that considers phylogeneticconservation is named PhyloGibbs.5Motif discovery based on alignment profile analysis is another word for this. This is usually done by– first constructing a local alignment of multiple sequences,– after which the highly conserved regions are isolated, based ontheir high alignment scoringProteinIDConservedregion6

Motif discovery based on alignmentSequence logo and consensus sequence After the highly conserved regions are isolated, they are used toconstruct profile matrices for each conserved region. We can extract the so-called consensus sequence, i.e. the stringof most frequent letters: The profile matrix for a given motif contains frequency countsfor each letter at each position of the isolated conserved region. A graphical representation of the consensus sequence is called asequence logo: The height of different letters at the same position isproportional to their frequency in motifs: the better the baseconservation is at that position, the higher the letters will be.78Sequence logo and information theoryShannon’s information Sequence logo is often displayed using the information theory,i.e. Shannon’s information (in bits) is a measure of the information content I(xi)associated with the particular outcome xi of a random variable X, which canhave n values/outcomes, i.e. x1, x2, xn I(xi) log2 P(xi) log2 (1 / P(xi))– When choosing from 4 nucleotides, the probability P(xi) 1/4. Whenparticular nucleotide occurred, the amount of information is– I(‘nucleotide') log2 (1/(1/4)) log2 (4) 2 bits. If the possible values xi of variable X have probabilities P(xi)then Shannon’s expected information (entropy) is:nnH P( xi ) log2 P( xi )9i 1Discovering motifs without alignmentH P( xi ) log2 P( xi )Scoring motifs First, let’s assume we know where the motif starts in the referencesequence set. The motif start positions in sequences can be represented as theset s (s1, s2, s3, , st) where si is the position index Given s (s1, st), wealign t motifs of length lfrom all sequences Construct profile matrixla G g t a c T tC c A t a c g ta c g t T A g ta c g t C c A tC c g t a c g GA 3 0 1 0 3 1 1 0C 2 4 0 0 1 4 0 0lG 0 1 4 0 0 0 3 1[]count(k,i) maxT 0 0 0 5 1 0 1 4i 1 k { A ,T ,C ,G }Motif Score 3 4 4 5 3 4 3 4 3012 Motif Score(s) 1110i 1t

Let’s start againMotif finding problem Given a list of t sequences each of length n, find the “best”pattern of length l that appears in each of the t sequences. The problem is to find the starting positions s (s1, st) tomaximize the Score(s) of the resulting profile matrix. Let s (s1,.,st) be the set of starting positions for l-mers inour t sequences. Several kinds of profile matrices are used: A position frequency matrix (PFM) records the positiondependent frequency f of each letter, i.e. how many times a letteroccurs at a given position in N sequences. The strings corresponding to these starting positions will form:- 4 x l profile matrix* P for DNA- and 20 x l profile matrix* P for proteins A position probability matrix (PPM). When normalized to 1this frequency turns into a probability, i.e. P f / N. A position weight matrix (PWM): β { A ,C ,G ,T }f β k logProfile P Construct the profile P0.60.40.00.0qβ130.00.80.20.014Probability of l-mersla G g t a c T tC c A t a c g ta c g t T A g ta c g t C c A tC c g t a c g G Given s (s1, st), wealign t l-mers from allsequences andACGT* The profile matrix will be defined in terms of the probabilityof letters, and not as the count of letters.f 00.20.00.60.20.00.00.20.8 Pr(a P) is defined as the probability that an l-mer a wascreated by the Profile P.t If a is very similar to the consensus string (i.e. motif)then Pr(a P) will be high. If Pai,k is the probability of letter ai at position k, thenthe probablity of an l-mer a is equal to the product ofindividual probabilities Pai,k , i.e.:lPr( a P) Pai ,kk 115Scoring l-mers with a profile16Scoring l-mers with a profile (cont’d)Given a profile P Given a profile P 1/41/8G1/401/83/81/41/8Pr(aaacct P) ?0Pr(aaacct P) 1/2 x 7/8 x 3/8 x 5/8 x 3/8 x 7/8 .0336461718

Scoring l-mers with a profile (cont’d)Motif – the P-most probable l-mer Define the P-most probable l-mer from a sequence as anl-mer in that sequence which has the highest probabilityof being created from the profile P.Given a profile: P 1/83/81/41/8 Task: given a sequence ctataaaccttacatc and theknown profile P, find the P-most probable 6-mer:Prob(aaacct P) 1/2 x 7/8 x 3/8 x 5/8 x 3/8 x 7/8 .033646P Prob(atacag P) 1/2 x 1/8 x 3/8 x 5/8 x 1/8 x 1/8 /8G1/401/83/81/41/81920P-most probable l-mer (cont’d)P-most probable l-mer (cont’d)A1/27/83/801/80Compute Pr(a P) for every possible 6-mer:C1/801/25/83/80Window, Highlighted RedCalculationsT1/81/8001/47/8ctataaaccttacat1/8 x 1/8 x 3/8 x 0 x 1/8 x 00G1/401/83/81/41/8ctataaaccttacat1/2 x 7/8 x 0 x 0 x 1/8 x 000First try: c t a t a a a c c t t a c a t cSecond try: c t a t a a a c c t t a c a t cThird try: c t a t a a a c c t t a c a t cSlide the window to evaluate every possible 6-mer– brute force approachPr(a P)ctataaaccttacat1/2 x 1/8 x 3/8 x 0 x 1/8 x 0ctataaaccttacat1/8 x 7/8 x 3/8 x 0 x 3/8 x 00ctataaaccttacat1/2 x 7/8 x 3/8 x 5/8 x 3/8 x 7/8.0336ctataaaccttacat1/2 x 7/8 x 1/2 x 5/8 x 1/4 x 7/8.0299ctataaaccttacat1/2 x 0 x 1/2 x 0 1/4 x 00ctataaaccttacat1/8 x 0 x 0 x 0 x 0 x 1/8 x 00ctataaaccttacat1/8 x 1/8 x 0 x 0 x 3/8 x 00ctataaaccttacat1/8 x 1/8 x 3/8 x 5/8 x 1/8 x 7/8.00042122Dealing with zeroes and small probabiltiesP-most probable l-mer (cont’d) In our toy example Pr(a P) 0 in many cases. In practice, there willbe enough sequences so that the number of elements in the profilewith a frequency of zero is likely to be small but still we must ensurezeroes are taken care of.P-most probable 6-mer in the sequence is aaacct:Window, Highlighted RedCalculationsctataaaccttacat1/8 x 1/8 x 3/8 x 0 x 1/8 x 0Pr(a P)0ctataaaccttacat1/2 x 7/8 x 0 x 0 x 1/8 x 00ctataaaccttacat1/2 x 1/8 x 3/8 x 0 x 1/8 x 00ctataaaccttacat1/8 x 7/8 x 3/8 x 0 x 3/8 x 00ctataaaccttacat1/2 x 7/8 x 3/8 x 5/8 x 3/8 x 7/8.0336ctataaaccttacat1/2 x 7/8 x 1/2 x 5/8 x 1/4 x 7/8.02990ctataaaccttacat1/2 x 0 x 1/2 x 0 1/4 x 0ctataaaccttacat1/8 x 0 x 0 x 0 x 0 x 1/8 x 00ctataaaccttacat1/8 x 1/8 x 0 x 0 x 3/8 x 00ctataaaccttacat1/8 x 1/8 x 3/8 x 5/8 x 1/8 x 7/8.0004 There exist several techniques to equate zero to a very small numberso that one zero does not make the entire probability of a string zero.– The simplest one is to replace 0 with a small number, e.g. 1 / 10n.23 Another problem is that the product of small probabilities is a verysmall number. Thus, we replace the product with the sum oflogarithms:llPr( a P) Pai ,k log Pr( a P) log(Pai ,k )k 1k 124

Finding the profile P iterativelyP-most probable l-mers are motifsctataaacgttacatc Task: Find the P-mostprobable l-mer in eachof the sequences givenprofile t The P-most probable lmer is our tcacctccaaatcctttaca How do we find P?ggtcatcctttatcct25Finding the profile P ccactcacctccaaatcctttacaggtctacctttatcct Let l 6. Start at random positions (underlined)and calculate the initial profile P.26Comparing new and old profilesUse this initial profile tofind the P-mostprobable l-mer in eachsequence, and set thenew starting positionsaccording to thebeginnings of P-mostprobable l-mer in eachsequence. P-most probable l-mers form a new profile byrecalculating probabilities at all the 47/8G1/401/83/81/41/8 According to this new profile P, find the most probablel-mers in each sequence, set new positions and re-iterate.2827Algorithm for greedy profile motif searchSummary of greedy motif discoveryUse P-most probable l-mers to adjust new start positions until we reachthe “best” profile; this will be pronounced as the motif.Select random starting positions, then: Since we choose starting positions randomly, there is little chancethat our guess will be close to an optimal motif, meaning it willtake a very long time to find the optimal motif. In practice, this algorithm is run many times with the hope thatrandom starting positions will be close to the optimum solutionsimply by chance.1. Create a profile P from the l-mers at these starting positions.2. Find the P-most probable l-mer a in each sequence and change thestarting positions to the starting positions of a’s.3. Go to step 1 and re-iterate until we cannot increase the score anymore.29 The algorithm may be improved by heuristic knowledge, whereapproximately we should start or by more sophisticated statisticaltechniques, like Gibbs sampling that estimates the most probablestart positions for motifs, where we ought to start our iterativeprocess of motif discovery.30

The profile matrixfor a given motif contains frequency counts for each letter at each position of the isolated conserved region. 8 Sequence logo and consensus sequence We can extract the so-called consensus sequence, i.e. the string of most frequent letters: A graphical representation of the consensus sequence is called a sequence logo:

Related Documents:

CHEMICAL REACTION ENGINEERING

Introduction of Chemical Reaction Engineering Introduction about Chemical Engineering 0:31:15 0:31:09. Lecture 14 Lecture 15 Lecture 16 Lecture 17 Lecture 18 Lecture 19 Lecture 20 Lecture 21 Lecture 22 Lecture 23 Lecture 24 Lecture 25 Lecture 26 Lecture 27 Lecture 28 Lecture

100 Views

2y ago

LECTURE NOTES on PROGRAMMING & DATA STRUCTURE Course Code : BCS101

Lecture 1: A Beginner's Guide Lecture 2: Introduction to Programming Lecture 3: Introduction to C, structure of C programming Lecture 4: Elements of C Lecture 5: Variables, Statements, Expressions Lecture 6: Input-Output in C Lecture 7: Formatted Input-Output Lecture 8: Operators Lecture 9: Operators continued

59 Views

1y ago

MSE 460: Electronic Materials, Devices, and Processing

Lecture 1: Introduction and Orientation. Lecture 2: Overview of Electronic Materials . Lecture 3: Free electron Fermi gas . Lecture 4: Energy bands . Lecture 5: Carrier Concentration in Semiconductors . Lecture 6: Shallow dopants and Deep -level traps . Lecture 7: Silicon Materials . Lecture 8: Oxidation. Lecture

154 Views

2y ago

【E-book】Texts & Questions of 50 Lectures for TOEFL ...

TOEFL Listening Lecture 35 184 TOEFL Listening Lecture 36 189 TOEFL Listening Lecture 37 194 TOEFL Listening Lecture 38 199 TOEFL Listening Lecture 39 204 TOEFL Listening Lecture 40 209 TOEFL Listening Lecture 41 214 TOEFL Listening Lecture 42 219 TOEFL Listening Lecture 43 225 COPYRIGHT 2016

149 Views

2y ago

Partial Differential Equations MSO-203-B - IIT Kanpur

Partial Di erential Equations MSO-203-B T. Muthukumar tmk@iitk.ac.in November 14, 2019 T. Muthukumar tmk@iitk.ac.in Partial Di erential EquationsMSO-203-B November 14, 2019 1/193 1 First Week Lecture One Lecture Two Lecture Three Lecture Four 2 Second Week Lecture Five Lecture Six 3 Third Week Lecture Seven Lecture Eight 4 Fourth Week Lecture .

38 Views

11m ago

CBSE NCERT Solutions for Class 6 Mathematics Chapter 1

Class- VI-CBSE-Mathematics Knowing Our Numbers Practice more on Knowing Our Numbers Page - 4 www.embibe.com Total tickets sold ̅ ̅ ̅̅̅7̅̅,707̅̅̅̅̅ ̅ Therefore, 7,707 tickets were sold on all the four days. 2. Shekhar is a famous cricket player. He has so far scored 6980 runs in test matches.

165 Views

3y ago

Introduction to Quantum Field Theory for Mathematicians

Introduction to Quantum Field Theory for Mathematicians Lecture notes for Math 273, Stanford, Fall 2018 Sourav Chatterjee (Based on a forthcoming textbook by Michel Talagrand) Contents Lecture 1. Introduction 1 Lecture 2. The postulates of quantum mechanics 5 Lecture 3. Position and momentum operators 9 Lecture 4. Time evolution 13 Lecture 5. Many particle states 19 Lecture 6. Bosonic Fock .

218 Views

3y ago

MEDICAL ASTROLOGY - Donuts

astrology has nowhere to go but "up" as an undeniable diagnostic tool that will pinpoint areas to be tested. In doing so, it saves the patient pain and time—not to mention, money. '.it-. In the hospitals of Leningrad, Russia, the MST software is being utilized by physicians to help in patient diagnosis—especially in the areas of undiagnosible or hard-to-diagnose patients. My hope is that .

256 Views

3y ago

Recent Views

Quotes within Quotes: When Single (') and Double (") Quotes . - SAS

Here the outside double quotes are replaced by a single quote and the apostrophe is replaced by two single quotes. This works because when the parser sees two single (or double) quotes immediately following each other, the parser resolves them into one quote mark after the closing quote has been determined.

1y ago

237 Views

What These Inspirational Quotes Say

Self Motivation Quotes Success Quotes Teacher Quotes And after reading all of these inspirational quotes you’d like to share which quotation is . -- Brian Tracy "You must constantly ask yourself these questions: Who am I around? What are they doing to me? Wha

2y ago

302 Views

Columbus,Ohio 1890

Slicing Steaks 3563 Beef Tender, Select In Stock 3852 Angus XT Shoulder Clod, Choice In Stock 3853 Angus XT Chuck Roll, Choice 20/up In Stock 3856 Angus XT Peeled Knuckle In Stock 3857 Angus XT Inside Rounds In Stock 3858 Angus XT Flats, Choice In Stock 3859 Angus XT Eye Of Round, Choice In Stock 3507 Point Off Bnls Beef Brisket, Choice In Stock

2y ago

268 Views

MSN Stock Quotes Web Part - Amrein Engineering

The MSN Stock Quotes Web Part uses the public MSN Money Central Stock Quote Web Service to display selected stock quote information. The data are delayed by 20 minutes and provided by MSN Mo

2y ago

242 Views

Quotations - Free Website Builder: Create free websites

cards, but sometimes, playing a poor hand well." . 50th Birthday Quotes 60th Birthday Quotes And there are more. Funny Birthday Quotes Cute Birthday Quotes . it a try, itʼs free. Triumph over failure can be a

2y ago

267 Views

The Top 100 Motivational & Inspirational Quotes for 2015

I've spent hours crawling through the web trying to find the best quotes to keep me motivated and inspired all throughout the New Year. I've saved hundreds of quotes on my laptop and figured that words alone could motivate and inspire me. but if I couple the quotes

2y ago

329 Views

Inspirational Quotes - Guideposts

Inspirational Quotes Inspiring quotes are like vitamins for the soul. From the heartfelt to the humorous, the words of wisdom you’ll find here will strengthen your faith, lift your spirits, and even spark a positive change in your life. This collection of some our favorite inspirational quotes from religious figures, world leaders, authors,

2y ago

553 Views

Buying Your First Stock - Stock-Trak

Stock Market Game Time: 15 Minutes Requires: StockTrak Curriculum , Computer Access Buying Your First Stock This lesson is an introduction to buying a stock. Students will be introduced to basic vocabulary that is involved with a buying and owning a stock. Stu-dents will be going through the entire process of buying a stock from looking

1y ago

164 Views

TRAINING - CamInstructor

Mastercam Training Guide Mill-Lesson-4-9 6. Change the parameters to match the Stock Setup screenshot below: Stock Setup Stock Origin The stock origin is the X-Y-Z coordinate position of the point indicated by the cross in the picture of the stock model. Use it so Mastercam knows where your stock model is located relative to your part and

3y ago

242 Views

WPX Energy, Inc. - Feltl and Company

WPX Energy, Inc. Common Stock We are offering 27,000,000 shares of our common stock. Our common stock is listed on the New York Stock Exchange under the symbol “WPX.” On July 10, 2015, the last reported sale price for our common stock on the New York Stock Exchange (the “NYSE”) was 11.22 per share.

3y ago

172 Views

Spray 2020 Corporate Profiles - industry-publications

Custom plastic tubes (mono & multi-layer, ABL and Polyami) Stock and custom plastic, metal, and wood caps and closures Stock and custom fine mist, treatment and lotion pumps Stock and custom droppers Stock and custom rollerballs/roll-ons Stock sampler bottles and vials Stock German Quality cosmetic pencil sharpeners

2y ago

180 Views

The Stock Market Profits Blueprint - Liberated Stock Trader

The stock market profits blueprint has been hand crafted to enable you to understand all the factors that play on the stock market. It is called a blueprint because a blueprint is in effect an architectural document to show how something is designed. The Blueprint will show you a powerful way to envisage how the stock market and the stock market

1y ago

181 Views

The Impact of Persian News on Stock Returns Through Text Mining Techniques

Persian news - on the stock prices has been neglected. Consequently, this study aimed to fill this gap. To this aim, the stock index values were collected from the Tehran Stock Exchange along with the . Stock market prediction is a way to understand the future fluctuations of a company's stock price (Jishag et al., 2020). Generally, two .

1y ago

225 Views

Stock Market Uncertainty and the Stock-Bond Return Relation

implied volatility and stock turnover may prove useful for ﬁnancial applications that need to under-stand and predict stock and bond return co-movements. Finally, our empirical results suggest that the beneﬁts of stock-bond diversiﬁcation increase during periods of high stock market uncertainty. This study is organized as follow.

1y ago

158 Views

Operation of Stock Exchange - Williams College

Class Notes Operation of Stock Exchange - 3 - Buying on Margin "Margin" is borrowing money from your broker to buy a stock and using your invest-ment as collateral. Example Buy paying full price Buy stock at 60. Stock price goes to 90. Return (90 - 60)/60 50% Buy on "margin" Buy stock at 60. Borrow 30; you pay 30.

1y ago

138 Views

Sequence Pattern Lecture 7: Sequence Motif Discovery - Otago

It looks like you're using an ad-blocker