Introduction To Bioinformatics - Lehigh University

2y ago

27 Views

3 Downloads

2.02 MB

43 Pages

Last View : 1m ago

Last Download : 3m ago

Upload by : Nora Drum

Report this link

Download PDF

Transcription

Introductionto BioinformaticsDan LoprestiAssociate ProfessorOffice PL 404Bdal9@lehigh.eduIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 1

Motivation“Biology easily has 500 years of exciting problems to work on.”Donald Knuth (Stanford Professor & famous computer scientist)By developing techniques for analyzing sequence data and relatedstructures, we can attempt to understand molecular basis of ion to Bioinformatics LoprestiBioS 95 November 2008 Slide 2

Before We Get GoingRecall your recent lectures by Professors Marzillier and Warewho presented biological background:Professor Marzillier's lecture on Nov 7.Professor Ware's lecture on Nov 10.Today I'll focus on the related computational questions.Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 3

BioinformaticsWhat is bioinformatics? Application of techniques from computerscience to problems from biology.Computer ScienceBioinformaticsBiologyWhy is it interesting?Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 4 Important problems.Massive quantities of data.Desperate need for efficient solutions.Success is rewarded.

Data ExplosionOur genetic identity isencoded in long moleculesmade up of four basic units,the nucleic acids:(1) Adenine,(2) Cytosine,(3) Guanine,(4) Thymine.To first approximation, DNA isa language over a four characteralphabet, {A, C, G, T}.Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide nbankstats.html

GenomesComplete set of chromosomes that determines an organism isknown as its genome.Us?Mus ://www.nsrl.ttu.edu/tmot1/mus gle.asp?strID 324Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 6Conclusion: sizedoes not matter!(But you alreadyknew this.)

Comparative GenomicsRecall this amazing diagram from Professor Ware's lecture:How did wedecipher ources/Human ion to Bioinformatics LoprestiBioS 95 November 2008 Slide 7

Algorithms are CentralAn algorithm is a precisely-specified series of steps to solve aparticular problem of interest. Develop model(s) for task at hand. Study inherent computational complexity: Can task be phrased as an optimization problem? If so, can it be solved efficiently? Speed, memory, etc. If we can't find a good algorithm, can we prove task is “hard”? If known to be hard, is there approximation algorithm (one thatworks at least some of the time or comes close to optimal)? Conduct experimental evaluations (perhaps iterate above steps).Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 8

Sequence Nature of .htmlMacromolecules are chains of simpler molecules.In the case of proteins, these basicbuilding blocks are amino acids.In DNA and RNA, they are nucleotides.Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 9

NCBI GenBankNational Center forBiotechnology Information(NCBI), which is branch ofNational Library of Medicine(NLM), which is branch ofNational Institutes of Health(NIH), maintains GenBank, aworldwide repository ofgenetic sequence data (allMassive quantities of sequence data publicly available DNAneed for good computational Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 10

Reading DNARecall Professor Marzillier's lecture:This is known as ns/sommaire2/sanger.htmhttp://www.iupui.edu/ wellsctr/MMIA/htm/animations.htmIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 11Gel electrophoresis isprocess of separating amixture of molecules in agel media by application ofan electric field.In general, DNA moleculeswith similar lengths willmigrate same distance.Make DNA fragments thatend at each base: A, C, G, T.Then run gel and read offsequence: ATCGTG .

Reading DNAOriginal sequence: TCGATAGCATCGTGTCGATAGCGCIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 12

Sequencing a GenomeMost genomes are enormous (e.g., 1010 base pairs in case ofhuman). Current sequencing technology, on the other hand, onlyallows biologists to determine 103 base pairs at a time.This leads to some very interesting problems in bioinformatics .Genetic linkage map(107 – 108 base pairs)Physical map(105 – 106 base pairs)ACTAGCTGATCGATTTAGCAGCAG.Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 13Sequencing(103 – 104 base pairs)

Sequencing a GenomeGenomes can also be determined using a technique known asshotgun sequencing.Computer scientists haveplayed an important role indeveloping algorithms forassembling such data.It's kind of like puttingtogether a jigsaw puzzle withmillions of pieces (a lot ofwhich are “blue /pubbooks/bc mcampbell genomics 1/medialib/method/shotgun.htmlIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 14

Sequence rgetoriginalIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 15

Sequence AssemblyA simple model of DNA assembly is the Shortest SupersequenceProblem: given a set of sequences, find the shortest sequence Ssuch that each of original sequences appears as subsequence of S.Look for overlap between prefix ofone sequence and suffix of another:ACCGT3CGTGC21TTACIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 16--ACCGT-----CGTGCTTAC----TTACCGTGC

Sequence AssemblySketch of algorithm: Create an overlap graph in which every node represents afragment and edges indicate overlap. Determine which overlaps will be used in the final assembly:find an optimal spanning forest in overlap graph.5sW AGTATTGGCAATCZ AATCGATGU ATGCAAACCTy9w4xX CCTTTTGGY TTGGCAATCAS AATCAGGIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 17343zu

Sequence Assembly Look for paths of maximum weight: use greedy algorithm toselect edge with highest weight at every step. Selected edge must connect nodes with in- and out-degrees 1. May end up with set of paths: each corresponds to a contig.5sy9wW Y S4AGTATTGGCAATCAGGx343zIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 18uAGTATTGGCAATCTTGGCAATCAAATCAGGZ U XAATCGATGATGCAAACCTCCTTTTGGAATCGATGCAAACCTTTTGG

Sequence ComparisonWhat's the problem? Google for biologists . Given new DNA or protein sequence, biologist will want to searchdatabases of known sequences to look for anything similar. Sequence similarity can provide clues about function andevolutionary relationships. Databases such as GenBank are far too large to search manually.To search them efficiently, we need an algorithm.Shouldn't expect exact matches (so it's not really like google): Genomes aren't static: mutations, insertions, deletions. Human (and machine) error in reading sequencing gels.Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 19

Genomes Aren't StaticSequence comparison must account for such i PDFs/deletion.pdfIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 20http://www.accessexcellence.org/AB/GG/nhgri PDFs/insertion.pdf

Genomes Aren't StaticDifferent kinds of mutations can arise during DNA mutation.htmIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 21

The Human FactorIn addition, errors can arise during the sequencing process:“.the error rate is generally less than 1% over the first 650bases and then rises significantly over the remaining tmlA hard-to-read gel (arrow marks location where bands of similarintensity appear in two different lanes):Machines also makemistakes, of course!http://hshgp.genome.washington.edu/teacher ction to Bioinformatics LoprestiBioS 95 November 2008 Slide 22

Sequence ComparisonWhy not just line up sequences and count matches?A G T C T A T ADifference 2A T T C T G T ADoesn't work well in case of deletions or insertions:A G T C T A T ADifference 8G T C T A T AIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 23One missing symbol at start ofsequence leads to large difference!

Sequence ComparisonInstead, we'll use a technique known as dynamic programming. Model allows three basic operations: delete a single symbol,insert a single symbol, substitute one symbol for another. Goal: given two sequences, find the shortest series ofoperations needed to transform one into the other.A G T C T A T ADelete TA G C T A T AA G C T G T AIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 24Substitute G for A

Sequence ComparisonHow can we determine optimal series of operations? Approach is to build up longer solutions from previouslycomputed shorter solutions. Say we want to compute solution at index i in first sequenceand index j in second sequence:Sequence 1ivs.Sequence 2jAssume that we already know the best way to compare:Sequence 1Sequence 2vs.iIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 25Sequence 1vs.Sequence 2Sequence 1vs.Sequence 2j

Sequence ComparisonSo, best way to do this comparison:iSequence 1vs.Sequence 2jIs best choice from following three cases:Sequence 1Sequence 1Sequence 1Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 26ivs.Sequence 2vs.Sequence 2vs.Sequence 2j insertingj deletingi substitutingjfori

Sequence ComparisonNormally, this computation builds a table of distance values:sequencetcost of inserting t0cost of deleting ssequencesεεd [i-1, j] 1d [i, j] minIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 27d [i, j-1] 1d [i-1, j-1] 0 if s[i] t[j]1 if s[i] t[j]

Sequence ComparisonBy keeping track of optimal decision, we can determine operations:Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 28

Genome RearrangementsRecall what wesaw earlier: 99% of mouse genes have homologues in human genome. 96% of mouse genes are in same relative location to one another. Mouse genome can be broken up into 300 synteny blocks which,when rearranged, yield human genome. Provides a way to think about evolutionary relationships.Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 29

Reversal DistanceHuman Chromosome X12345-145Cut and reverse-3-2Cut and reverse-3-2-5-41-41Cut and reverse-35Mouse Chromosome X2Reversal distance is the minimum number of such steps needed.Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 30

Interesting SidenoteEarly work on a related problem,sorting by prefix reversals, wasperformed in 1970's by ChristosPapadimitriou, a famous computerscientist now at UC Berkeley, andone “William H. Gates” .Yes, that Bill Gates .Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 31

History of Chromosome XRat Consortium, Nature, 2004Hypothesized reversalsIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 32

Waardenburg’s SyndromeMouse provides insight into human genetic disorder: Waardenburg’s syndrome ischaracterized by pigmentary dysphasia. Disease gene linked to Chromosome 2,but not clear where it was located.“Splotch” mice: A breed of mice (with splotch gene) had similar symptoms. Scientists succeeded in identifying location of gene in mice. This gave clues as to where same gene is located in humans.Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 33

Building the “Tree of Life”Note: these trees are “best guesses”and certainly contain some errors!Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 34http://en.wikipedia.org/wiki/Phylogenetic ogyPages/T/Taxonomy.htmlScientists build phylogenetic trees in an attempt to understandevolutionary relationships. Reversal distance is often used here.

DNA Microarrays Allows simultaneous measurement of the level of transcriptionfor every gene in a genome (gene expression). Differential expression, changes over time. Single microarray can test 10k genes. Data obtained faster than can be processed. Want to find genes that behave similarly. A pattern discovery problem.green repressedred inducedIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 35

Using DNA Microarrays Track sample over aperiod of time to seegene expression overtime. Track two differentsamples under sameconditions to seedifference in geneexpressions.Each box represents onegene’s expression over 10 Clustering.pptIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 36

DNA MicroarraysK-means clustering is one way to organize this data: Given set of n data points and an integer k. We want to find set of k points that minimizes the mean-squareddistance from each data point to its nearest cluster center.Sketch of algorithm: Choose k initial center points randomly and cluster data. Calculate new centers for each cluster using points in cluster. Re-cluster all data using new center points. Repeat second two steps until no data points are moved from onecluster to another or some other convergence criterion is met.Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 37

Clustering Microarray Data Pick k 2 centers at random. Cluster data around these centerpoints. Re-calculate centers based oncurrent clusters.From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici.Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 38

Clustering Microarray Data Re-cluster data around newcenter points. Repeat last two steps until nomore data points are moved intoa different cluster.From “Data Analysis Tools for DNA Microarrays” by Sorin Draghici.Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 39

Example of Hierarchical ClusteringDifferent genes that express similarlyFrom “Cluster analysis and display of genome-wide expression patterns” by Eisen, Spellman, Brown, and Botstein, Proc.Natl. Acad. Sci. USA, Vol. 95, pp. 14863–14868, December 1998Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 40

Why Study Bioinformatics? Still many urgent open problems lots of opportunities tomake fundamental contributions (and become rich and famous). Stretch your creativity and problem-solving skills to the limit. Join a cross-disciplinary team – work with interesting people. Participate in unlocking the mysteries of life itself. Make the world a better place.Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 41

CSE Course in BioinformaticsIn CSE 308, we cover:CSE 308 is not a Intro to molecular biology & algorithms, programming course! Basic programming for bioinformatics, Genetic sequence comparison & alignment, Physical mapping, sequencing, and assembly of DNA, Standard formats and sources for genomic data, Advanced topics: DNA microarrays, genome rearrangements,RNA and protein structure prediction, etc.Materials @ http://www.cse.lehigh.edu/ lopresti/courses.htmlQuestions: dal9@lehigh.eduIntroduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 42

Thank you!Introduction to Bioinformatics LoprestiBioS 95 November 2008 Slide 43

Introduction to Bioinformatics Lopresti BioS 95 November 2008 Slide 13 Sequencing a Genome Most genomes are enormous (e.g., 1010 base pairs in case of human). Current sequencing technology, on the other hand, only allows biologists to determine 103 base pairs at a time. This leads to some very interesting problems in bioinformatics .

Related Documents:

Head Start/Early Head Start of the Lehigh Valley 9/ Couity Resource Book

Lehigh County Drug and Alcohol 71, 74 Lehigh County Family Court 128 Lehigh County Juvenile Probation 128 Lehigh County Conference of Churches 65 Lehigh County Housing Authority and Valley Housing Development Corporation 57 Lehigh County Information and Referral 41 Lehigh County Office of Children and Youth 29 .

15 Views

1y ago

Head Start/ Early Head Start of the Lehigh Valley 1520 Hanover Avenue ...

Lehigh Valley Drug and Alcohol Intake Units 25 Lehigh Valley Eye Center 65 Lehigh Valley Eye Center and Children's Eye Care 58 Lehigh Valley Family Health Center 49 Lehigh Valley Hospital Center 40 Lehigh Valley Hospital Center for Women's Medicine 58 Lehigh Valley Hospital Center for Women's Health at Casa - "Viva Nueva" Clinic 58 .

23 Views

1y ago

2015-16 Resource Book - Northampton County, Pennsylvania

Lehigh Valley Drug and Alcohol Intake Units 46 Lehigh Valley Eye Center - Bethlehem 98 Lehigh Valley Eye Center and Children's Eye Care - Allentown 90 Lehigh Valley Family Health Center 81 Lehigh Valley Health Network 25 Lehigh Valley Hospital Center 71 Lehigh Valley Hospital Center for Women's Health at Casa - "Viva Nueva" Clinic 91

20 Views

1y ago

Structural Bioinformatics - Lehigh University

Structural bioinformatics adds scale and precision Structural Bioinformatics Structure Prediction Integrative Methods Molecular Simulation Structure Alignment Functional Site Comparison Docking . Lehigh University BioS 10: BioSciences in the 21st Century Brian Y. Chen Many computational fields support Structural Bioinformatics Structural

11 Views

1y ago

Bioinformatics Crash Course

Bioinformatics Crash Course Ian Misner Ph.D. Bioinformatics Coordinator UMD Bioinformatics Core . Bioinformatics!Core The Plan Monday – Introductions – Linux and Python Hands-on Training Tuesday – NGS Introduction – RNAseq with Sailfish (Dr. Steve Mount, CBCB) – RNAse

35 Views

2y ago

113 chambers of commerce from 46 states over 33 …

Greater Bath Area Chamber of Commerce, Bath, PA Greater Lehigh Valley Chamber of Commerce, Lehigh Valley, PA Greater Northern Lehigh Chamber of Commerce, Lehigh Valley, PA Greater Pittsburgh Chamber of Commerce, Pittsburgh, PA He

39 Views

2y ago

Civil Court Rules - Lehigh County Court of Common Pleas

COURT OF COMMON PLEAS OF LEHIGH COUNTY . Rule 51 Title and Citation of Rules. All civil rules of procedure adopted by the Court of Common Pleas of Lehigh County shall be cited as Lehigh Rules of Civil Procedure ("Leh.R.C.P.") Rule 52 Effective Dates of Rules. (a) A rule or amendment to a rule shall become effective upon the date specified by

38 Views

1y ago

API Standard 2000 - Venting Atmospheric and Low-Pressure ...

It is not strictly a storage tank and it is not a pressure vessel. API does not have a standard relating to venting capacities for low-pressure process vessels. It is recommended that engineering calculations be performed based on process and atmospheric conditions in order to determine the proper sizing of relief devices for these type vessels. However, it has been noted that may people do .

364 Views

3y ago

Recent Views

Personal insurance - Car & Business insurance King Price Insurance

The king's insurance options 5 Things you need to know 7 The stuff you need to do 14 How to claim 16 Our commitment to you 20 Car insurance 22 Car warranty 37 Shortfall cover 45 Scratch and dent 46 Tyre and rim 48 Motorbike insurance 53 Trailer and caravan insurance 64 Watercraft insurance 68 Home contents insurance 77 Buildings insurance 89

1y ago

685 Views

Gold Tier - MAPFRE Insurance

Foy Insurance of MA, LLC 198 Frank Consolati Insurance Agency, Inc. 198 County Insurance Agency, Inc. 198 Woodrow W Cross Agency 214 Woodland Insurance Agency, Inc. 214 Tegeler Insurance Services of CT, Inc. 214 Pantano/VonKahle Insurance Agency, Inc. 214 . Hanson Insurance Agency, Inc. 287 J.H. Slattery Insurance Agency, Inc. 287

1y ago

576 Views

Consumer Guide to Auto Insurance - csimt.gov

consumer guide to auto insurance contents introduction to auto insurance 1 understanding your auto insurance policy 2 required auto insurance 3 optional types of auto insurance 4-5 getting the right coverage 6 accidents and violations 7 how to shop for auto insurance 8 shopping tips 9 frequently asked questions 10-11 insurance complaints/when you have a problem 12

2y ago

815 Views

Industry Observations Insurance Industry

Jun 30, 2019 · 6/17/2019 Commercial Insurance Branch of Extraco Banks, N.A. Higginbotham Insurance Group, Inc. Insurance Brokers NA 6/13/2019 Links Insurance Services, LLC World Insurance Associates LLC Property and Casualty Insurance NA 6/13/2019 Abram Interstate Insurance Services, Inc. Risk Placement Services,

2y ago

630 Views

Life Insurance Buyer's Guide Life Insurance - National Association of .

Life Insurance uers uide Naional ssociaion of Insurance Commissioners Compare the Different Types of Insurance Policies There are many types of life insurance pol-icies. You should choose a policy with fea-tures that fit your individual needs. Some things to consider are: Term Insurance vs. Cash Value In-surance. Term insurance is intended to

1y ago

531 Views

your guide to understanding auto ins in nh - New Hampshire

Hampshire Insurance Department does not mandate or set Auto Insurance Rates. Auto Insurance Rates will vary by insurance company. This guide is intended to give New Hampshire consumers basic information on auto insurance. It suggests ways to: Lower the cost of your auto insurance, shop for Auto insurance and, file an auto insurance claim.

1y ago

460 Views

18.01.41 - REPLACEMENT OF LIFE INSURANCE AND ANNUITIES - Idaho

Department of Insurance Replacement of Life Insurance and Annuities. Page 3. 04. Existing Life Insurance or Annuity. "Existing Life Insurance or Annuity" means any life insurance or annuity in force, including life insurance under a binding or conditional receipt or a lif e insurance policy or annuity that is within an unconditional refund period.

1y ago

412 Views

EXAMINATION REPORT OF THE ADMIRAL INSURANCE COMPANY AS OF . - Delaware

Berkley Regional Specialty Insurance Comp 31295 DE Carolina Casualty Insurance Company 10510 IA Clermont Insurance Company 33480 IA Continental Western Insurance Company 10804 IA Firemen's Insurance Com pany of Wash, D.C. 21784 DE Gemini Insurance Company 10833 DE Great Divide Insurance Company 25224 ND

1y ago

263 Views

American International Group, Inc. - Federal Reserve

American General Life Insurance Company AGL U.S. Life Insurance Company AGC Life Insurance Company AGC Life U.S. Life Insurance Company The United States Life Insurance Company in the City of New York U.S. Life U.S. Life Insurance Company The Variable Annuity Life Insurance Company VALIC U.S. Life Insurance Company

1y ago

275 Views

Japan's Insurance Market - Toa Re

with 61.6% of net premiums written, of which automobile insurance totaled 48.8% and compulsory automobile liability insurance totaled 12.8%. Fire insurance accounted for 13.7%, miscellaneous casualty insurance including liability insurance accounted for 11.6%, accident insurance accounted for 9.8%, and marine insurance accounted for 3.2%.

1y ago

187 Views

List of Insurance Companies by Insurance Manager - Cayman Islands dollar

2447 Batan Insurance Company SPC, Ltd. 29-Sep-03 1307714 BBG Insurance Services, Ltd. 09-Aug-16 1254 BCHS Insurance, Ltd. 07-Oct-98 1168 Bearacuda Re 01-Aug-97 2639 Bedrock Insurance Limited 24-Nov-05 2150 Bom Ambiente Insurance Company 14-Jun-00 2565 Boundless Insurance Company, Ltd. 01-Dec-04 769 Bucap Limited 03-Mar-89

1y ago

301 Views

Insurance Certificate 713705-3 and Assistance Program

Name of insurance product: Purchase Protection and Travel Insurance for National Bank of Canada Mastercard credit cards, group insurance policy no. 713705 (Schedule A Certificate number 3)/713705-3 Type of insurance product: Purchase insurance and extended warranty and travel insurance (group insurance) Assistance provider contact information

4m ago

59 Views

Oracle Insurance Performance Insight for General Insurance

for General Insurance Overview Oracle Insurance Performance Insight for General Insurance (OIPIGI) is a comprehensive business intelligence system created exclusively for the General Insurance/Property and Casualty (P&C) insurance industry. OIPIGI provides a complete set of web-based analytical and reporting components that enable users to

1y ago

181 Views

S OF GENERAL INSURANCE

General Insurance comprises of insurance of property against fire, burglary etc, personal insurance such as Accident and Health Insurance, and liability insurance which covers legal liabilities. Suitable general Insurance covers are necessary for every family. It is important to protect one’s property, which

3y ago

286 Views

Insurance Act 1978 - Bermuda Laws

INSURANCE MANAGERS, BROKERS, AGENTS, INSURANCE MARKETPLACE PROVIDERS AND SALESMEN Insurance managers, agents and insurance marketplace providers to maintain lists of insurers for which they act Insurance broker, agent, salesman or insurance marketplace provider deemed agent of insurer in cert

2y ago

288 Views

Introduction To Bioinformatics - Lehigh University

It looks like you're using an ad-blocker