
Bioinformatics: Common tools, useful databases, and tricks of the trade
joanne@msl.ubc.ca
bioteach.ubc.ca/bioinfo2008

Workshop Schedule
- Laptops available here for your use, 9am - 4:30pm
- Wireless login: mslguest4myguest
- Vancouver guide books available

Today's Topics
- BLAST: Finding Function by Sequence Similarity
- GUIDED TOUR: Advanced Tips & Tricks for Using BLAST
- PRACTICAL EXERCISES: The Jurassic Park Detective Story
- Genome Browsers: Accessing Genome Annotations
- PRACTICAL EXERCISES: Three different views of the BRCA1 gene

BLAST: Finding Function by Sequence Similarity

Concepts of Sequence Similarity Searching
The premise: one sequence by itself is not informative; it must be analyzed by comparative methods against existing sequence databases to develop hypotheses concerning relatives and function.

The BLAST algorithm
The BLAST programs (Basic Local Alignment Search Tools) are a set of sequence comparison algorithms, introduced in 1990, that are used to search sequence databases for optimal local alignments to a query.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410.
- Altschul SF, Madden TL, Schaeffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402.

[Figure: BLAST web workflow. The BLAST web page submits the query and requests results; the BLAST server fetches sequences from the BLAST databases and returns ASN.1, which the web page displays as formatted results.]

What BLAST tells you
BLAST reports surprising alignments
- Different from chance
Assumptions
- Random sequences
- Constant composition
Conclusions
- Surprising similarities imply evolutionary homology
- Evolutionary homology: descent from a common ancestor
- Does not always imply similar function

Basic Local Alignment Search Tool
- Widely used similarity search tool
- Finds best local alignments
- Heuristic approach based on the Smith-Waterman algorithm
- Provides statistical significance
- Web, standalone, and network clients

BLAST programs
- blastp: Compares an amino acid query sequence against a protein sequence database.
- blastn: Compares a nucleotide query sequence against a nucleotide sequence database.
- blastx: Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
- tblastn: Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
- tblastx: Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
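
The five programs differ only in what is compared against what. A minimal sketch of that choice as a lookup, assuming the query and database types are already known; `choose_blast_program` is a hypothetical helper, not part of any BLAST interface.

```python
def choose_blast_program(query_type: str, db_type: str,
                         translate_both: bool = False) -> str:
    """Return the BLAST program name for a query/database pairing.

    query_type and db_type must be "nucleotide" or "protein".
    translate_both only matters for nucleotide vs nucleotide searches.
    """
    if query_type == "protein" and db_type == "protein":
        return "blastp"
    if query_type == "nucleotide" and db_type == "protein":
        return "blastx"   # query translated in all six reading frames
    if query_type == "protein" and db_type == "nucleotide":
        return "tblastn"  # database translated in all six reading frames
    if query_type == "nucleotide" and db_type == "nucleotide":
        # tblastx compares six-frame translations of both query and database
        return "tblastx" if translate_both else "blastn"
    raise ValueError(f"unsupported pairing: {query_type!r} vs {db_type!r}")


print(choose_blast_program("nucleotide", "protein"))  # blastx
```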

more BLAST programs
- Megablast (nucleotide only)
  - Contiguous: nearly identical sequences
  - Discontiguous: cross-species comparison
- Position Specific (protein only)
  - PSI-BLAST: automatically generates a position specific score matrix (PSSM)
  - RPS-BLAST: searches a database of PSI-BLAST PSSMs

BLAST Algorithm
- Scoring of matches is done using scoring matrices
- Sequences are split into words (default n = 3 for proteins)
  - Speed, computational efficiency
- The BLAST algorithm extends the initial "seed" hit into an HSP
  - HSP = high-scoring segment pair
  - Local optimal alignment
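
A minimal sketch of the word step, assuming exact word matches only. Protein BLAST also accepts similar "neighborhood" words, so this illustrates the indexing idea rather than BLAST's actual implementation; the function names are hypothetical.

```python
def split_into_words(seq: str, w: int = 3) -> list[tuple[int, str]]:
    """Every overlapping word of length w, with its start position."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def find_seed_hits(query: str, subject: str, w: int = 3) -> list[tuple[int, int]]:
    """(query_pos, subject_pos) pairs where a query word occurs exactly in the subject."""
    # Index subject words once so each query word becomes a dictionary lookup.
    index: dict[str, list[int]] = {}
    for pos, word in split_into_words(subject, w):
        index.setdefault(word, []).append(pos)
    return [(qpos, spos)
            for qpos, word in split_into_words(query, w)
            for spos in index.get(word, [])]


print(split_into_words("MKTAY"))           # [(0, 'MKT'), (1, 'KTA'), (2, 'TAY')]
print(find_seed_hits("MKTAY", "GGKTAYY"))  # [(1, 2), (2, 3)]
```

The index is what buys the speed: only positions that share a word are examined further, instead of aligning every pair of residues.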

Sequence Similarity Searching: The statistics are important
Discriminating between real and artifactual matches is done using an estimate of the probability that the match might occur by chance.
We'll talk more about the meaning of the scores (S) and E-values (E) that are associated with BLAST hits.

Where does the score (S) come from?
- The quality of each pairwise alignment is represented as a score, and the scores are ranked.
- Scoring matrices are used to calculate the score of the alignment base by base (DNA) or amino acid by amino acid (protein).
- The alignment score is the sum of the scores for each position.

What's a scoring matrix?
- Substitution matrices are used for amino acid alignments: each possible residue substitution is given a score.
- A simpler unitary matrix is used for DNA pairs (+1 for a match, -2 for a mismatch).
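
Because the alignment score is just the sum of per-position scores, the unitary DNA matrix is easy to apply directly. A minimal sketch for an ungapped alignment, using the +1/-2 values from this slide; the function name is hypothetical.

```python
MATCH, MISMATCH = 1, -2  # unitary DNA scores from the slide


def ungapped_dna_score(a: str, b: str) -> int:
    """Sum of per-position scores for two aligned, equal-length DNA strings."""
    if len(a) != len(b):
        raise ValueError("an ungapped alignment needs equal-length strings")
    return sum(MATCH if x == y else MISMATCH
               for x, y in zip(a.upper(), b.upper()))


print(ungapped_dna_score("ACGTAC", "ACGAAC"))  # 5 matches, 1 mismatch -> 3
```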


BLOSUM vs PAM
- BLOSUM 45 / PAM 250 (more divergent)
- BLOSUM 62 / PAM 160
- BLOSUM 90 / PAM 100 (less divergent)
BLOSUM 62 is the default matrix in BLAST 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives may be more sensitive with a different matrix.

What do the Score and the E-value really mean?
- The quality of the alignment is represented by the Score (S). The score of an alignment is calculated as the sum of substitution and gap scores. Substitution scores are given by a look-up table (PAM, BLOSUM), whereas gap scores are assigned empirically.
- The significance of each alignment is computed as an E-value (E), the expectation value: the number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E-value, the more significant the score.
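
To make "sum of substitution and gap scores" concrete, here is a minimal sketch that scores an alignment written as two equal-length strings with `-` marking gaps. It assumes an affine gap penalty (open 11, extend 1, so a gap of length k costs 11 + k, the usual blastp defaults with BLOSUM62) and uses a toy substitution function in place of a real PAM or BLOSUM table; all names are hypothetical.

```python
from typing import Callable


def alignment_score(top: str, bottom: str, sub: Callable[[str, str], int],
                    gap_open: int = 11, gap_extend: int = 1) -> int:
    """Score a gapped alignment: substitution scores plus affine gap penalties."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    score = 0
    gap_in_top = gap_in_bottom = False  # are we inside a run of gaps?
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("column with a gap on both sequences")
        if x == "-":
            # the first gap in a run pays gap_open; every gap pays gap_extend
            score -= gap_extend + (0 if gap_in_top else gap_open)
            gap_in_top, gap_in_bottom = True, False
        elif y == "-":
            score -= gap_extend + (0 if gap_in_bottom else gap_open)
            gap_in_top, gap_in_bottom = False, True
        else:
            score += sub(x, y)  # look-up table value, e.g. from BLOSUM62
            gap_in_top = gap_in_bottom = False
    return score


def toy_sub(a: str, b: str) -> int:
    """Stand-in for a real matrix: +5 identical, -1 otherwise."""
    return 5 if a == b else -1


print(alignment_score("HEAGAWGHE", "HEA--WGHE", toy_sub))  # 7 x 5 - (11 + 2) = 22
```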

Notes on E-values
- Low E-values suggest that sequences are homologous.
- E-values can't show non-homology.
- Statistical significance depends on both the size of the alignments and the size of the sequence database.
  - This is an important consideration when comparing results across different searches.
  - The E-value increases as the database gets bigger.
  - The E-value decreases as alignments get longer.
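
The trends on this slide follow from the Karlin-Altschul formula E = K m n e^(-λS), where m and n are the query and database lengths and K and λ are constants of the scoring system. Longer alignments usually accumulate higher scores S, which drives E down. A minimal sketch, assuming no edge-effect (effective length) correction, which real BLAST does apply; the K and λ below are the values commonly reported for gapped BLOSUM62 (gap costs 11/1) and are used only for illustration.

```python
import math

LAMBDA, K = 0.267, 0.041  # illustrative gapped BLOSUM62 (11/1) constants


def evalue(score: float, m: int, n: int, lam: float = LAMBDA, k: float = K) -> float:
    """Expected number of chance alignments scoring >= score: E = K m n e^(-lambda S)."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score: float, lam: float = LAMBDA, k: float = K) -> float:
    """Normalized score S' = (lambda S - ln K) / ln 2, so that E = m n 2^(-S')."""
    return (lam * score - math.log(k)) / math.log(2)


m, n = 300, 1_000_000_000         # hypothetical query length and database size
e_small = evalue(100, m, n)
e_big = evalue(100, m, 2 * n)     # doubling the database doubles E
e_better = evalue(120, m, n)      # a higher score lowers E
print(f"{e_small:.2e} {e_big:.2e} {e_better:.2e}")
```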

Homology: Some Guidelines
- Similarity can be indicative of homology.
- Low-complexity regions can be highly similar without being homologous.
- Homologous sequences are not always highly similar.
Generally, if two sequences are significantly similar over their entire length, they are likely homologous.

Suggested thresholds
Source: Chapter 11, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins
- For nucleotide-based searches, look for hits with E-values of 10^-6 or less and sequence identity of 70% or more.
- For protein-based searches, look for hits with E-values of 10^-3 or less and sequence identity of 25% or more.
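
These two rules of thumb are easy to apply as a filter over a hit list. A minimal sketch, assuming each hit already carries its E-value and percent identity; the `Hit` record and the example hits are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str
    evalue: float
    percent_identity: float


# (maximum E-value, minimum % identity) from the guidelines on this slide
THRESHOLDS = {"nucleotide": (1e-6, 70.0), "protein": (1e-3, 25.0)}


def passes_guideline(hit: Hit, search_type: str) -> bool:
    """True if the hit meets both the E-value and the identity rule of thumb."""
    max_e, min_identity = THRESHOLDS[search_type]
    return hit.evalue <= max_e and hit.percent_identity >= min_identity


hits = [Hit("hit_a", 2e-40, 92.0), Hit("hit_b", 5e-4, 31.0), Hit("hit_c", 0.02, 40.0)]
print([h.subject_id for h in hits if passes_guideline(h, "protein")])  # ['hit_a', 'hit_b']
```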


How Does BLAST Really Work?
- The BLAST programs improved the overall speed of searches while retaining good sensitivity (important as databases continue to grow) by breaking the query and database sequences into fragments ("words") and initially seeking matches between fragments.
- Word hits are then extended in either direction in an attempt to generate an alignment with a score exceeding the threshold of "S".
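
A minimal sketch of the extension step for an ungapped DNA word hit, assuming the +1/-2 unitary scores and an "X-drop" stopping rule: extension in each direction stops once the running score falls more than X below the best seen so far. BLAST bounds its extensions this way, but the value X = 5, the minimum score S = 5, and the helper names are hypothetical, and real BLAST also performs a separate gapped extension.

```python
def dna_score(a: str, b: str) -> int:
    return 1 if a == b else -2  # unitary DNA scores (+1 / -2)


def _extend(query, subject, q, s, step, score, x_drop):
    """Walk from (q, s) in direction step (+1 or -1).

    Returns (best score gain, number of residues giving that gain). Stops once
    the running score falls more than x_drop below the best seen so far.
    """
    best = run = best_len = k = 0
    while 0 <= q + k * step < len(query) and 0 <= s + k * step < len(subject):
        run += score(query[q + k * step], subject[s + k * step])
        k += 1
        if run > best:
            best, best_len = run, k
        elif best - run > x_drop:
            break
    return best, best_len


def extend_seed(query, subject, qpos, spos, w, score=dna_score, x_drop=5):
    """Ungapped extension of a word hit of length w at (qpos, spos).

    Returns (hsp_score, query_start, query_end_exclusive, subject_start).
    """
    seed = sum(score(query[qpos + i], subject[spos + i]) for i in range(w))
    right, rlen = _extend(query, subject, qpos + w, spos + w, +1, score, x_drop)
    left, llen = _extend(query, subject, qpos - 1, spos - 1, -1, score, x_drop)
    return seed + left + right, qpos - llen, qpos + w + rlen, spos - llen


query, subject = "TTACGTACGAA", "GGACGTACGCC"
hsp = extend_seed(query, subject, qpos=2, spos=2, w=3)
print(hsp)            # (7, 2, 9, 2): ACGTACG on both sequences
S = 5                 # hypothetical minimum score
print(hsp[0] >= S)    # True: reported as an HSP
```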

BLAST Algorithm [figure]


Extending the High-Scoring Segment Pair (HSP)
[Figure labels: Minimum Score (S); Neighborhood Score Threshold (T)]
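
For protein searches, the neighborhood score threshold T decides which words count as hits: every word whose score against a query word reaches T belongs to that word's neighborhood, and a database word in the neighborhood seeds an extension. A minimal brute-force sketch, assuming a toy substitution function (+5 identical, -1 otherwise) in place of BLOSUM62; `neighborhood` is a hypothetical name.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_sub(a: str, b: str) -> int:
    """Stand-in for BLOSUM62: +5 identical, -1 otherwise."""
    return 5 if a == b else -1


def neighborhood(word: str, T: int, sub=toy_sub, alphabet: str = AMINO_ACIDS) -> list[str]:
    """All words of the same length that score at least T against `word`."""
    hits = []
    for letters in product(alphabet, repeat=len(word)):
        if sum(sub(a, b) for a, b in zip(word, letters)) >= T:
            hits.append("".join(letters))
    return hits


print(len(neighborhood("ACD", T=15)))  # 1: only ACD itself
print(len(neighborhood("ACD", T=9)))   # 58: words sharing at least 2 residues
```

Lowering T enlarges the neighborhood: more seeds, more sensitivity, and a slower search.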


Credits
Materials for this presentation have been adapted from the following sources:
- NCBI HelpDesk: Field Guide Course Materials
- Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins
Questions? Please contact:
Dr. Joanne Fox
Michael Smith Laboratories
joanne@msl.ubc.ca


BLAST GUIDED TOUR: Advanced Tips & Tricks for Using BLAST

http://www.ncbi.nlm.nih.gov/BLAST/

New BLAST homepage


Consider your research question
- Are you looking for a particular gene in a particular species?
- Are you looking for additional members of a protein family across all species?
- Are you looking to annotate genes in your species of interest?

Know your reagents
- Changing your choice of database changes your search space.
- Database size affects the BLAST statistics.
- Databases change rapidly and are updated frequently.

Protein Databases: nr
- nr (non-redundant protein sequences)
  - GenBank CDS translations
  - NP RefSeqs
  - Outside protein: PIR, Swiss-Prot, PRF
  - PDB (sequences from structures)
- pat: protein patents
- env nr: environmental samples
Services: blastp, blastx

Nucleotide Databases: Human and Mouse
- Human and mouse genomic plus transcript (default)
- Separate sections in output for mRNA and genomic
- Direct links to Map Viewer for genomic sequences
Services: Megablast, blastn


Nucleotide Databases: Traditional
- nr (nt): traditional GenBank plus NM and XM RefSeqs
- refseq rna
- refseq genomic: NC RefSeqs
- dbest: EST division
- htgs: HTG division
- gss: GSS division
- wgs: whole genome shotgun
- env nt: environmental samples
- est: human, mouse, others
Databases are mostly non-overlapping.

Program Selection Guide
http://www.ncbi.nlm.nih.gov/BLAST/



Context-Specific Help

Limiting Database: Organism
- Organism autocomplete

Limiting Database: Entrez Query
- all[filter] NOT mammals[organism]
- gene in mitochondrion[Properties]
- 2006:2007[Modification Date]
Nucleotide:
- biomol mrna[Properties]
- biomol genomic[Properties]


Algorithm parameters: Protein
- Expand
- May limit results
- Adjust to set stringency
- Default: statistics adjustment for compositional bias
- Off now by default; conflicts with composition-based statistics

Automatic Short Sequence Adjustment
- E-value: 20000
- Word size: 2
- Matrix: PAM30
- Composition-based statistics: off
- Low-complexity filter: off

A graphical view

The BLAST hit list

BLAST Alignments
[Figure callouts: gap; identical match; positive score (conservative substitution); negative or zero score]


Similarity
The extent to which nucleotide or protein sequences are related. The extent of similarity between two sequences can be based on percent sequence identity and/or conservation. In BLAST, similarity refers to a positive matrix score.
Identity
The extent to which two (nucleotide or amino acid) sequences are invariant.
Homology
Similarity attributed to descent from a common ancestor.
It is your responsibility as an informed bioinformatician to use these terms correctly: a sequence is either homologous or not. Don't use % with this term!
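
BLAST reports identity and similarity as the Identities and Positives counts of each alignment. A minimal sketch of how the two percentages relate, assuming the alignment length (gap columns included) as the denominator and a toy substitution function in place of BLOSUM62 in which only a few conservative pairs score positive; all names are hypothetical.

```python
CONSERVATIVE = {frozenset("IL"), frozenset("KR"), frozenset("DE")}  # toy pairs


def toy_sub(a: str, b: str) -> int:
    """Stand-in for BLOSUM62: +5 identical, +1 conservative, -1 otherwise."""
    if a == b:
        return 5
    return 1 if frozenset((a, b)) in CONSERVATIVE else -1


def identity_and_positives(top: str, bottom: str, sub=toy_sub) -> tuple[float, float]:
    """Percent identical columns and percent positive-scoring columns."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    pairs = [(a, b) for a, b in zip(top, bottom) if a != "-" and b != "-"]
    identities = sum(a == b for a, b in pairs)
    positives = sum(sub(a, b) > 0 for a, b in pairs)  # similarity = positive score
    length = len(top)  # gap columns count toward the alignment length
    return 100 * identities / length, 100 * positives / length


print(identity_and_positives("KLVE-A", "RIVEGA"))  # (50.0, 83.33...)
```

Every identity is also a positive, so percent positives is always at least percent identity.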

BLAST statistics to record in your bioinformatics lab book
It can be helpful to record the statistics that are found at the bottom of your BLAST results.

Sorting BLAST by Taxonomy

Nucleotide BLAST

Algorithm parameters: Nucleotide
- Prevents starting an alignment in a masked region
- Allows extensions through masked regions
- Masks low-complexity (LC) sequence (simple repeats)
- Masks species-specific interspersed repeats: essential for genomic query sequences
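
The first two options describe masking for the word lookup only: masked residues cannot start an alignment, but extensions may run through them. A minimal sketch of that behaviour, assuming masked residues are written in lowercase (soft masking); the function names and the example sequence are hypothetical.

```python
def seed_words(seq: str, w: int) -> list[tuple[int, str]]:
    """Words usable as seeds: skip any word touching a masked (lowercase) residue."""
    words = []
    for i in range(len(seq) - w + 1):
        word = seq[i:i + w]
        if not any(c.islower() for c in word):
            words.append((i, word))
    return words


def extension_view(seq: str) -> str:
    """Sequence used during extension: masking ignored, so alignments can run through repeats."""
    return seq.upper()


masked = "ACGTacgtACGT"         # lowercase = hypothetical repeat-masked stretch
print(seed_words(masked, 4))    # [(0, 'ACGT'), (8, 'ACGT')]
print(extension_view(masked))   # ACGTACGTACGT
```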

nt BLAST: New Output (AB168636)

Sortable Results
- Separate sections for transcript and genome
- Pseudogene on chromosome 9
- Functional gene on chromosome 1

Total Score: All Segments
- Functional gene now first

Sorting in Exon Order
- Default sorting order: score (longest exon usually first)
- Sorting by query start position gives exon order
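
Both the total-score ranking and the exon-order sort can be reproduced from the individual HSPs. A minimal sketch with hypothetical scores and positions, chosen to mirror these slides: a pseudogene with one long, high-scoring HSP and a functional gene whose exons give several smaller HSPs.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    subject: str      # hypothetical subject label
    score: int
    query_start: int


hsps = [
    HSP("pseudogene_chr9", 900, 1),
    HSP("gene_chr1", 500, 400),
    HSP("gene_chr1", 300, 1),
    HSP("gene_chr1", 200, 250),
]


def rank_subjects(hsps, total: bool) -> list[str]:
    """Rank subjects by best single HSP (total=False) or by sum of all HSPs (total=True)."""
    scores: dict[str, int] = {}
    for h in hsps:
        if total:
            scores[h.subject] = scores.get(h.subject, 0) + h.score
        else:
            scores[h.subject] = max(scores.get(h.subject, 0), h.score)
    return sorted(scores, key=scores.get, reverse=True)


def exon_order(hsps, subject: str) -> list[int]:
    """Query start positions of one subject's HSPs, in query order."""
    return sorted(h.query_start for h in hsps if h.subject == subject)


print(rank_subjects(hsps, total=False))  # ['pseudogene_chr9', 'gene_chr1']
print(rank_subjects(hsps, total=True))   # ['gene_chr1', 'pseudogene_chr9']
print(exon_order(hsps, "gene_chr1"))     # [1, 250, 400]
```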

Links to Map Viewer
- Chromosome 1
- Chromosome 9

Recent and Saved Strategies
Log in to MyNCBI to save search strategies.

Genomic and Specialized BLAST pages

Service Addresses
- General help: info@ncbi.nlm.nih.gov
- BLAST: blast-help@ncbi.nlm.nih.gov
- Telephone support: 301-496-2475

BLAST PRACTICAL EXERCISE: The Jurassic Park Detective Story

Navigate to: bioteach.ubc.ca/bioinfo2008
Get the sequences from the webpage and carry out BLAST searches.
- Search #1: Jurassic Park sequence (use blastn)
- Search #2: The Lost World sequence (use blastx)
Can you identify the dinosaur sequences? Let's compare our results.

- Try some BLAST searches with your own sequence of interest.
- Explore what happens when you change advanced parameters.

Search #1: blastn against nr
- Most common use of blastn: sequence identification
- Establish whether an exact match for a sequence is already present in the database

Search #2: blastx against nr
- Translating BLAST programs (blastx, tblastn, tblastx)
- Look for similar proteins
- Identify potential homologs in other species

Mark was here, NIH

Credits
Materials for this presentation have been adapted with permission from the following NCBI HelpDesk course materials:
- Field Guide Course Materials
- Advanced Workshop for Bioinformatics Information Specialists
- NCBI BLAST: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi

Genome Browsers: Accessing Genome Annotations
PRACTICAL EXERCISE: Three Different Views of the BRCA1 Gene

The Human Genome Project
[Figure: Celera Genomics / Public HGP]
February 2001: Completion of the Draft Human Genome

Technology

What is Bioinformatics?

maps.google.ca

Let's Look at the Human Genome.

Objectives
By the end of this module:
- You will be able to describe the following concepts: genome annotation, genome builds, and genome browsers.
- You will view the genomic location that contains the BRCA1 gene in the human genome using three different genome browsers.
- You will be able to compare and contrast the UCSC, Ensembl and MapViewer systems for visualizing genome information.

Genome Browsers
What is a Genome Browser?
- A system for displaying, viewing, and accessing genome annotation data
Genome annotations: knowledge attached to raw genome sequence
- Annotation information comes from many different sources:
  - Computational pipelines
  - Research groups
  - Databases

The "Neapolitan Ice Cream" World of Genome Browsing
- UCSC Genome Browser: http://genome.cse.ucsc.edu/
- Ensembl: http://www.ensembl.org/
- NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview/

The underlying data is common for all three "flavors" of genome browsers.

NCBI, UCSC and Ensembl use the same human genome assembly, which is generated by NCBI; release timing is different between sites.
Note the version of the genome assembly to which you are referring: available precomputed information and locations of features will differ between assemblies.

Let's compare the view of the BRCA1 gene in all three genome browsers.

Viewing the genomic region containing BRCA1
Common features:
- Coordinate system is based on the build
- Zoom in and out
- Annotations displayed, i.e. gene features
Major differences:
- Each browser has a very different look and feel
- Annotation information is displayed differently
- Different ways to navigate through the information

http://genome.cse.ucsc.edu/
Click on the Genome Browser link

Search for BRCA1; note the sample queries

The Search Results
- Many BRCA1 isoforms
- All located on chr 17
- Same chr coordinates
- Different gene structures

Two tasks
- What genes are on either side of BRCA1 on chr 17?
- Can you figure out how to download the genomic sequence for the BRCA1 region?

[Screenshot callouts: Zoom in; Zoom out; DNA link; Download Sequence]

http://www.ensembl.org/
Click on Human

Click on ENSG00000012048

Click here to view genomic location
GeneView shows you information about the gene

Two tasks
- Using GeneView, can you figure out how many different alternatively spliced isoforms exist for BRCA1?
- Using ContigView, can you figure out how to download the genomic sequence for the BRCA1 region?

GeneView shows you information about the transcripts
ExportView gives you access to sequence data

http://www.ncbi.nlm.nih.gov/mapview/
Two builds of human; note many genomes available

Quick Filter: Gene

Two tasks
- Can you figure out how to LinkOut to the OMIM and/or Homologene entries for BRCA1?
- Can you figure out how to download the genomic sequence for the BRCA1 region?

LinkOut
- OMIM: diseases
- sv: sequence view
- pr: protein record
- dl: download
- hm: Homologene

Credits
- UCSC Genome Browser: http://genome.cse.ucsc.edu/
- Ensembl Genome Browser: http://www.ensembl.org/index.html
- NCBI .html
