Bioinformatics: Common tools, useful databases, and tricks of the trade. joanne@msl.ubc.ca, bioteach.ubc.ca/bioinfo2008
Workshop Schedule: Laptops available here for your use. 9am - 4:30pm. Wireless login: mslguest4myguest. Vancouver guide books available.
Today’s Topics: BLAST - Finding Function by Sequence Similarity. GUIDED TOUR - Advanced Tips & Tricks for Using BLAST. PRACTICAL EXERCISES - The Jurassic Park Detective Story. Genome Browsers - Accessing Genome Annotations. PRACTICAL EXERCISES - Three different views of the BRCA1 gene.
BLAST: Finding Function By Sequence Similarity
Concepts of Sequence Similarity Searching. The premise: One sequence by itself is not informative; it must be analyzed by comparative methods against existing sequence databases to develop hypotheses concerning relatives and function.
The BLAST algorithm: The BLAST programs (Basic Local Alignment Search Tools) are a set of sequence comparison algorithms introduced in 1990 that are used to search sequence databases for optimal local alignments to a query. References: Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) “Basic local alignment search tool.” J. Mol. Biol. 215:403-410. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.” NAR 25:3389-3402.
[Figure: BLAST web search flow. Labels: Submit Query; Request Results; BLAST Web Page; BLAST server; BLAST db; fetch sequence; fetch ASN.1; Return Formatted Results; Display Results]
What BLAST tells you: BLAST reports surprising alignments (different than chance). Assumptions: random sequences, constant composition. Conclusions: surprising similarities imply evolutionary homology. Evolutionary homology: descent from a common ancestor. Does not always imply similar function.
Basic Local Alignment Search Tool: Widely used similarity search tool. Finds best local alignments. Heuristic approach based on the Smith-Waterman algorithm. Provides statistical significance. Available as www, standalone, and network clients.
BLAST programs
- blastp: Compares an amino acid query sequence against a protein sequence database.
- blastn: Compares a nucleotide query sequence against a nucleotide sequence database.
- blastx: Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
- tblastn: Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
- tblastx: Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
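To make "translated in all reading frames" concrete, here is a minimal Python sketch of a six-frame translation, the step that blastx, tblastn and tblastx apply to nucleotide sequences before comparing them at the protein level. The function names are illustrative and are not part of BLAST; the sketch assumes the standard genetic code and writes unknown codons as "X".

```python
# Standard genetic code, built from the compact TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translations("ATGGCCTAA").items():
        print(name, protein)
```

Each of the six protein strings would then be searched against the protein database, which is why translated searches cost more than a plain blastn or blastp search.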
more BLAST programs. Megablast (nucleotide only): Contiguous, for nearly identical sequences; Discontiguous, for cross-species comparison. Position Specific (protein only): PSI-BLAST automatically generates a position specific score matrix (PSSM); RPS-BLAST searches a database of PSI-BLAST PSSMs.
BLAST Algorithm: Scoring of matches done using scoring matrices. Sequences are split into words (default n = 3) for speed and computational efficiency. BLAST algorithm extends the initial “seed” hit into an HSP. HSP = high scoring segment pair (local optimal alignment).
Sequence Similarity Searching: The statistics are important. Discriminating between real and artifactual matches is done using an estimate of the probability that the match might occur by chance. We’ll talk more about the meaning of the scores (S) and E-values (E) that are associated with BLAST hits.
Where does the score (S) come from? The quality of each pair-wise alignment is represented as a score and the scores are ranked. Scoring matrices are used to calculate the score of the alignment base by base (DNA) or amino acid by amino acid (protein). The alignment score will be the sum of the scores for each position.
What’s a scoring matrix? Substitution matrices are used for amino acid alignments: each possible residue substitution is given a score. A simpler unitary matrix is used for DNA pairs (+1 for match, -2 for mismatch).
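As a small illustration of "the sum of the scores for each position", the Python sketch below scores an ungapped DNA alignment with the unitary matrix from the slide (+1 match, -2 mismatch). The function name is made up for this example, and gap penalties are deliberately left out.

```python
MATCH = 1      # score for identical bases (from the slide)
MISMATCH = -2  # score for differing bases (from the slide)


def score_dna_alignment(seq_a, seq_b, match=MATCH, mismatch=MISMATCH):
    """Sum per-position scores for two equal-length, ungapped DNA strings."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(
        match if a == b else mismatch
        for a, b in zip(seq_a.upper(), seq_b.upper())
    )


if __name__ == "__main__":
    # Three matches and one mismatch: 3 * 1 + 1 * (-2) = 1
    print(score_dna_alignment("ACGT", "ACGA"))
```

For proteins the same loop would look each residue pair up in a substitution matrix such as BLOSUM 62 instead of using two fixed values.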
BLOSUM vs PAM
More divergent:  BLOSUM 45   PAM 250
                 BLOSUM 62   PAM 160
Less divergent:  BLOSUM 90   PAM 100
BLOSUM 62 is the default matrix in BLAST 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives may be more sensitive with a different matrix.
What do the Score and the E-value really mean? The quality of the alignment is represented by the Score (S). The score of an alignment is calculated as the sum of substitution and gap scores. Substitution scores are given by a look-up table (PAM, BLOSUM) whereas gap scores are assigned empirically. The significance of each alignment is computed as an E value (E). Expectation value: the number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score.
Notes on E-values: Low E-values suggest that sequences are homologous. Can’t show non-homology. Statistical significance depends on both the size of the alignments and the size of the sequence database. Important consideration for comparing results across different searches: E-value increases as database gets bigger; E-value decreases as alignments get longer.
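The dependence on database size can be seen in a short calculation. The sketch below uses the standard Karlin-Altschul relation between a normalized bit score S' and the expectation value, E = m * n * 2^(-S'), where m is the query length and n the total database length. This formula is not on the slides; it is added here as an illustration, and real BLAST additionally applies effective-length corrections that are ignored in this sketch.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2 ** (-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    small_db = expect_value(50, query_length=100, database_length=10**6)
    large_db = expect_value(50, query_length=100, database_length=2 * 10**6)
    better_hit = expect_value(60, query_length=100, database_length=10**6)
    print("same hit, small database:", small_db)
    print("same hit, doubled database:", large_db)
    print("higher-scoring hit:", better_hit)
```

Doubling the database doubles E for the same hit, and each additional bit of score halves it. Longer alignments tend to reach higher scores, which is why their E-values are lower.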
Homology: Some Guidelines. Similarity can be indicative of homology. Low complexity regions can be highly similar without being homologous. Homologous sequences are not always highly similar. Generally, if two sequences are significantly similar over their entire length they are likely homologous.
Suggested cutoffs. Source: Chapter 11, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. For nucleotide based searches, one should look for hits with E-values of 10^-6 or less and sequence identity of 70% or more. For protein based searches, one should look for hits with E-values of 10^-3 or less and sequence identity of 25% or more.
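These rules of thumb are easy to apply to a hit list. The following Python sketch filters hits using the thresholds quoted above; the hit records and their field names are hypothetical and do not correspond to any particular BLAST output format.

```python
# (maximum E-value, minimum percent identity) from the suggested cutoffs
CUTOFFS = {
    "nucleotide": (1e-6, 70.0),
    "protein": (1e-3, 25.0),
}


def passes_cutoff(evalue, percent_identity, search_type):
    """Return True if a hit meets both suggested thresholds."""
    max_evalue, min_identity = CUTOFFS[search_type]
    return evalue <= max_evalue and percent_identity >= min_identity


if __name__ == "__main__":
    # Hypothetical protein hits, invented for illustration only.
    hits = [
        {"id": "hit_A", "evalue": 2e-40, "identity": 62.0},
        {"id": "hit_B", "evalue": 0.5, "identity": 31.0},
        {"id": "hit_C", "evalue": 1e-5, "identity": 22.0},
    ]
    kept = [
        h["id"] for h in hits
        if passes_cutoff(h["evalue"], h["identity"], "protein")
    ]
    print(kept)
```

Hits that fail a cutoff are not shown to be non-homologous; they simply need closer inspection.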
How Does BLAST Really Work? The BLAST programs improved the overall speed of searches while retaining good sensitivity (important as databases continue to grow) by breaking the query and database sequences into fragments ("words"), and initially seeking matches between fragments. Word hits are then extended in either direction in an attempt to generate an alignment with a score exceeding the threshold of "S".
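The word-then-extend idea can be sketched in a few lines of Python. This is a simplified illustration and not the actual BLAST implementation: it uses exact word matches only (no neighborhood words above a threshold T), ungapped extension, DNA scoring of +1/-2, and an assumed drop-off rule that stops extending once the running score falls too far below the best score seen. All names and default values are chosen for this example.

```python
from collections import defaultdict


def index_words(seq, w):
    """Map every word of length w in seq to its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def extend_one_side(query, subject, q, s, step, match, mismatch, x_drop):
    """Walk from (q, s) in direction step; return (best gain, best length)."""
    gain = best_gain = 0
    length = best_len = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        gain += match if query[q] == subject[s] else mismatch
        length += 1
        if gain > best_gain:
            best_gain, best_len = gain, length
        elif best_gain - gain > x_drop:
            break  # score dropped too far below the best; stop extending
        q += step
        s += step
    return best_gain, best_len


def find_hsps(query, subject, w=4, min_score=8, match=1, mismatch=-2, x_drop=5):
    """Seed with exact words, extend both ways, keep segments scoring >= S."""
    index = index_words(subject, w)
    hsps = set()
    for qi in range(len(query) - w + 1):
        for si in index.get(query[qi:qi + w], ()):
            right_gain, right_len = extend_one_side(
                query, subject, qi + w, si + w, 1, match, mismatch, x_drop)
            left_gain, left_len = extend_one_side(
                query, subject, qi - 1, si - 1, -1, match, mismatch, x_drop)
            score = w * match + left_gain + right_gain
            if score >= min_score:
                hsps.add((score, qi - left_len, si - left_len,
                          left_len + w + right_len))
    # Each tuple: (score, query start, subject start, length)
    return sorted(hsps, reverse=True)


if __name__ == "__main__":
    print(find_hsps("TTACGTGCATCATT", "GGACGTGCATCAGG"))
```

Indexing words first means most of the database is never aligned at all, which is where the speed comes from; only positions sharing a word with the query are extended.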
BLAST Algorithm
Extending the High Scoring Segment Pair (HSP) [Figure labels: Minimum Score (S); Neighborhood Score Threshold (T)]
Credits: Materials for this presentation have been adapted from the following sources: NCBI HelpDesk - Field Guide Course Materials; Bioinformatics: A practical guide to the analysis of genes and proteins. Questions? Please contact: Dr. Joanne Fox, Michael Smith Laboratories, joanne@msl.ubc.ca
BLAST GUIDED TOUR: Advanced Tips & Tricks for Using BLAST
http://www.ncbi.nlm.nih.gov/BLAST/
New BLAST homepage
Consider your research question. Are you looking for a particular gene in a particular species? Are you looking for additional members of a protein family across all species? Are you looking to annotate genes in your species of interest?
Know your reagents: Changing your choice of database is changing your search space. Database size affects the BLAST statistics. Databases change rapidly and are updated frequently.
Protein Databases: nr
- nr (non-redundant protein sequences): GenBank CDS translations; NP_ RefSeqs; Outside Protein (PIR, Swiss-Prot, PRF); PDB (sequences from structures)
- pat: protein patents
- env_nr: environmental samples
- Services: blastp, blastx
Nucleotide Databases: Human and Mouse. Human and mouse genomic + transcript is the default. Separate sections in output for mRNA and genomic. Direct links to Map Viewer for genomic sequences. Services: Megablast, blastn.
Nucleotide Databases: Traditional
- nr (nt): Traditional GenBank; NM_ and XM_ RefSeqs
- refseq_rna
- refseq_genomic: NC_ RefSeqs
- dbest: EST Division
- est: human, mouse, others
- htgs: HTG division
- gss: GSS division
- wgs: whole genome shotgun
- env_nt: environmental samples
Databases are mostly non-overlapping.
Program Selection Guide: http://www.ncbi.nlm.nih.gov/BLAST/
Context Specific Help
Limiting Database: Organism (organism autocomplete)
Limiting Database: Entrez Query. Examples: all[filter] NOT mammals[organism]; gene_in_mitochondrion[Properties]; 2006:2007[Modification Date]. Nucleotide only: biomol_mrna[Properties]; biomol_genomic[Properties].
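The same Entrez limits can be attached to a search submitted programmatically. The Python sketch below only builds a request URL and does not send it. It assumes the parameter names of NCBI's BLAST URL API (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY, EXPECT) and the current Blast.cgi address; check both against NCBI's documentation before relying on them. The helper name and the example query sequence are made up.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API; verify before use.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_search_url(query, program="blastn", database="nt",
                     entrez_query=None, expect=None):
    """Build (but do not send) a search request limited by an Entrez query."""
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
    }
    if entrez_query:
        params["ENTREZ_QUERY"] = entrez_query
    if expect is not None:
        params["EXPECT"] = str(expect)
    return BLAST_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Made-up query sequence, limited with the first example from the slide.
    print(build_search_url(
        "ACGTACGTACGTACGTACGT",
        entrez_query="all[filter] NOT mammals[organism]",
    ))
```

Because the Entrez limit shrinks the search space, it also changes the E-values reported for the same hit.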
Algorithm parameters: Protein [Screenshot annotations: Expand; May limit results; Adjust to set stringency; Default statistics adjustment for compositional bias; Off now by default, conflicts with comp-based stats]
Automatic Short Sequence Adjustment: E-value 20000; Word Size 2; Matrix PAM30; Comp Stats Off; Low Comp Filter Off.
A graphical view
The BLAST hit list
BLAST Alignments [Figure labels: gap; identical match; positive score (conservative); negative or zero]
BLAST Alignments
Similarity: The extent to which nucleotide or protein sequences are related. The extent of similarity between two sequences can be based on percent sequence identity and/or conservation. In BLAST similarity refers to a positive matrix score.
Identity: The extent to which two (nucleotide or amino acid) sequences are invariant.
Homology: Similarity attributed to descent from a common ancestor.
It is your responsibility as an informed bioinformatician to use these terms correctly: A sequence is either homologous or not. Don’t use % with this term!
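Identity, unlike homology, is a quantity you can compute. The Python sketch below calculates percent identity for two aligned strings, counting identical columns over the full alignment length (gap columns included in the denominator), which is how BLAST reports its "Identities" figure. The function name and the gap character are choices made for this example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * identical / len(aligned_a)


if __name__ == "__main__":
    # Five identical columns out of six (one gap column)
    print(round(percent_identity("ACGT-A", "ACGTTA"), 1))
```

A percentage like this describes identity or similarity only; whether the two sequences are homologous is a separate yes-or-no inference.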
BLAST statistics to record in your bioinformatics lab book: It can be helpful to record the statistics that are found at the bottom of your BLAST results.
Sorting BLAST by Taxonomy
Nucleotide BLAST
Algorithm parameters: Nucleotide. Masking for lookup table only: prevents starting alignment in masked region; allows extensions through masked regions. Low complexity filter: masks LC sequence (simple repeats). Species-specific repeat filter: masks species-specific interspersed repeats; essential for genomic query sequences.
nt BLAST: New Output (AB168636)
Sortable Results: Separate sections for transcript and genome. [Screenshot labels: Pseudogene on Chromosome 9; Functional Gene on Chromosome 1]
Total Score: All Segments. Functional gene now first.
Sorting in Exon Order. Default sorting order: score (longest exon usually first). Sorting by query start position gives exon order.
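The three orderings on these slides can be reproduced on a toy hit. In the Python sketch below the segment records, field names and scores are invented for illustration: a hit of an mRNA query against a genomic sequence produces one segment per exon, the default view ranks segments by score, the total score adds all segments together, and sorting by query start recovers exon order.

```python
# Hypothetical segments (one per exon) for a single genomic hit.
segments = [
    {"query_start": 350, "score": 180},
    {"query_start": 1, "score": 120},
    {"query_start": 150, "score": 95},
]

# Default order: best-scoring segment first (usually the longest exon).
by_score = sorted(segments, key=lambda seg: seg["score"], reverse=True)

# Exon order: sort by where each segment starts on the query.
by_query_start = sorted(segments, key=lambda seg: seg["query_start"])

# Max score uses the single best segment; total score sums all of them.
max_score = max(seg["score"] for seg in segments)
total_score = sum(seg["score"] for seg in segments)

if __name__ == "__main__":
    print("by score:", [seg["query_start"] for seg in by_score])
    print("exon order:", [seg["query_start"] for seg in by_query_start])
    print("max score:", max_score, "total score:", total_score)
```

Ranking by total score favors a subject that matches every exon over one that matches a single exon well, which fits the slide's note that the functional gene comes first under total score.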
Links to Map Viewer [Screenshot labels: Chromosome 1; Chromosome 9]
Recent and Saved Strategies: Login to MyNCBI to save search strategies.
Genomic and Specialized BLAST pages
Service Addresses. General Help: info@ncbi.nlm.nih.gov. BLAST: blast-help@ncbi.nlm.nih.gov. Telephone support: 301-496-2475.
BLAST PRACTICAL EXERCISE: The Jurassic Park Detective Story
Navigate to: bioteach.ubc.ca/bioinfo2008. Get the sequences from the webpage and carry out BLAST searches. Can you identify the Dinosaur sequences? Search #1: Jurassic Park sequence (use blastn). Search #2: The Lost World sequence (use blastx). Let’s compare our results.
Try some BLAST searches with your own sequence of interest. Explore what happens when you change advanced parameters.
Search #1 - blastn against nr. Most common use of blastn: sequence identification. Establish whether an exact match for a sequence is already present in the database.
Search #2 - blastx against nr. Translating BLAST programs (blastx, tblastn, tblastx): look for similar proteins; identify potential homologs in other species.
Credits: Materials for this presentation have been adapted with permission from the following NCBI HelpDesk course materials: Field Guide Course Materials; Advanced Workshop for Bioinformatics Information Specialists. NCBI BLAST: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
Genome Browsers: Accessing Genome Annotations & PRACTICAL EXERCISE: Three Different Views of the BRCA1 Gene
The Human Genome Project. February 2001: Completion of the Draft Human Genome (Celera Genomics and the Public HGP).
Technology
What is Bioinformatics?
maps.google.ca
Let’s Look at the Human Genome.
Objectives. By the end of this module: You will be able to describe the following concepts: genome annotation, genome builds, and genome browsers. You will view the genomic location that contains the BRCA1 gene in the human genome using three different genome browsers. You will be able to compare and contrast the UCSC, Ensembl and MapViewer systems for visualizing genome information.
Genome Browsers. What is a Genome Browser? A system for displaying, viewing, and accessing genome annotation data. Genome annotations = knowledge attached to raw genome sequence. Annotation information comes from many different sources: computational pipelines, research groups, databases.
The “Neapolitan Ice Cream” World of Genome Browsing: UCSC Genome Browser, http://genome.cse.ucsc.edu/; Ensembl, http://www.ensembl.org/; NCBI Map Viewer, http://www.ncbi.nlm.nih.gov/mapview/
The underlying data is common for all three “flavors” of Genome Browsers.
NCBI, UCSC and Ensembl use the same human genome assembly that is generated by NCBI; release timing is different between sites. Note the version of the genome assembly to which you are referring: available precomputed info and locations of features will be different between different assemblies.
Let’s compare the view of the BRCA1 gene in all three genome browsers.
Viewing the genomic region containing BRCA1
Common features:
- Coordinate system is based on the build
- Zoom in and out
- Annotations displayed, i.e. gene features
Major differences:
- Each browser has a very different look and feel
- Annotation information displayed differently
- Different ways to navigate through the information
http://genome.cse.ucsc.edu/ Click on the Genome Browser link.
Search for BRCA1; note sample queries.
The Search Results: Many BRCA1 isoforms. All located on chr 17: same chr coordinates, different gene structures.
Two tasks: What genes are on either side of BRCA1 on chr 17? Can you figure out how to download the genomic sequence for the BRCA1 region?
[Screenshot labels: Zoom in; Zoom out; DNA link: download sequence]
http://www.ensembl.org/ Click on Human.
Click on ENSG00000012048.
GeneView shows you information about the gene. Click here to view genomic location.
Two tasks: Using GeneView, can you figure out how many different alternatively spliced isoforms exist for BRCA1? Using ContigView, can you figure out how to download the genomic sequence for the BRCA1 region?
GeneView shows you information about the transcripts. ExportView gives you access to sequence data.
http://www.ncbi.nlm.nih.gov/mapview/ Two builds of human; note many genomes available.
Quick Filter: Gene
Two tasks: Can you figure out how to LinkOut to the OMIM and/or Homologene entries for BRCA1? Can you figure out how to download the genomic sequence for the BRCA1 region?
LinkOut: OMIM = diseases; sv = sequence view; pr = protein record; dl = download; hm = Homologene.
Credits: UCSC Genome Browser, http://genome.cse.ucsc.edu/; Ensembl Genome Browser, http://www.ensembl.org/index.html; NCBI .html