LECTURE 7 Blast - Trinity College Dublin


LECTURE 7: BLAST

Using BLAST to search sequence databases
Aims:
– Learn how to use BLAST (blast.ncbi.nlm.nih.gov): BLASTP, BLASTN, TBLASTN, BLASTX
– Learn what's in the NCBI sequence databases: RefSeq, accession numbers, genome, WGS, single-gene and EST sequences
– Concept of annotation

[Figure: matching words between a Query and a Subject sequence; word size k = 4]

What BLAST does
(BLAST was developed by Stephen Altschul et al., 1990. It is one of the most-cited scientific papers ever.)
BLAST looks for HSPs. An HSP ("High-Scoring Pair") is a grey region in the previous slide, i.e. a region of matching between your Query and a database entry (the Subject). HSPs usually have no gaps in the alignment between Query and Subject, or only small gaps. A Query can have several HSPs to the same Subject.
For each Subject in the database (millions of them), BLAST asks:
– Does the Subject match the Query with at least k identical letters in a row? (By default, the "word size" is k = 8 for DNA and k = 3 for protein. For proteins, BLAST also accepts similar, high-scoring words, not only identical ones.)
– If yes, BLAST extends each k-letter matching region out as far as it can, to make an HSP.
The HSP is given a score:
– For DNA, the score is essentially 2× the number of matching letters, minus penalties for mismatches and gaps.
– For proteins, the score is calculated from a BLOSUM62 matrix.
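The seed-and-extend idea can be sketched in a few lines of Python. This is a minimal illustration, not NCBI's implementation. The scoring values (+2 per match, −3 per mismatch), the "X-drop" stopping rule (stop once the running score falls more than 10 below the best score seen) and all function names are assumptions made for the example. Gaps are not handled.

```python
# Sketch of BLAST-style "seed and extend" for DNA (illustrative only).
# ASSUMED scoring values: +2 per match, -3 per mismatch, no gaps.
# ASSUMED X-drop of 10: stop extending once the running score falls
# more than 10 below the best score seen so far.
MATCH, MISMATCH, X_DROP = 2, -3, 10


def find_seeds(query: str, subject: str, k: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) for every identical word of length k."""
    index: dict[str, list[int]] = {}
    for i in range(len(query) - k + 1):
        index.setdefault(query[i:i + k], []).append(i)
    seeds = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            seeds.append((i, j))
    return seeds


def pair(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def extend(query: str, subject: str, qi: int, sj: int, k: int) -> tuple[int, int, int]:
    """Extend one seed without gaps; return (score, query_start, query_end)."""
    best = run = k * MATCH          # the seed itself is k exact matches
    start, end = qi, qi + k         # query_end is exclusive
    # Extend to the right.
    q, s = qi + k, sj + k
    while q < len(query) and s < len(subject):
        run += pair(query[q], subject[s])
        q, s = q + 1, s + 1
        if run > best:
            best, end = run, q
        elif best - run > X_DROP:
            break
    # Extend to the left, starting from the best right-hand end.
    run = best
    q, s = qi - 1, sj - 1
    while q >= 0 and s >= 0:
        run += pair(query[q], subject[s])
        if run > best:
            best, start = run, q
        elif best - run > X_DROP:
            break
        q, s = q - 1, s - 1
    return best, start, end


query, subject = "ACGTACGTAC", "TTTACGTACGTACTTT"
seeds = find_seeds(query, subject, 4)
print((0, 3) in seeds)                   # True
print(extend(query, subject, 0, 3, 4))   # (20, 0, 10): 10 matching letters x 2
```

The repetitive query produces several seeds on different diagonals. Only the one at query 0 and subject 3 extends over the whole query. This is a small version of why repetitive sequences cause problems for BLASTN.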

What BLAST does
When a search is run, BLAST keeps a list of the database Subjects whose HSPs had the highest scores to your Query (typically 1000 are kept).
The score of each HSP in the list is then converted into an E-value ("expect" value). An E-value is the number of HSPs expected to have this score or higher purely by chance, taking into account:
– the size of the database
– the length of the Query
– the composition of the Query (e.g. a query that is AAAAAAAAAAA will have a lot of spurious hits).
Low E-values mean strong hits. In theory, any HSP with E < 1 is significant. In practice, a hit is only "convincing" if E is 1 × 10⁻⁶ or lower, which is written as 1.0e-6.
The output from BLAST is a sorted list of the Subjects with the lowest E-values in the database.
Note that:
– An E-value is not a probability.
– In any search, something has to be the best hit. The trick is figuring out whether the hit is a coincidence or due to shared ancestry (homology) of the sequences.
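The conversion from score to E-value follows the Karlin–Altschul formula E = K·m·n·e^(−λS). Here S is the raw HSP score, m is the Query length, n is the total database length, and λ and K depend on the scoring system. The sketch below computes it. The λ and K values are assumptions: they are the figures commonly quoted for BLOSUM62 with gap costs 11/1. Real BLAST also uses "effective" lengths corrected for edge effects, which this sketch ignores.

```python
import math

# ASSUMED example parameters: lambda and K for BLOSUM62 with gap costs
# 11/1 are commonly quoted as about 0.267 and 0.041; substitute the
# values your own BLAST output reports.
LAMBDA, K = 0.267, 0.041


def e_value(score: float, m: int, n: int, lam: float = LAMBDA, k: float = K) -> float:
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score: float, lam: float = LAMBDA, k: float = K) -> float:
    """Normalised score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * score - math.log(k)) / math.log(2)


m, n = 200, 1_000_000_000   # hypothetical query length and total database length
for s in (40, 80, 120):
    print(s, f"{e_value(s, m, n):.2g}")
```

Two points from the slide show up in the formula. Doubling the database size doubles E. A low score against a large database can give E much greater than 1, which is why an E-value is an expected count and not a probability.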

Exercise
Find the sequences of EPO genes in as many different species as we can, by sequence similarity searching, starting with human EPO:
– Nucleotide database accession number X02157
– Protein database accession number CAA26094
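The exercise is meant to be done on the web page. The same starting searches can also be written as requests to NCBI's BLAST URL API. The sketch below only builds the submission URLs and does not send them. The parameter names (CMD, PROGRAM, DATABASE, QUERY) and the database names "nt" (the web form's "Nucleotide collection (nr/nt)") and "nr" are our reading of NCBI's documentation and should be checked against it. As we understand it, a CMD=Put request returns a request ID (RID), which is then used to fetch results with CMD=Get.

```python
from urllib.parse import urlencode

# Base address of NCBI's BLAST URL API (assumed from NCBI's documentation).
BLAST_CGI = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_submit_url(program: str, database: str, query: str) -> str:
    """Build (but do not send) a CMD=Put request that submits a search."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    return f"{BLAST_CGI}?{urlencode(params)}"


# The exercise's starting points: human EPO DNA and protein accessions.
print(blast_submit_url("blastn", "nt", "X02157"))
print(blast_submit_url("blastp", "nr", "CAA26094"))
```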

blast.ncbi.nlm.nih.gov

4 types of BLAST search: #1, BLASTN
[Figure: diagram of the BLAST search types: BLASTN, BLASTP, BLASTX, TBLASTN, TBLASTX]
BLASTN: searches a DNA Query against a DNA database.
Typical use: to find highly similar DNA sequences.
Advantages: it's the only option for sequences that are not protein-coding.
Disadvantages:
– It will miss genes whose sequences have diverged a lot.
– Repetitive DNA sequences cause problems (e.g. human Alu repeats).

Nucleotide databases for BLAST (BLASTN, TBLASTN):
– Human genomic + transcript
– Mouse genomic + transcript
– Nucleotide collection (nr/nt) ("nonredundant nucleotide" db)
– Reference RNA sequences (refseq_rna)
– Reference genomic sequences (refseq_genomic)
– Expressed sequence tags (EST)
– Whole genome shotgun contigs (WGS)
– and others

NCBI sequence databases
[Figure: Protein: protein database (redundant); Nonredundant (NR) protein; RefSeq protein. Nucleotide: NR nucleotide (single-gene seqs, BAC clones, fosmids, non-model species); RefSeq mRNA and RefSeq genomic (for genome-project species); ESTs (expressed sequence tags); WGS data (unannotated).]

Example: 1A: BLASTN: Query is human EPO cDNA. Database is Human genomic + transcript.
[Figure: BLAST results page, annotated: score of best individual HSP; hyperlinks down the page to each alignment; total score of all HSPs; E-value of best individual HSP. Sorted: lowest E-value first, for each database.]

Example: 1A: One of the genomic hits from this search, marked by a green arrow on the previous slide.
[Figure: Query = human EPO cDNA sequence (GenBank X02157), with ATG start and TGA stop codons marked. Subject = human genomic sequence (human chromosome 7 from RefSeq: NT_007933.15; version 37 of the reference human genome sequence), with Exons 1 to 5 marked. Five HSPs (1st to 5th) link Query segments to the exons. Labelled Subject coordinates include 38,351,266, 38,351,459, 38,353,441 and 38,354,166.]

Example: 1B: BLASTN: Query is human EPO cDNA. Database is RefSeq RNA (more species).

Example: 1C: BLASTN: Query is human EPO cDNA. Database is NR (lots of species).
[Figure: hit list annotated: Human, top hit, E = 0; Eospalax, 50th hit, E = 2.1e-152; also marked: 100th, 107th and 109th hits.]

Protein databases for BLAST (BLASTP, BLASTX):
– Non-redundant protein sequences (nr)
– Reference proteins (refseq_protein)
– UniProtKB (Swiss-Prot)
– Protein Data Bank proteins (pdb) with known 3D structures
– and others

4 types of BLAST search: #2, BLASTP
BLASTP: protein Query vs. protein database.
Typical use: to find hits in annotated protein databases.
Advantages: much more sensitive than BLASTN.
Disadvantages: it will miss unannotated genes (they're not in the protein database).
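Protein HSP scores come from BLOSUM62. Part of BLASTP's extra sensitivity is that a conservative substitution (I for L, R for K) still scores positively, where BLASTN would only count a mismatch. The sketch below scores an ungapped alignment by summing per-position substitution scores. It deliberately contains only a small subset of BLOSUM62 entries, enough for the example. Looking up any other pair raises a KeyError, and the full matrix covers all 20 × 20 amino-acid pairs.

```python
# A SUBSET of BLOSUM62 entries, enough for the examples below only.
BLOSUM62_SUBSET = {
    ("L", "L"): 4, ("I", "I"): 4, ("K", "K"): 5, ("R", "R"): 5,
    ("D", "D"): 6, ("E", "E"): 5, ("Q", "Q"): 5,
    ("I", "L"): 2, ("K", "R"): 2, ("D", "E"): 2, ("E", "Q"): 2,
}


def pair_score(a: str, b: str) -> int:
    """BLOSUM62 is symmetric, so look the pair up in either order."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]   # KeyError if the pair is not in this subset


def ungapped_score(query: str, subject: str) -> int:
    """Score an ungapped HSP by summing substitution scores position by position."""
    if len(query) != len(subject):
        raise ValueError("an ungapped HSP aligns equal-length segments")
    return sum(pair_score(a, b) for a, b in zip(query, subject))


print(ungapped_score("LKDE", "LKDE"))  # identical: 4 + 5 + 6 + 5 = 20
print(ungapped_score("LKDE", "IRDQ"))  # conservative swaps: 2 + 2 + 6 + 2 = 12
```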

Example: 2: BLASTP: Query is human EPO protein. Database is NR proteins.
[Figure: BLAST results page, annotated: E-values, sorted lowest first.]

4 types of BLAST search: #3, BLASTX
BLASTX: DNA Query (translated in all 6 reading frames) vs. protein database.
Typical use: What does this piece of DNA code for? e.g. an EST.
Advantages: like BLASTP, but the Query doesn't need to be annotated.
Disadvantages: it will miss unannotated genes (they're not in the protein database).

6 reading frames: 6 ways that the same DNA sequence could potentially encode a protein.
Forward frames:
S H L V E A L Y L V C G E R G F F
L T P G G S S L P S V R G T R L L
H T W W K L S T * C A G N E A S
Reverse frames:
E E A S F P A H * V E S F H Q V *   frame -1
K K P R S P H T R * R A S T R C   frame -2
R S L V P R T L G R E L P P G V   frame -3
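A six-frame translation, like the one BLASTX, TBLASTN and TBLASTX perform internally, can be sketched as follows. It assumes the standard genetic code (NCBI translation table 1) and one common frame-numbering convention: frames +1, +2 and +3 start at offsets 0, 1 and 2 of the given strand, and frames −1, −2 and −3 start at the same offsets of the reverse complement. Other tools may number the reverse frames differently. Incomplete trailing codons are dropped, and codons containing non-ACGT letters become X.

```python
# Standard genetic code (table 1), codons ordered by bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR" "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate whole codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, protein in six_frames("ATGAAATAG").items():
    print(frame, protein)
```

With "ATGAAATAG", frame +1 reads ATG AAA TAG, giving "MK*" (Met, Lys, stop). The other five frames give short, mostly meaningless peptides. This is the situation BLASTX faces with an EST: typically only one frame matches a real protein.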

Bothrops alternatus (common pit viper)
What does the EST with accession number GW576306 code for? Or GW576313? Or GW576315?
An EST (expressed sequence tag) is a single sequencing read from a random clone in a cDNA library, i.e. from a randomly sampled mRNA.

Example: 3: BLASTX: Query is snake EST EPO GW576306. Database is NR proteins.

4 types of BLAST search: #4, TBLASTN
TBLASTN: searches a protein Query against a DNA database (translated in all 6 reading frames).
Typical use: Can I find any new homologs of my gene?
Advantages: like BLASTP, but the database entry doesn't need to be annotated.
Disadvantages: your Query needs to be a protein.

4 types of BLAST search: #5, TBLASTX
TBLASTX: DNA Query vs. DNA database, 6-frame translations. (Comparing all proteins that could possibly be encoded by the Query to all proteins that could possibly be encoded by each sequence in the database.)
Typical use: I'm desperate!
Advantages: Query and database can both be unannotated.
Disadvantages: dreadfully slow. TBLASTX searches against most databases are not allowed on the NCBI server. Results can be hard to interpret.
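One rough way to see why TBLASTX is slow is to count the protein-vs-protein comparisons each program implies for a single Query–Subject pair. A translated DNA sequence contributes 6 frames. This sketch counts only frame combinations. Treating every frame pair as equal work is a simplifying assumption that ignores sequence lengths and BLAST's other heuristics.

```python
from itertools import product

FRAMES = ("+1", "+2", "+3", "-1", "-2", "-3")

# Which side of the search is DNA that must be translated in all six frames.
# BLASTN is left out: it compares DNA to DNA without translating.
PROGRAMS = {
    "BLASTP": (False, False),   # protein query vs. protein database
    "BLASTX": (True, False),    # DNA query vs. protein database
    "TBLASTN": (False, True),   # protein query vs. DNA database
    "TBLASTX": (True, True),    # DNA query vs. DNA database, both translated
}


def frame_pairs(query_is_dna: bool, subject_is_dna: bool) -> list[tuple[str, str]]:
    """List every (query frame, subject frame) protein comparison a search implies."""
    q = FRAMES if query_is_dna else ("protein",)
    s = FRAMES if subject_is_dna else ("protein",)
    return list(product(q, s))


for name, sides in PROGRAMS.items():
    print(name, len(frame_pairs(*sides)))
# prints: BLASTP 1, BLASTX 6, TBLASTN 6, TBLASTX 36 (one per line)
```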

[Figure: J. Craig Venter (Celera Genomics) and Francis Collins (Intl. Human Genome Sequencing Consortium), 2001; other year labels: 1991, 1996]
