Bioinformatics

Upload by : Isobel Thacker

Biochemistry 324: Bioinformatics
Introduction
Bioinformatics, Stellenbosch University

There is no prescribed handbook, but I will follow Pevsner closely:
Jonathan Pevsner, Bioinformatics and Functional Genomics, 3rd Edition, Wiley-Blackwell, 2015. ISBN: 978-1-118-58178-0
Lecture notes will generally be available on SUNLearn the day before a lecture.
- 15 lectures
- 5 tutorials
- Class test: 25 May, 14h

At the end of this lecture you should be able to:
- define the term bioinformatics
- explain the scope of bioinformatics
- describe web-based versus command-line approaches to bioinformatics
- define the types of molecular databases
- define accession numbers and the significance of RefSeq identifiers
- describe the main genome browsers and use them to study features of a genomic region
- use resources to study information about both individual genes (or proteins) and large sets of genes/proteins

Definitions

Bioinformatics
Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioural or health data, including those to acquire, store, organize, archive, analyse, or visualize such data.

Computational Biology
The development and application of data-analytical and theoretical methods, mathematical modelling and computational simulation techniques to the study of biological, behavioural, and social systems.

…y/Proteins/Bioinformatics, Computational Biology and Proteomics

Bioinformatics generally looks at macromolecules
Pevsner J. Bioinformatics and Functional Genomics, 3rd Edition. Wiley-Blackwell, 2015.

Growth in DNA sequence deposition
Doubles approximately every 18 months
[Figure: nucleotides and sequences deposited over time]
…stics/

How much information in DNA?

bit      7     6     5     4     3     2     1     0
         2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
value    128   64    32    16    8     4     2     1

Bases: T, C, A, G, mC, β-D-Glucopyranosyloxymethyluracil (base J)
Say we have 8 different information states.

How much information in DNA?
Every bp = 4 bits
Human genome = 3 billion bp: 4 × 3 × 10^9 = 1.2 × 10^10 bits = 1.5 × 10^9 bytes ≈ 1.4 GB of information
This amount of information is contained in a cell nucleus with a 10 µm diameter.
There is 2 m of DNA in every somatic human cell.
Each human is composed of about 10^12 cells.
Thus every human contains 2 × 10^12 m of DNA = 2 × 10^9 km of DNA.
Distance from the sun to Uranus = 2.8 × 10^9 km
Each single human contains enough DNA to stretch about 70% of the way from the sun to Uranus.
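
The arithmetic on this slide can be checked with a short Python sketch. All input figures (4 bits per bp, 3 billion bp, 2 m of DNA per cell, 10^12 cells, 2.8 × 10^9 km) are the slide's own; the variable names are illustrative, and the 1.4 GB figure is reproduced here by assuming 1 GB = 2^30 bytes.

```python
# Input figures are taken from the slide; variable names are illustrative.
BITS_PER_BP = 4                  # the slide's assumption
GENOME_BP = 3 * 10**9            # human genome, base pairs
DNA_PER_CELL_M = 2               # metres of DNA per somatic cell
CELLS_PER_HUMAN = 10**12
SUN_TO_URANUS_KM = 2.8 * 10**9

genome_bits = BITS_PER_BP * GENOME_BP       # 1.2 x 10^10 bits
genome_bytes = genome_bits // 8             # 1.5 x 10^9 bytes
genome_gb = genome_bytes / 2**30            # assumes 1 GB = 2^30 bytes

dna_total_km = DNA_PER_CELL_M * CELLS_PER_HUMAN / 1000   # metres -> km
fraction_to_uranus = dna_total_km / SUN_TO_URANUS_KM

print(f"{genome_bits:.1e} bits, {genome_bytes:.1e} bytes, {genome_gb:.1f} GB")
print(f"{dna_total_km:.1e} km of DNA, "
      f"{fraction_to_uranus:.0%} of the sun-Uranus distance")
```

On these figures the DNA of one human covers roughly 70% of the distance from the sun to Uranus.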

Levels of application of bioinformatics
[Figure labels: organism, cell, tree of life]

Bioinformatics software: point-and-click or command line
Pevsner J. Bioinformatics and Functional Genomics, 3rd Edition. Wiley-Blackwell, 2015.

The Bioinformatics world is Linux
- Many bioinformatics tools and resources are available on the command-line interface.
- These are often on the Linux platform (or other Unix-like platforms such as the Mac command line). They are essential for many bioinformatics and genomics applications.
- Most bioinformatics software is written for the Linux platform (Python, Java, C, C++).
- Many bioinformatics datasets are so large (e.g. high-throughput technologies generate millions to billions or even trillions of data points) that command-line tools are required to manipulate the data.
- You cannot open/manipulate most bioinformatics datasets in MS Excel!

International Nucleotide Sequence Database Collection
Pevsner J. Bioinformatics and Functional Genomics, 3rd Edition. Wiley-Blackwell, 2015.

National Center for Biotechnology Information (NCBI)
Pevsner J. Bioinformatics and Functional Genomics, 3rd Edition. Wiley-Blackwell, 2015.

European Bioinformatics Institute
Pevsner J. Bioinformatics and Functional Genomics, 3rd Edition. Wiley-Blackwell, 2015.

DNA Database of Japan
Pevsner J. Bioinformatics and Functional Genomics, 3rd Edition. Wiley-Blackwell, 2015.

Perhaps 40 petabases of DNA were generated in calendar year 2014 at major sequencing centers.
Pevsner J. Bioinformatics and Functional Genomics, 3rd Edition. Wiley-Blackwell, 2015.

Sequence data magnitudes

Sequence file magnitudes

Size        Abbreviation   # bytes   Example
Bytes       -              1         Single text character
Kilobytes   1 kB           10^3      Text file, 1000 characters
Megabytes   1 MB           10^6      Text file, 1 million characters
Gigabytes   1 GB           10^9      Size of GenBank: 600 GB
Terabytes   1 TB           10^12     Size of 1000 Genomes Project: 500 TB
Petabytes   1 PB           10^15     Size of SRA at NCBI: 5 PB
Exabytes    1 EB           10^18     Annual worldwide output: 2 EB
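
As a small illustration of the table, the sketch below converts a raw byte count into the decimal (powers of 1000) units listed above. The function name `human_size` is hypothetical and the helper is not part of any bioinformatics package.

```python
# Decimal prefixes as used in the table (each step is a factor of 1000).
UNITS = ["bytes", "kB", "MB", "GB", "TB", "PB", "EB"]

def human_size(n_bytes):
    """Format a byte count using the largest unit that keeps the value below 1000."""
    value = float(n_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

print(human_size(600 * 10**9))   # size of GenBank in the table
print(human_size(5 * 10**15))    # size of SRA at NCBI in the table
```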

Taxa represented in GenBank (at gi

Types of data in databases
Pevsner J. Bioinformatics and Functional Genomics, 3rd Edition. Wiley-Blackwell, 2015.

Central bioinformatics resource: NCBI
NCBI (with Ensembl, EBI, UCSC) is one of the central bioinformatics sites. It includes:
- PubMed
- Entrez search engine integrating 40 databases
- BLAST (Basic Local Alignment Search Tool)
- Online Mendelian Inheritance in Man
- Taxonomy
- Books
- many additional resources

What is an accession number?
An accession number is a label used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence.

Examples:
CH471100.2      GenBank genomic DNA sequence
NC_000001.10    Genomic contig
rs121434231     dbSNP (single nucleotide polymorphism)
AI687828.1      An expressed sequence tag (1 of 184)
NM_001206696    RefSeq DNA sequence (from a transcript)
NP_006138.1     RefSeq protein
CAA18545.1      GenBank protein
O14896          SwissProt protein
1KT7            Protein Data Bank structure record
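
RefSeq identifiers have a recognisable shape: a two-letter prefix, an underscore, a number, and an optional version after a dot. The Python sketch below splits the three RefSeq examples from this slide into those parts. The function name `parse_refseq` is hypothetical, and the prefix table covers only the prefixes shown here (NC, NM, NP); real RefSeq has more prefixes, and other databases use different formats.

```python
import re

# Only the RefSeq prefixes that appear on the slide; descriptions follow the slide.
REFSEQ_PREFIXES = {
    "NC": "genomic sequence",
    "NM": "transcript (mRNA) sequence",
    "NP": "protein sequence",
}

# Two capital letters, underscore, digits, optional ".version".
REFSEQ_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")

def parse_refseq(accession):
    """Return a dict describing a RefSeq accession, or None if it does not match."""
    match = REFSEQ_RE.match(accession)
    if match is None or match.group(1) not in REFSEQ_PREFIXES:
        return None
    prefix, number, version = match.groups()
    return {
        "prefix": prefix,
        "molecule": REFSEQ_PREFIXES[prefix],
        "number": number,
        "version": int(version) if version is not None else None,
    }

for acc in ["NC_000001.10", "NM_001206696", "NP_006138.1", "CH471100.2"]:
    print(acc, parse_refseq(acc))
```

The GenBank accession CH471100.2 has no underscore, so this sketch returns None for it.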

Accessing NCBI via the web

NCBI Gene: example of query for beta globin

NCBI Protein: hemoglobin subunit beta

NCBI Protein: hemoglobin subunit beta in the FASTA format
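
A FASTA record is a header line starting with ">" followed by one or more lines of sequence. The following is a minimal Python sketch of reading such text, assuming plain FASTA without comments. The function name `parse_fasta` is hypothetical and the two records are made-up toy data, not real database entries.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)           # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Made-up toy records for illustration only.
example = """>toy_1 first made-up record
MVHLT
PEEK
>toy_2 second made-up record
ACDE
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```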

Accessing NCBI by Linux command-line
You can download and install EDirect on your Linux computer: …8/
- use esearch to find hemoglobin proteins
- use pipe (|) to efetch to retrieve the proteins in the FASTA format
- use head to display six lines of the output
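
The three steps above can be sketched from Python by building the shell pipeline as a string. The `esearch -db`/`-query` and `efetch -format` options are EDirect's own; the query text "hemoglobin", the function names, and the choice to wrap the pipeline in Python are assumptions made for illustration. The command is only executed if you call `run_pipeline()` on a machine where EDirect is installed and the network is reachable.

```python
import shlex
import shutil
import subprocess

def build_pipeline(query="hemoglobin", db="protein", n_lines=6):
    """Build the esearch | efetch | head pipeline as a shell command string."""
    return (
        f"esearch -db {shlex.quote(db)} -query {shlex.quote(query)} "
        f"| efetch -format fasta "
        f"| head -n {int(n_lines)}"
    )

def run_pipeline(command):
    """Run the pipeline if EDirect is on the PATH; return its output or None."""
    if shutil.which("esearch") is None or shutil.which("efetch") is None:
        return None
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout

command = build_pipeline()
print(command)
```

The printed command can also be pasted directly into a Linux terminal.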

Genome Browsers
- Versatile tools to visualize chromosomal positions (typically on the x-axis) with annotation tracks (typically on the y-axis).
- Useful to explore data related to some chromosomal feature of interest such as a gene.
- Prominent browsers are at Ensembl, UCSC, and NCBI.
- Many hundreds of specialized genome browsers are available, some for particular organisms or molecule types.
https://genome.ucsc.edu/cgi-bin/hgGateway
You can also download and use a genome browser locally on your computer:
- Integrative Genomics Viewer: …igv/
- Integrated Genome Browser


Browser Extensible Data (BED) format
1. chrom - The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).
2. chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
3. chromEnd - The ending position of the feature in the chromosome or scaffold.
The 9 additional optional BED fields are:
4. name - Defines the name of the BED line. This label is displayed to the left of the BED line in the Genome Browser window when the track is open to full display mode or directly to the left of the item in pack mode.
5. score - A score between 0 and 1000.
6. strand - Defines the strand. Either "." (no strand) or "+" or "-".
7. thickStart - The starting position at which the feature is drawn thickly (for example, the start codon in gene displays).
8. thickEnd - The ending position at which the feature is drawn thickly (for example the stop codon in gene displays).
9. itemRgb - An RGB value of the form R,G,B (e.g. 255,0,0).
10. blockCount - The number of blocks (exons) in the BED line.
11. blockSizes - A comma-separated list of the block sizes.
12. blockStarts - A comma-separated list of block starts.
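
The field list above can be turned into a small Python sketch that reads the three required fields and the first three optional ones. The names `BedFeature` and `parse_bed_line` are hypothetical and the example line is made-up data. The length calculation relies on the UCSC convention that chromStart is 0-based and chromEnd is not included in the feature.

```python
from typing import NamedTuple, Optional

class BedFeature(NamedTuple):
    chrom: str
    chrom_start: int              # 0-based start
    chrom_end: int                # end position, not included in the feature
    name: Optional[str] = None
    score: Optional[int] = None
    strand: Optional[str] = None

def parse_bed_line(line):
    """Parse the first six BED fields from one whitespace-separated line."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("a BED line needs chrom, chromStart and chromEnd")
    return BedFeature(
        chrom=fields[0],
        chrom_start=int(fields[1]),
        chrom_end=int(fields[2]),
        name=fields[3] if len(fields) > 3 else None,
        score=int(fields[4]) if len(fields) > 4 else None,
        strand=fields[5] if len(fields) > 5 else None,
    )

# Made-up example line for illustration only.
feature = parse_bed_line("chr11\t1000\t5000\ttoy_gene\t0\t+")
feature_length = feature.chrom_end - feature.chrom_start
print(feature, feature_length)
```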

BED file output from UCSC Table Browser query for genes on a region of human chromosome 11

