Introduction To The UCSC Genome Browser - UNSW Sites

Introduction to the UCSC genome browser
Dominik Beck
NHMRC Peter Doherty and CINSW ECR Fellow, Senior Lecturer
Lowy Cancer Research Centre, UNSW and Centre for Health Technology, UTS
SYDNEY NSW AUSTRALIA

What we will cover
Structure of the human genome
Genomic information
Data acquisition
UCSC Genome Browser

Structure of human genome
Annunziato A. 2008. DNA packaging: Nucleosomes and chromatin. Nature Education 1(1).

Structure of human genome
Total of 23 pairs of chromosomes.
Somatic cells are diploid: each chromosome is present as a pair.
Each individual chromosome is made up of double-stranded DNA.
3 billion bps (2 m) compacted in a cell (15 μm)
Annunziato A. 2008. DNA packaging: Nucleosomes and chromatin. Nature Education 1(1).

Information in the genome
Genes: 1.2% coding, 2% non-coding
Regulatory regions: 2%
Repetitive elements comprise another 50% of the human genome

Information in the genome
Encyclopedia of DNA Elements: ENCODE
– 147 cell types / 1,640 data sets
– 80.4% of the human genome participates in at least one biochemical event
– 95% within 8 kb of a biochemical event
– 99% within 1.7 kb of a biochemical event
Nature. 2012 Sep 6;489(7414):57-74. doi: 10.1038/nature11247.
Clark et al. 2015
– Capture sequencing / 24 cell types
– 22046 novel exons
– 10136 novel splice junctions
Nat Methods. 2015 Apr;12(4):339-42. doi: 10.1038/nmeth.3321.

Reference human genome
Human genomes vary significantly between individuals (0.1%)
Important things to note about the reference genome:
– Is a composite sequence (i.e. does not correspond to anyone's genome)
– Is haploid (i.e. only 1 sequence)
Computationally, a reference genome is used.

Reference human genome
Genomic data is most commonly represented in two ways:
1. Sequence data – fasta format (.fa or .fasta)
2. Location data – bed format
[Example tables: a FASTA sequence excerpt; a BED excerpt with columns including end, name, score and strand, listing the genes HES4 and ISG15]
All about genomic formats here - http://genome.ucsc.edu/FAQ/FAQformat.html
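To make the bed format concrete, here is a minimal Python sketch that reads six-column BED text into records. The feature names and coordinates are made up for illustration and are not taken from the slide. The sketch assumes the six standard leading BED columns (chrom, start, end, name, score, strand), with a 0-based start and an exclusive end as described in the UCSC format FAQ linked above.

```python
from typing import List, NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    name: str
    score: int
    strand: str


def parse_bed(text: str) -> List[BedRecord]:
    """Parse six-column BED text, skipping blank, comment and header lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, score, strand = line.split()[:6]
        records.append(BedRecord(chrom, int(start), int(end), name, int(score), strand))
    return records


def feature_length(rec: BedRecord) -> int:
    """With a 0-based start and exclusive end, length is simply end - start."""
    return rec.end - rec.start


# Hypothetical features, for illustration only.
EXAMPLE = """\
chr1\t1000\t1500\tfeatureA\t0\t+
chr1\t2000\t2600\tfeatureB\t0\t-
"""

records = parse_bed(EXAMPLE)
for rec in records:
    print(rec.name, rec.chrom, rec.strand, feature_length(rec))
```

Keeping the coordinates as integers in a named tuple makes later steps such as overlap checks or length filters straightforward.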

What we will cover
Structure of the human genome
Genomic information
– DNA (Sequence variation)
– RNA (Genes & gene expression)
– Regulation / Epigenetics: DNA methylation, Histone modification, Transcription factor binding

DNA: Sequence variation

Variations in DNA sequence
Cytological level:
– Entire chromosome (e.g. chromosome numbers)
– Partial chromosome (e.g. segmental duplications, rearrangements, and deletions)
Sub-chromosomal level:
– Transposable elements
– Short Deletions/Insertions, Tandem repeats
Sequence level:
– Single Nucleotide Polymorphisms (SNPs)
– Small Nucleotide Insertions and Deletions (Indels; 100bps)

Sequence variation
Single nucleotide polymorphisms (SNPs)
– DNA sequence variations that exist among members of a species.
– They are inherited and therefore present in all cells.
Somatic mutations
– Are acquired rather than inherited, i.e. only present in some cells.
– Are often observed in cancer cells.

Types of SNPs/Mutations
Most SNPs and mutations fall in intergenic regions.
Within genes, they can fall in either the non-coding or the coding regions.
Within coding regions, they can either not change (synonymous) or change (non-synonymous) amino acids.
[Figure labels: intergenic region, TSS, non-coding, coding, synonymous, non-synonymous]
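The synonymous versus non-synonymous distinction can be illustrated with a short Python sketch that translates a reference codon and a variant codon using the standard genetic code and compares the resulting amino acids. The function name and the example codons are chosen for illustration only. The "stop-lost" label is an extra case that the slide does not mention.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a coding substitution by its effect on the amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"    # introduces a premature stop codon
    if ref_aa == "*":
        return "stop-lost"   # not on the slide; included for completeness
    return "missense"        # a different amino acid is encoded


print(classify_codon_change("GAA", "GAG"))  # Glu -> Glu
print(classify_codon_change("GAG", "GTG"))  # Glu -> Val
print(classify_codon_change("TGG", "TGA"))  # Trp -> stop
```

Real annotation tools also need the transcript model and reading frame to decide which codon a genomic position falls in; this sketch assumes the codon is already known.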

Effects of sequence variation
Non-synonymous variants:
– Missense (change protein structure)
– Nonsense (truncates protein)
Synonymous or non-coding variants:
– Alter transcriptional/translational efficiency
– Alter mRNA stability
– Alter gene regulation (i.e. alter TF binding)
– Alter RNA regulation (i.e. affect miRNA binding)
The majority of sequence variants are neutral (1% phenotype)

RNA: Genes and gene expression

Types of genes
A gene is a functional unit of DNA that is transcribed into RNA.
Total genes in the human genome – 57,445
[Figure labels: mRNA, miRNA, lncRNA]
Source: GENCODE (version 18)

Protein coding genes
20,000 in the human genome.
Due to splicing one gene can make many proteins.
Traditionally considered to be the most important functional unit of genomes.
Source: http://www.news-medical.net

MicroRNA (miRNA)
Discovered in 1993.
Plays a role in post-transcriptional regulation.
Acts by either causing RNA degradation or inhibition of translation.
Implicated in many aspects of health and disease including:
– Development
– Cancer
– Heart disease
[Figure labels: miRNA gene, pri-miRNA, pre-miRNA, miRNA/miRNA*, miRISC with selected miRNA arm; nucleus, cytoplasm]

Long non-coding RNA (lncRNA)
Recently described class of RNAs which are often transcribed from Pol II promoters and often spliced.
Unlike coding RNAs and miRNAs, lncRNAs are less conserved.
Non-coding transcripts longer than 200 nt.
Many functions, commonly recruitment of histone modifiers.

RNA expression
Measuring the level of RNA in the sample.
Generally microarray-, sequencing- or high-throughput PCR-based.
Computational analysis and normalisation of expression data can be complicated.
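As one simple illustration of why normalisation matters, the Python sketch below scales raw read counts to counts per million (CPM), so that samples sequenced to different depths become comparable. The slide does not name a specific method. CPM is used here only as an example, and the gene names and counts are hypothetical.

```python
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical counts: sample_b was sequenced twice as deeply as sample_a.
sample_a = {"geneA": 100, "geneB": 300, "geneC": 600}
sample_b = {"geneA": 200, "geneB": 600, "geneC": 1200}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)

# The raw counts differ two-fold, but the normalised values agree.
print(cpm_a)
print(cpm_b)
```

In practice, dedicated packages apply more robust schemes that account for gene length and composition effects. This sketch only corrects for sequencing depth.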

RNA expression applications
Relatively cheap and fast readout of the functional state of a cell
Association with clinical features
– sequence variations
– response to therapy
– patient survival
Differential expression
– between samples, or
– between genes
[Figure labels: HSC, MEP, Megakaryocyte, T cells, B cells]

RNA expression applications
Differential expression of individual genes is not necessarily informative.
Genes are often grouped in gene-sets based on ontology or biological pathways.

Gene Regulation
Epigenetics

Epigenetics
Mechanisms that alter cellular function independent of any changes in DNA sequence
Mechanisms include:
– Transcriptional regulation: Transcription Factors
– Genome methylation
– Histone modification / Nucleosome positioning
– Non-coding RNA

Transcriptional regulation
Transcription factors are proteins that bind DNA to co-regulate gene expression.
They typically bind at gene promoters or enhancers.

DNA methylation
DNA is methylated on cytosines in CpG dinucleotides
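A CpG dinucleotide is a cytosine immediately followed by a guanine on the same strand. The small Python sketch below lists the candidate methylation sites in a sequence by locating every such pair. The function name and the input sequences are illustrative only.

```python
from typing import List


def cpg_positions(seq: str) -> List[int]:
    """Return the 0-based positions of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


sites = cpg_positions("ATCGCG")
print(sites)        # positions of the two CpG cytosines
print(len(sites))   # number of CpG sites in the sequence
```

Whether a given CpG is actually methylated has to be measured experimentally, for example with the methylation arrays or sequencing approaches covered under data acquisition.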

Nucleosomes & Histones

What we will cover
Structure of the human genome
Genomic information
– DNA (Sequence variation)
– RNA (Genes & gene expression)
– Epigenetics: DNA methylation, Histone modification, Transcription factor binding
Data acquisition
– Microarrays
– Sequencing
– Chromatin IP

Array Technology
Relies on fluorescence based on hybridisation of DNA against a complementary probe on the array.
Known molecule that can be converted to cDNA.
– Expression array (probe for exonic DNA regions)
– SNP array (probe for two alleles)
– Methylation array (probe for bisulfite-converted DNA)
Limited by probes present on the array.
[Figure labels: Labeling, Processing]
https://www.dkfz.de/gpcf/affymetrix genechips.html

Array Technology
[Workflow figure labels, in source order: Image processing; Pre-processing; Quantification; Background subtraction, normalisation; Statistics & data analytics; Post-processing; e.g. differential expression, clinical association; Batch and outlier removal; Systems biology; e.g. pathway analysis]
https://www.dkfz.de/gpcf/affymetrix genechips.html

Next-generation sequencing

Next-generation sequencing

RNA-seq (vs mRNA Array)
Alignment: human reference genome
Quantification: mRNA/miRNA/lncRNA
Statistics / Bioinformatics

Chromatin Immunoprecipitation Sequencing (ChIP-seq)
ChIP-seq of the seven transcription factors FLI1, ERG, GATA2, RUNX1, SCL, LYL1 and LMO
[Figure labels: high-throughput sequencing, bioinformatics, ERG locus]

Pros/cons of each technology
NGS
– Greater dynamic range (only limited by depth of sequencing)
– Coverage of genome does not need to be limited.
– Many more applications from sequencing data.
– Data analysis and management can be challenging.
Microarrays
– Microarrays are still significantly cheaper.
– Largest public datasets are likely to be microarray based.
– Data analysis pipelines are well standardised.

What we will cover
Structure of the human genome
Genomic information
Data acquisition
UCSC Genome Browser
– Background
– Genome Assemblies
– Annotation Tracks
– Associated Tools
– Practical Exercise

Genome Browser
http://genome.ucsc.edu/

Background

Background
Visualization of genomic data
Graphical viewpoint on the very large amount of genomic sequence produced by the Human Genome Project.
– Human Genome: 3,156,105,057 bp
Focus turned from accumulating and assembling sequences to identifying and mapping functional landmarks
– Genetic markers
– Genes
– SNPs
– Points of regulation
Visualization of next-generation sequencing data

Background
Client-side: Integrative Genomics Viewer*
– Application (Java) on the user's machine
– Often difficult to install
– Does not have the extensive third party data of the other browsers
– Much faster than web-based browsers
Client-server: UCSC Genome Browser
– Application on a web-server; access via web browser
– No installation
– Access to a very large database of information in a uniform interface
– Often difficult to import datasets
* http://www.broadinstitute.org/igv/

Background
Intronerator was developed by J. Kent to map the exon–intron structure of C. elegans RNAs mapped against genomic coordinates
[Photo: Jim Kent]

Background
Draft human genome sequence became available at the UCSC in 2000
Intronerator was used as the graphics engine
[Figure labels: 5' UTR, exon, exon, exon, exon, 3' UTR]

UCSC Genome Browser

Genome Assemblies
Regular updates to genome assemblies close gaps in genomic sequence, troubleshoot assembly problems and otherwise improve the genome assemblies
Updates shift coordinates for known sequences and create a potential for confusion and error among researchers, particularly when reading literature based on older versions.
Frequently used assemblies: hg18/hg19
New assemblies increase genomic coverage 6-fold and have been deposited in GenBank.
127 genome assemblies have been released on 58 organisms (April 2012)

Annotation tracks

Annotation tracks
The database may contain any data that can be mapped to genomic coordinates and therefore can be displayed in the Genome Browser
Overview of tracks: http://genome.ucsc.edu/cgi-bin/hgTracks
Three different categories:
– computed at UCSC
– computed elsewhere and displayed at UCSC
– computed and hosted entirely elsewhere

Annotation tracks computed at UCSC
Comparative genomic annotations as well as Convert and liftOver capabilities
mRNAs and ESTs in GenBank are aligned to the reference assembly in separate tracks (75 million GenBank RNAs and ESTs, 3 billion bases of the human reference assembly, 2 CPU-years of computing time)
The Conservation composite track displays the results of the multiz algorithm, which aligns the results from up to 46 pairwise Blastz alignments to the reference assembly (e.g. the hg19 human assembly consumed 10 CPU-years)

Annotation tracks computed elsewhere and displayed at UCSC
Annotations that are not post-processed by the UCSC
– Probe sets for commercially available microarrays, copy-number variation from the Database of Genomic Variants or expression data from the GNF Expression Atlas
– Data Coordination Center for the ENCODE project, allowing access to a large number of functional annotations with regard to gene regulation
Annotations that are post-processed by the UCSC
– dbSNP (Common SNPs, Flagged SNPs, Mult. SNPs)
– OMIM (OMIM Allelic Variant SNPs, OMIM Genes, OMIM Phenotypes)

Annotation tracks computed and hosted elsewhere
Data tracks are hosted remotely (no data are stored at UCSC) and publicly available, e.g. Epigenomics Roadmap project http://epigenome.wustl.edu/

Tracks from the Epigenome project

Associated Tools
Tools other than the main graphic image account for 42% of traffic on the UCSC server

Sessions

Custom track
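In its simplest form, a custom track is a block of BED lines preceded by a "track" line that names and describes the data. The Python sketch below builds such text so that it could be pasted into the browser's custom track form. The track name, description and coordinates are hypothetical, and only the `name` and `description` attributes of the track line are used.

```python
from typing import Iterable, Tuple

Feature = Tuple[str, int, int, str]  # chrom, start (0-based), end (exclusive), label


def make_custom_track(name: str, description: str, features: Iterable[Feature]) -> str:
    """Build custom track text: one track line followed by four-column BED lines."""
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, label in features:
        if start < 0 or end <= start:
            raise ValueError(f"invalid interval for {label}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# Hypothetical regions, for illustration only.
track_text = make_custom_track(
    "myRegions",
    "Example regions of interest",
    [("chr1", 1000, 1500, "regionA"), ("chr1", 2000, 2600, "regionB")],
)
print(track_text)
```

The coordinates must refer to the same genome assembly that is selected in the browser, otherwise the features will be drawn in the wrong place.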

Table Browser
