Three Data Delivery Cases for EMBL-EBI’s Embassy

Three data delivery cases for EMBL-EBI’s Embassy
Guy Cochrane
www.ebi.ac.uk

EMBL European Bioinformatics Institute
- Genes, genomes & variation: European Nucleotide Archive, 1000 Genomes, Ensembl, Ensembl Genomes, Ensembl Plants, European Genome-phenome Archive, Metagenomics portal, GWAS Catalog browser
- Protein sequences: InterPro, Pfam, UniProt
- Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
- Expression: ArrayExpress, Expression Atlas, MetaboLights, PRIDE
- Literature & ontology: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
- Chemical biology: ChEMBL, ChEBI
- Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
- Systems: BioModels, Enzyme Portal, BioSamples

Sequence data at t…
Alignment, Assembly, Annotation
European Nucleotide Archive
- Unrestricted data
- Pan-species and pan-application
- http://www.ebi.ac.uk/ena/
European Genome-phenome Archive
- Controlled access data
- Human data around molecular medicine
- http://www.ebi.ac.uk/ega/
Infrastructure provision
- BBSRC: RNAcentral, MG Portal
- MRC: 100k Genomes data implementation
- EC: COMPARE, MicroB3, ESGI, BASIS etc.

Challenges
- Data have high volume and grow rapidly
- Data are dynamic (continuous feed) and their application has urgency
- Users require arbitrary and ad hoc access

Tara Oceans

Tara Oceans: Capacity

Infectious disease
- Opportunity: a methodological revolution in clinical and public health towards shotgun sequencing-based methods
- Scientific power: sequence harbours rich information
  - Diagnostic: identification, typing, resistance profiling, etc.
  - Public health: outbreak detection, response strategy, vaccine development
  - Mechanistic: host interactions, pathogenicity, virulence, transmission, antimicrobial resistance
- Global Microbial Identifier: initiative with EMBL-EBI involvement supporting technologies, standards and data sharing for pathogen surveillance
- COMPARE: recently launched Horizon 2020 project in which EMBL-EBI is the informatics provider
- Informatics roles for EMBL:
  - COMPARE: rapid global sharing of surveillance and outbreak data, systematic integrated analysis, compute provision (Embassy)
  - Standards for reporting, analysis and the communication of results
  - New algorithms and analysis methods
  - User interfaces for surveillance data reporting, across the domains

[Figures: COMPARE data architecture (two builds). Components: COMPARE data; private data; typing; workflow integration; APIs; EBI infrastructure; Embassy infrastructure; DTU infrastructure; Embassy virtual workflow development; ‘Default’ tools.]

Personalised medicine
- Motivation: personalised studies of variation, cancer mutation, epigenetics, regulation and expression require references for comparison and interpretation
  Arbitrary access
- As part of GA4GH, EMBL-EBI is working on:
  - Resources serving reference human genomic and transcriptomic data, including Google read API, variant ‘Beacons’, etc.
  - CRAM compression supporting greater data fluidity, and APIs to allow direct computational access
  - Delivery and synchronisation of high-volume datasets to local Embassy and remote cloud infrastructures
- Past and current FP7 projects include SLING, BASIS, ESGI
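
A variant ‘Beacon’ answers one narrow question: does a dataset contain a given allele? The sketch below illustrates that idea with an in-memory set. It is not the GA4GH Beacon API; the `Variant` schema, the `ToyBeacon` class and the example coordinates are all hypothetical.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """One allele observation (hypothetical minimal schema, not the Beacon spec)."""
    chrom: str
    pos: int  # 1-based reference position
    ref: str
    alt: str


class ToyBeacon:
    """Answers only 'is this allele present?', never which samples carry it."""

    def __init__(self, variants: Iterable[Variant]) -> None:
        # Store normalised tuples in a set: O(1) lookups, no sample identifiers kept.
        self._alleles = {self._normalise(v) for v in variants}

    @staticmethod
    def _normalise(v: Variant) -> Variant:
        # Treat allele strings case-insensitively so 'a' and 'A' match.
        return Variant(v.chrom, v.pos, v.ref.upper(), v.alt.upper())

    def query(self, chrom: str, pos: int, ref: str, alt: str) -> bool:
        return self._normalise(Variant(chrom, pos, ref, alt)) in self._alleles


beacon = ToyBeacon([Variant("1", 12345, "A", "G"), Variant("X", 500, "C", "T")])
print(beacon.query("1", 12345, "a", "g"))  # True
print(beacon.query("1", 12345, "A", "T"))  # False
```

The design point is that the response carries a single bit, which is what makes this style of discovery service usable on top of controlled-access human data.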

ENA conventional read data delivery
[Figure components: conventional infrastructure (FTP, Aspera, GridFTP); ENA metadata; FIRE1; ENA data (NFS).]

ENA Embassy read data delivery
[Figure components: conventional infrastructure (FTP, Aspera, GridFTP); ENA metadata; FIRE2 (Cleversafe); HTTP; FUSE; ENA data; Embassy cloud infrastructure (VMWare, OpenStack) hosting three caches: Marine cache for the Tara Oceans Embassy, Pathogen cache for the COMPARE Embassy, and CRAM cache for the GA4GH Embassy.]
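
The per-community caches (Marine, Pathogen, CRAM) place copies of archive data next to each Embassy tenancy. A generic read-through cache is sketched below to show the idea: serve locally on a hit, fetch from the central store on a miss. The `fetch` hook, the class name and the accession used in the demo are hypothetical; this is not how ENA or FIRE2 are actually implemented.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class ReadThroughCache:
    """Serve objects from a local directory; fetch from the central store on a miss."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # hypothetical hook, e.g. an HTTP GET against the archive
        self.hits = 0
        self.misses = 0

    def _path_for(self, key: str) -> Path:
        # Hash the object name so any key maps to a safe, flat file name.
        return self.cache_dir / hashlib.sha256(key.encode("utf-8")).hexdigest()

    def get(self, key: str) -> bytes:
        path = self._path_for(key)
        if path.exists():
            self.hits += 1
            return path.read_bytes()
        self.misses += 1
        data = self.fetch(key)
        # Write under a temporary name, then rename, so readers never see a partial file.
        tmp = path.with_suffix(".part")
        tmp.write_bytes(data)
        tmp.replace(path)
        return data


calls = []


def fake_fetch(key: str) -> bytes:
    calls.append(key)  # record each trip to the "remote" store
    return f"reads for {key}".encode()


with tempfile.TemporaryDirectory() as d:
    cache = ReadThroughCache(Path(d), fake_fetch)
    cache.get("ERR000001")
    cache.get("ERR000001")
    print(cache.hits, cache.misses, len(calls))  # 1 1 1
```

This sketch never evicts anything, so its footprint grows with every distinct object requested; that unbounded growth is one concrete way a caching strategy eventually reaches its limit.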

ENA external read data delivery phase II

EMBL-EBI Embassy Cloud
Steven Newhouse, Head of Technical Services

The Challenge Facing EMBL-EBI
- Volume and variety of genomic data expanding
  - EMBL-EBI data doubling every year; replication is challenging
  - Infrastructure currently 50,000 CPUs & 60 PB
- Need to support complex analysis scenarios
  - Web and programmatic access to services (3M unique users)
  - Access to both public and managed access data sets
  - Bespoke workflows and tools across a variety of domains
- Hard for users to replicate data sets for local analysis
  - Use the ‘cloud’ to bring local analysis to EMBL-EBI data
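
To make "doubling every year" concrete, the sketch below projects storage under strict annual doubling. Treating the 60 PB infrastructure figure as the starting data volume, and assuming growth stays exactly 2x per year, are illustrative assumptions rather than figures from the talk.

```python
def projected_storage_pb(current_pb: float, years: int, annual_growth_factor: float = 2.0) -> float:
    """Storage after `years` if volume grows by `annual_growth_factor` each year.

    The default factor of 2.0 encodes "doubling every year" (an assumption
    that growth stays constant over the projection window).
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return current_pb * annual_growth_factor ** years


for y in range(4):
    print(y, projected_storage_pb(60, y))
# 0 60.0
# 1 120.0
# 2 240.0
# 3 480.0
```

Any user keeping a full local replica faces the same curve, which is why the slide argues for bringing analysis to the data instead.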

EMBL-EBI Embassy Cloud
- Service hosted at EMBL-EBI data centres
  - Direct network access to public and managed data sets
  - Direct network access to public services
  - Expect both academic and commercial users
- Technical Implementation
  - Logically isolated outside EMBL-EBI’s LANs
  - Secure, flexible infrastructure for both tenant and host
  - Resources exposed using VMware’s vCloud Director & OpenStack
  - Provide isolated IaaS clouds to multiple users

Why ‘Embassy’ Cloud?
- An embassy is sovereign territory in a host country
  - Host Country: EMBL-EBI Data Centre
  - Sovereign Territory: Host Country not allowed to enter
- Virtualisation provides the protection for ‘tenant’ and ‘host’
  - Host puts boundaries in place to protect it from the tenant
  - Tenant has freedom and control within those boundaries

[Figure: Embassy Cloud Concept. Components: Embassy Clouds 1, 2 and 3 (including PanCancer) on virtualised EMBL-EBI hardware; public data; managed data; private data; public services.]

User Benefits for the IaaS Model
- Tenant organisations get an empty virtual infrastructure
  - They establish their own virtual machines and networks
  - System administration performed by the tenant
  - EMBL-EBI staff have no access to the VMs
- Added value from EMBL-EBI over other clouds
  - Machines and data hosted in a known jurisdiction
  - Direct network access to data sets (public & managed access)
  - Direct network access to public EMBL-EBI services

Benefits to EMBL-EBI of the IaaS Model
- A secure collaborative workspace
  - Work does not contend with main EMBL-EBI resources
  - Clearly define the committed IT resources and data
- Explore how to build more data-focused analysis services
  - Move the analysis to where the big data is located
  - Learn from and inform other big data scientific communities

Embassy Cloud: Typical Uses
- Collaborative Environment
  - Neutral ground outside internal network
  - CTTV: resources and VMs to host intranet, databases, …
- Data Staging
  - Undertake submission from a local machine (following data staging) rather than from a remote location
  - BRAEMBL: remote submission unreliable due to file upload
- Data Analysis
  - Large-scale management and analysis of data
  - PanCancer: 1,000 cores, 2.5 TB RAM, 1.0 PB HDD
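
The ‘Embassy’ model pairs a host-enforced boundary with tenant freedom inside it. The sketch below expresses that as a simple allocation check, using the PanCancer figures above as the example limit. The class names and the VM sizes in the demo are hypothetical; this is not the vCloud Director or OpenStack quota API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cores: int
    ram_tb: float
    disk_pb: float

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cores + other.cores,
                         self.ram_tb + other.ram_tb,
                         self.disk_pb + other.disk_pb)

    def fits_within(self, limit: "Resources") -> bool:
        return (self.cores <= limit.cores and self.ram_tb <= limit.ram_tb
                and self.disk_pb <= limit.disk_pb)


class TenantAllocation:
    """Host-side boundary (hypothetical model): the tenant may create any VMs it
    likes, as long as the running total stays inside the committed allocation."""

    def __init__(self, limit: Resources) -> None:
        self.limit = limit
        self.used = Resources(0, 0.0, 0.0)

    def request(self, vm: Resources) -> bool:
        proposed = self.used + vm
        if not proposed.fits_within(self.limit):
            return False  # host refuses; the tenant's existing usage is unchanged
        self.used = proposed
        return True


pancancer = TenantAllocation(Resources(cores=1000, ram_tb=2.5, disk_pb=1.0))
print(pancancer.request(Resources(cores=800, ram_tb=2.0, disk_pb=0.5)))  # True
print(pancancer.request(Resources(cores=300, ram_tb=0.2, disk_pb=0.1)))  # False: 1,100 cores > 1,000
```

Only the totals are checked; what runs inside each VM stays entirely under the tenant's control.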

Issues
- Object Store Storage Infrastructure
  - Essential for scalable high-performance storage
  - Applications need to adapt to flat model
  - Current caching strategy will have a limit
- Sharing resources between sites/communities/clouds
  - Adopt a standards-based model for federating resources
  - Solutions for uploading and distributing VMs (containers?)
  - Replicating large data sets to ‘attract’ workloads to a cloud
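
An object store has no directories, only keys, so applications written against a POSIX file tree must adapt. A common adaptation, used by S3-style list operations, emulates folders by listing keys with a shared prefix and grouping on a delimiter. The toy in-memory store below illustrates that; the class, method names and example keys are hypothetical.

```python
from typing import Dict, List, Tuple


class FlatObjectStore:
    """Toy key-value object store: keys only, 'folders' emulated by prefix listing."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_objects(self, prefix: str = "", delimiter: str = "/") -> Tuple[List[str], List[str]]:
        """Return (keys directly under `prefix`, common sub-prefixes ending in `delimiter`)."""
        keys: List[str] = []
        prefixes = set()
        for key in sorted(self._objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            cut = rest.find(delimiter)
            if cut == -1:
                keys.append(key)  # no further delimiter: an object "in" this folder
            else:
                prefixes.add(prefix + rest[:cut + 1])  # group into a pseudo-subfolder
        return keys, sorted(prefixes)


store = FlatObjectStore()
store.put("ERR000001/reads_1.fastq.gz", b"...")
store.put("ERR000001/reads_2.fastq.gz", b"...")
store.put("ERR000002/reads.cram", b"...")
store.put("README", b"...")
print(store.list_objects())
# (['README'], ['ERR000001/', 'ERR000002/'])
print(store.list_objects("ERR000001/"))
# (['ERR000001/reads_1.fastq.gz', 'ERR000001/reads_2.fastq.gz'], [])
```

The "directory" exists only as a naming convention, so operations such as renaming a folder become one copy per object rather than a single metadata change.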

Gaps → Activities → Solutions?
- Data Set Replication
  - Strategic pre-positioning of data into clouds
  - Leverage JANET/GEANT, GridFTP, Globus Transfers
- Cloud federation for mobile computing
  - EGI has a federated cloud and VM distribution model
  - ELIXIR plans to build on existing infrastructure where possible
- Wide-area file access needed for collaborative data analysis
  - High-performance wide-area object store
  - Need access control for human-related data
- Coordinated investment in infrastructure
  - Where is the UK coordination? What coordination is needed?
  - Integrating commercial resources where they add value
  - Integration with EU infrastructure (ELIXIR)
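
Why pre-position data strategically rather than copy on demand? A back-of-envelope transfer time makes the point. The link speeds and the efficiency factor below are hypothetical inputs, not measurements of JANET, GEANT or any GridFTP/Globus deployment; the calculation is simply size in bits divided by effective bandwidth.

```python
def transfer_days(size_bytes: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Days to move `size_bytes` over a `link_gbit_per_s` link.

    `efficiency` (0 < e <= 1) is the fraction of nominal bandwidth actually
    achieved; 1.0 is an idealised best case with no protocol overhead.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_bytes * 8
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 86_400  # seconds per day


PB = 1e15  # decimal petabyte
print(round(transfer_days(1 * PB, 10), 2))       # 9.26
print(round(transfer_days(1 * PB, 10, 0.5), 2))  # 18.52
```

Even at an idealised 10 Gbit/s, a single petabyte (the PanCancer disk allocation, for scale) takes more than a week to move, so replicating data ahead of demand is what makes a cloud ‘attractive’ to workloads.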