CSWS2011 Proceedings - Poster

SADI for GMOD: Semantic Web Services for Model Organism Databases

Ben Vandervalk1,3, Michel Dumontier2, E Luke McCarthy1, and Mark D Wilkinson1

1 James Hogg Research Centre, Heart Lung Institute, University of British Columbia
2 Department of Biology, Carleton University
3 ben.vvalk@gmail.com

Abstract. Here we describe work-in-progress on the SADI for GMOD project (SADI: Semantic Automated Discovery and Integration; GMOD: Generic Model Organism Database), a distribution of ready-made Web services that will bring additional model organism data onto the Semantic Web. SADI is a lightweight standard for implementing Web services that natively consume and generate RDF, while GMOD is a widely-used toolkit for building model organism databases (e.g. FlyBase, ParameciumDB). The SADI for GMOD services will provide a novel mechanism for analyzing data across GMOD sites, as well as other bioinformatics resources that publish their data using SADI.

Keywords: Semantic Web, Web services, SADI, GMOD, model organism databases, bioinformatics, sequence features

1 Introduction

One of the most pervasive problems in bioinformatics is the integration of data and software across research labs. While the prevailing method of sharing data is through centrally controlled repositories such as GenBank [6], manual curation of submissions imposes a bottleneck on the quantity and types of data that can be integrated. In addition, centralization also places limits on the types of visualization and analysis tools that can readily be used with the data.

One prominent example of a system for integrating distributed biological data is the Distributed Annotation System (DAS) [7]. A DAS server provides access to sequence annotations (also known as sequence features) via a RESTful [8] interface, and returns the annotations in a simple, standardized XML format. Client applications (e.g. genome browsers) that understand the DAS protocol and XML format are able to provide users with a unified view of sequence annotations from multiple sites. Nevertheless, DAS has its limitations. The XML datasets returned by DAS servers cannot be integrated without specialized software, and cannot be readily combined with other types of data (e.g. protein-protein interaction networks). In addition, the majority of bioinformatics analysis tools (e.g. BLAST) do not natively understand DAS, and thus they require specialized conversion scripts in order to process data from DAS servers.

In this paper we describe work-in-progress on SADI for GMOD, a collection of Semantic Web services that implement DAS-like functionality. The goal of SADI for GMOD is to provide a more general solution for federating sequence data that is compatible with the Semantic Web, and which facilitates automated integration with analysis software and other types of bioinformatics data. Toward this goal, we propose a standard model for representing sequence features in RDF/OWL. The services are implemented according to the SADI (Semantic Automated Discovery and Integration) standard, and are targeted toward maintainers of GMOD (Generic Model Organism Database) sites. Additional information about these two projects is provided in the following section.

2 Related Projects

SADI (Semantic Automated Discovery and Integration). SADI [1] is a lightweight standard for the implementation of Semantic Web services. Services adhering to the SADI recommendations natively consume and generate data in RDF form, and can be invoked by issuing an HTTP POST to the service URL with an input RDF document as the payload. One of the principal strengths of SADI is that there are no specialized protocols or messaging formats. The interfaces to each service – that is, the expected structure of the input and output RDF documents – are described by means of a provider-specified input OWL class and output OWL class, respectively. Further details about SADI are given in [1].

GMOD (Generic Model Organism Database). The GMOD project [2] is a popular collection of open source software which facilitates the construction of a model organism database and its associated website. The central component of GMOD is a database schema called Chado [3], which houses a variety of datatypes such as sequences, sequence features, controlled vocabularies, and gene expression data. Scripts are provided for creating and loading a Chado instance as a Postgres database.

3 Services

SADI for GMOD consists of five services which provide fundamental operations for accessing sequence feature data, as shown in Table 1. A sequence feature is an annotated region of a biological sequence (DNA, RNA, or amino acid) such as a gene, an exon, or a protein domain. Related features are accessible through a hierarchy of parent-child relationships, and the GMOD wiki provides a set of recommendations [3] indicating where particular feature types should be located in the hierarchy. For example, the GMOD conventions assert that a gene should be a child feature of a chromosome and that an mRNA transcript should be a child feature of a gene. The relationship connecting the parent and child feature will be either “has part” or “derives into”, depending on whether the features are spatially or temporally related. For instance, the relationship between a chromosome and a gene is “has part”, whereas the relationship between a gene and a transcript is “derives into”.
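
As a rough illustration of the invocation pattern described in Section 2 (an HTTP POST to the service URL with an RDF document as the payload), the Python sketch below builds such a request for a child-feature lookup without sending it. The endpoint URL, the `application/rdf+xml` content type, and the helper name `build_sadi_request` are assumptions made for this example and do not come from the SADI for GMOD distribution. The payload is deliberately minimal; a real input document would have to satisfy the input OWL class published by the service.

```python
import urllib.request

# Hypothetical endpoint: a real GMOD site would publish its own service URL.
SERVICE_URL = "http://example.org/cgi-bin/SADI/get_child_features"

# Feature URI taken from the FlyBase example used in this paper.
FEATURE_URI = "http://lsrn.org/FLYBASE:FBgn0011935"


def build_sadi_request(service_url: str, feature_uri: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying a minimal RDF/XML input."""
    payload = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
        f'  <rdf:Description rdf:about="{feature_uri}"/>\n'
        '</rdf:RDF>\n'
    ).encode("utf-8")
    return urllib.request.Request(
        service_url,
        data=payload,
        # Assumed media type for both the request body and the response.
        headers={
            "Content-Type": "application/rdf+xml",
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


request = build_sadi_request(SERVICE_URL, FEATURE_URI)
print(request.get_method(), request.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).read()
```

Because the request is plain HTTP with an RDF body, any HTTP client could stand in for `urllib` here; the point is only that no service-specific protocol or message envelope is involved.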

Table 1. A functional description of the five SADI services implemented by the SADI for GMOD project. The fundamental input/output datatypes are genomic coordinates, feature descriptions, and database identifiers; further details about the representation of these entities are given in the following section.

Service Name                      Input                           Relationship                Output
--------------------------------  ------------------------------  --------------------------  ---------------------------------------
get_feature_info                  a database identifier           is about                    a feature description
get_features_overlapping_region   a set of genomic coordinates    overlaps                    a collection of feature descriptions
get_sequence_for_region           a set of genomic coordinates    is represented by           a DNA, RNA, or amino acid sequence
get_child_features                a feature description           has part / derives into     a collection of feature descriptions
get_parent_features               a feature description           is part of / derives from   a collection of feature descriptions

4 Proposal for Modeling Sequence Features in RDF

The implementation of the SADI for GMOD services is relatively straightforward. The main point of interest is how the data is modeled in RDF/OWL. The entities that need to be modeled are feature descriptions, genomic coordinates, and database identifiers, as shown in Table 2.

In Listing 1, we show an example feature description for a tRNA gene in Drosophila melanogaster, encoded in TURTLE format. The principal ontology used for the encoding is SIO (Semantic Science Integrated Ontology) [4], which provides a large collection of properties for capturing mereological, temporal, and other types of relationships. In addition, features are typed using terms from the Sequence Ontology [5]. Some readers may initially balk at the apparent complexity and opacity of Listing 1; however, it is important to emphasize that the primary goal of the encoding is to facilitate automatic integration of data, whereas simplicity and human-readability are secondary considerations. There are several data modeling practices that, when understood, should help to clarify Listing 1:

1. Distinct entities are always modeled as distinct nodes in the graph. In non-RDF formats (e.g. relational databases), it is easy to conflate related entities. For example, the sequence of a chromosome and the chromosome itself are often thought of as the same entity. However, this is not precisely true; the sequence is an abstract string representation of one of the strands of the chromosome. In order to facilitate accurate and automated processing of the data, it is often helpful to make such distinctions explicit. In Listing 1, the tRNA gene has a ranged sequence position in relation to a sequence that represents the minus strand of a chromosome.

2. URIs are frequently opaque. Ontology providers (e.g. OBI, GO, SO) assign numeric URIs to classes and relationships in their ontologies for two reasons: i) the URIs can have labels in multiple languages, and ii) the labels can be updated without requiring updates to dependent datasets.

3. Literals are modeled as typed resources. It is simplest to represent literals in RDF as plain strings or numbers, with the type of the literal indicated by the XSD datatype (e.g. xsd:float). Here, literals are modeled as instances of a particular rdf:type (e.g. range:StartPosition), with the actual values being specified by the “has value” property (i.e. SIO_000300). This approach provides a more flexible typing mechanism and allows additional information such as provenance to be attached to the values.

4. Database identifiers are modeled as typed string values. In Listing 1, the feature URI http://lsrn.org/FLYBASE:FBgn0011935 has an attached identifier with an rdf:type of lsrn:FLYBASE_Identifier and a value of “FBgn0011935”. This may seem redundant, as the URI already acts as a unique identifier for the feature. We have adopted the practice of attaching typed, string-encoded database identifiers to URIs in order to address a common problem on the Semantic Web, namely the tendency of data providers to invent their own URI schemes. For example, the URI for UniProt protein P04637 is alternatively represented on the Semantic Web as http://purl.uniprot.org/uniprot/P04637 (UniProt and LinkedLifeData), http://bio2rdf.org/uniprot:P04637 (Bio2RDF and Linked Open Drug Data), and http://lsrn.org/UniProt:P04637 (SADI). While the existence of multiple URIs for the same entity impedes data integration across sites, data providers often create their own URI schemes so that the URIs will resolve to datasets or webpages on their own sites. We propose attaching database identifiers to URIs as shown here, so that equivalent URIs can automatically be reconciled across sites, while still allowing the URIs created by each provider to resolve to their own data.

Table 2. The fundamental input/output datatypes of the SADI for GMOD services.

Entity                Components                          Example
--------------------  ----------------------------------  --------------------------
feature description   a feature type;                     Lines 11-41 of Listing 1
                      a set of genomic coordinates;
                      one or more database identifiers
genomic coordinates   a start position;                   Lines 17-23 of Listing 1
                      an end position;
                      a reference sequence
database identifier   an identifier type;                 Lines 14-15 of Listing 1
                      an identifier string

Listing 1. Example RDF encoding for a tRNA gene in Drosophila melanogaster.

    @prefix feature: <http://sadiframework.org/ontologies/GMOD/Feature.owl#> .
    @prefix range: <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#> .
    @prefix strand: <http://sadiframework.org/ontologies/GMOD/Strand.owl#> .
    @prefix FlyBase: <http://lsrn.org/FLYBASE:> .
    @prefix GB: <http://lsrn.org/GB:> .
    @prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
    @prefix sio: <http://semanticscience.org/resource/> .
    @prefix so: <http://purl.org/obo/owl/SO#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    FlyBase:FBgn0011935
        a so:SO_0001272 ;                                   # 'tRNA gene'
        sio:SIO_000008                                      # 'has attribute'
            [ a lsrn:FLYBASE_Identifier ;
              sio:SIO_000300 'FBgn0011935'^^xsd:string ] ;  # 'has value'
        sio:SIO_000008                                      # 'has attribute'
            [ a range:RangedSequencePosition ;
              range:in_relation_to :minus_strand ;
              sio:SIO_000053                                # 'has proper part'
                  [ a range:StartPosition ; sio:SIO_000300 2077634 ] ;
              sio:SIO_000053                                # 'has proper part'
                  [ a range:EndPosition ; sio:SIO_000300 2077707 ]
            ] .

    GB:AE013599                                             # chromosome arm '2R'
        a so:SO_0000105 ;                                   # 'chromosome arm'
        sio:SIO_000008                                      # 'has attribute'
            [ a lsrn:GB_Identifier ;
              sio:SIO_000300 'AE013599'^^xsd:string ] .     # 'has value'

    :plus_strand
        a sio:SIO_000030 ;                                  # 'sequence'
        sio:SIO_000210                                      # 'represents'
            [ a strand:PlusStrand ;
              sio:SIO_000093 GB:AE013599 ] .                # 'is proper part of'

    :minus_strand
        a sio:SIO_000030 ;                                  # 'sequence'
        sio:SIO_000210                                      # 'represents'
            [ a strand:MinusStrand ;
              sio:SIO_000093 GB:AE013599 ] .                # 'is proper part of'

5 Deploying the Services

The SADI for GMOD services are implemented as Perl CGI (Common Gateway Interface) scripts. There will be three main steps to deploy the services at a GMOD site:

1. Set up a Bio::DB::SeqFeature::Store database. For performance reasons, the services do not query a Chado database directly, but instead use a Bio::DB::SeqFeature::Store database which must be loaded separately by the GMOD site maintainer. The most common scenario is to load the data from a set of GFF files into a MySQL database; Bio::DB::SeqFeature::Store provides the bp_seqfeature_load.pl script for this purpose.

2. Unpack the SADI for GMOD tarball in the cgi-bin directory. The tarball will be unpacked into a SADI directory tree which will contain the Perl CGI scripts as well as the required Perl modules.

3. Add database connection parameters to the SADI for GMOD configuration file. The configuration file will be located in the SADI subdirectory of cgi-bin.

6 Conclusion

While the majority of existing biological Web services use XML for data exchange, SADI services use RDF/OWL in order to facilitate automatic integration of data across service providers. As such, the SADI for GMOD services will provide a novel tool for conducting analyses across model organism databases, as well as other biological data sources and tools that are published using SADI.

7 Acknowledgements

Initial development of SADI and SHARE has been funded by a special initiatives award from the Heart and Stroke Foundation of British Columbia and Yukon, with additional funding from Microsoft Research and an operating grant from the Canadian Institutes of Health Research (CIHR). In addition, core laboratory funding has been supplied by the Natural Sciences and Engineering Research Council of Canada (NSERC). Development of SADI for GMOD, as well as hundreds of other SADI services, has been funded by a grant from Canada’s Advanced Research and Innovation Network (CANARIE).

References

1. Wilkinson, M.D., Vandervalk, B.P., McCarthy, E.L.: SADI Semantic Web Services - 'cause you can't always GET what you want! Services Computing Conference (APSCC) 2009, 13-18 (2009)
2. GMOD homepage, http://gmod.org
3. Introduction to Chado, GMOD Wiki, http://gmod.org/wiki/Introduction_to_Chado
4. Semantic Science on Google Code, http://code.google.com/p/semanticscience/
5. Eilbeck, K., Lewis, S.E., Mungall, C.J., et al.: The Sequence Ontology: a tool for the unification of genome annotations. Genome Biology 6:5 (2005)
6. Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., et al.: GenBank. Nucleic Acids Research 36, D25-D30 (2008)
7. Dowell, R.D., Jokerst, R.M., Day, A., et al.: The Distributed Annotation System. BMC Bioinformatics 2:7 (2001)
8. Fielding, R.T.: Architectural styles and the design of network-based software architectures. University of California, Irvine (2000)
