FEMwiki: Crowdsourcing Semantic Taxonomy And Wiki Input To .


FEMwiki: crowdsourcing semantic taxonomy and wiki input to domain experts while keeping editorial control: Mission Possible!

Patty Kostkova
University College London, London, UK

Vladimir Prikazsky
European Centre for Disease Prevention and Control (ECDC), Sweden

Arnold Bosman
European Centre for Disease Prevention and Control
Arnold.Bosman@ecdc.europa.eu

ABSTRACT

Highly specialized professional communities of practice (CoP) inevitably need to operate across geographically dispersed areas, and members frequently need to interact and share professional content. Crowdsourcing using wiki platforms provides a novel way for a professional community to share ideas and collaborate on content creation, curation, maintenance and sharing. This is the aim of the Field Epidemiological Manual wiki (FEMwiki) project, which enables online collaborative content sharing and interaction for field epidemiologists around a growing training wiki resource.

However, while user contributions are the driving force for content creation, any medical information resource needs to keep editorial control and quality assurance. This requirement is typically in conflict with community-driven Web 2.0 content creation. To maximize the opportunities for the network of epidemiologists to actively edit the wiki content while keeping quality and editorial control, a novel structure was developed to encourage crowdsourcing: support for dual versioning of each wiki page, enabling maintenance of expert-reviewed pages in parallel with user-updated versions, with clear navigation between the related versions.

Secondly, the training wiki content needs to be organized in a semantically-enhanced taxonomical navigation structure enabling domain experts to find information on a growing site easily. This also provides an ideal opportunity for crowdsourcing. We developed a user-editable collaborative interface that crowdsources live maintenance of the taxonomy to the community of field epidemiologists by embedding the taxonomy in a training wiki platform and generating the semantic navigation hierarchy on the fly. Launched in 2010, FEMwiki is a real world service supporting field epidemiologists in Europe and worldwide. The crowdsourcing success was evaluated by assessing the number and type of changes made by the professional network of epidemiologists over several months. The evaluation demonstrated that crowdsourcing encourages users to edit existing content and create new content, and also leads to expansion of the domain taxonomy.

Categories and Subject Descriptors
algorithms, semantic web, real world assessment, semantic navigation

General Terms
crowdsourcing, semantic web, evaluation, field epidemiology, taxonomy

Keywords
FEMwiki, social and Semantic Web, user engagement, evaluation

1 INTRODUCTION

In many modern disciplines professional experts are often widely geographically dispersed. It has become essential to maintain a single repository of knowledge about the domain online to avoid error-prone practices such as sending information from person to person by email. Experts increasingly want to contribute to a shared repository using Web 2.0 technology such as wikis, crowdsourcing to the community what was traditionally developed by expert committees. However, crowdsourcing shared professional content development to real world users requires easy-to-use tools for domain experts to make their contributions. While the collaborative aspect of developing the repository is important, it is also vital to ensure that the quality of the repository is maintained. This requirement is unquestionably of paramount importance in the medical domain. However, editorial controls should not stifle the pace of contribution to the portal.

Therefore, it is important to provide user friendly Web 2.0 tools for experts to collaboratively maintain a knowledge repository online while, at the same time, providing an editorial control system that maintains quality but does not interfere excessively with the process of updating the resource. Further, wiki content crowdsourced to potentially hundreds of users needs to be organized in a semantically-enhanced taxonomical navigation structure that enables domain experts to find information on a growing site easily, one that is easy to maintain by domain
experts as the project grows. Both features, content creation and taxonomy maintenance, require active cooperation from the domain experts.

In this paper, we present the Field Epidemiology Manual Wiki (FEMwiki) Framework crowdsourcing model, which enables collaborative editing of the actual content as well as the navigation taxonomy. FEMwiki (www.femwiki.com), funded by the ECDC (European Centre for Disease Prevention and Control), is used by field epidemiologists to maintain a repository of knowledge used for training purposes. FEMwiki has its origins in a training manual for the EPIET training course (European Program for Intervention Epidemiology Training), and was converted into a wiki-style repository using crowdsourcing.

The FEMwiki framework is structured using a domain taxonomy editable by users in the same way as the actual content. The taxonomy browser on the front page of the wiki allows users to immediately see and navigate the organisation of the repository (Figure 1).

Section 2 provides a background to the project and sets the scene for Section 3, where we present the FEMwiki crowdsourcing framework for both wiki editing and semantic taxonomy development. In Section 4, we discuss the evolution of the project and evaluation results with real world field epidemiologists. Section 5 provides a discussion and Section 6 concludes.

2 BACKGROUND AND RELATED WORK

Crowdsourcing owes its growing popularity to the simple fact that a large number of users can each make a small effort on a shared task, so that large scale collaborative work is performed easily [1]. Crowdsourcing has many forms and can be implemented over a number of platforms. Typically, collaborative Web 2.0 technologies enable users to create and modify content in a shared repository instead of merely being passive consumers. In addition to sharing the work, the risk of bottlenecks is reduced.

The most well known example of crowdsourcing must be Wikipedia (http://en.wikipedia.org/wiki/Main_Page, English version) with over 4.7 million articles, growing by over 800 new articles every day as of March 2015. However, user contributions remain sparse. Wikipedia has also been studied as a cultural phenomenon reaching a trusted level of information through crowdsourcing [2].

Large wikis such as Wikipedia can be difficult to navigate, as they are large flat repositories and there is no native support for organising the content. However, for domain-specific wikis, this problem can be overcome by organizing the pages according to a semantic taxonomy representing the domain entities (also called nodes), which forms the basis for content navigation using the parent-child relationship (each entity becomes a wiki page). Semantic ontologies and taxonomies are used in a wide variety of disciplines. Perhaps their most notable successes are in the life sciences and medicine (for example, the Open Biological and Biomedical Ontologies (OBO) [3], MeSH, ICD-11 and SNOMED-CT). In such domains, the ontologies are usually highly formal but require a considerable amount of expertise, time and effort to build. Stevens and Jupp [4-5] argue that many other medical ontologies are rather taxonomies, as they do not follow first order logic relationships between entities but better describe the complex medical domain.

Figure 1. The taxonomy browser - FEMwiki front page

The notion of using crowdsourcing to develop domain taxonomies is attractive, as a large number of experts in the domain can each make contributions (small or large), rather than a small number of experts expending a large amount of effort. The resulting taxonomy would (in theory) respect a consensus view on the domain, rather than the view of a smaller number of experts. However, despite the experts' efforts, the resulting Web 2.0 taxonomy often contains gaps and errors. Also, the taxonomy is never final or static: there is a need to fill in gaps, modify the taxonomy structure and maintain it as the domain develops.

A good example of an ontology generated by crowdsourcing is DBpedia [6] (the ontology is accessible at http://wiki.dbpedia.org/Ontology), where structured data is extracted from Wikipedia and made available on the Web. In this case, the contributors do not create the ontology directly, but make edits to Wikipedia, and are probably not aware that some of their contributions are being used to create an ontology. Semantic wikis are more direct attempts to combine semantics with crowdsourcing. The users are generally aware that they are contributing structured data as well as human-readable text. In semantic wikis, each page (also called an article) corresponds to an entity in the resulting ontology (either a class, individual, or property). The most widely used semantic wiki is perhaps Semantic MediaWiki (SMW) (http://semantic-mediawiki.org/), built on the MediaWiki platform that is used for Wikipedia. However, attempts have been made to provide easier-to-use tools usable by domain experts. Project Halo [7] is an extension to SMW that provides a semantic toolbar to allow page editors to annotate pages. However, the arrangement of pages into a class hierarchy is performed by placing semantic annotations directly into the wikitext, which is probably not ideal for most IT non-specialists.

OpenDrugWiki [8] is a semantic wiki system that holds drug interaction data. The wiki pages can be edited by any registered user, but to maintain quality, peer review is performed by a set of editors manually checking all edits. Only approved pages are made available for querying, which poses a clear issue with scalability. HJ Jung proposes quality assurance in crowdsourcing using matrix factorisation [9]. Further, the NeLI project (http://www.neli.org.uk) provides another example of domain expert-led development of an infection taxonomy using SKOS/OWL [10]; however, in this case, assistance from computer science researchers was required to facilitate the process. SweetWiki [11] is a system with a WYSIWYG ontology editor, based on semantic tagging using a wiki object model. However, unlike SweetWiki, we use the actual wiki structure for developing the semantic taxonomy. The importance of a user-friendly interface to semantic technologies has been highlighted by Madle et al [12] and Oliver [13]. Crowdsourcing intelligence for semantics has also been argued for by Auer and Kontokostas [14].

Therefore, designing a collaborative Web 2.0 wiki that utilizes crowdsourcing while keeping editorial control over quality remains an issue. Further, engaging users in semantic navigation taxonomy maintenance for domain wikis remains attractive, but a suitable user-friendly interface is essential for unsupervised collaborative input from domain experts.

In this paper we propose a solution to these two problems, the FEMwiki framework, and conduct an initial evaluation. User engagement has been a growing discipline with developed models for assessment [15], but methods for encouragement remain an open problem.

3 THE FEMWIKI CROWDSOURCING FRAMEWORK

FEMwiki consists of a wiki-based repository, user forums for discussion of wiki pages, and user personal profiles where users can give more information about themselves.

The schematic organisation of the wiki part of FEMwiki is shown in Figure 2. Wiki pages, each representing a term from the domain of field epidemiology, may contain text and graphics, and are organised into a hierarchical structure.

In addition to navigation by following links to other wiki pages, the main organisational feature is a navigation hierarchy of semantically connected parent-child wiki pages derived from a semantic taxonomy of the epidemiology domain. Although no strict meaning is given to the parent-child relationship between wiki pages in the actual platform, semantically connected parent-child pages create a naturally formed navigation taxonomy representing the semantics of the domain knowledge. Each node in the taxonomy is a wiki page, which can have text and graphics, as well as child pages. Additional features, such as tagging, may flexibly assign more terms to a single wiki page. Pages can also contain cross reference (untyped) hyperlinks to other pages anywhere in the wiki. The framework does not require a single root node (there can be multiple disjoint trees), but to avoid confusion we will assume in the following that there is such a root node.

The taxonomy browser is immediately visible on the front page of the wiki, giving users the opportunity to visualize the domain taxonomical structure. This browser page is regenerated whenever changes are made, allowing users to see updates instantly. More importantly, users enjoy a seamless experience, as taxonomy changes are made through the same wiki interface as updates to normal wiki content. The taxonomy can be altered while editing any wiki page (Figure 3): the position of a page in the hierarchy is changed by specifying a new parent page (if no parent is specified, the page is placed at the topmost level).

The taxonomy and all wiki pages are viewable by the public. In order to edit a page, and to post in the forums, users must create an account.
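The parent-page mechanism described above can be illustrated with a minimal Python sketch. It is not the FEMwiki implementation (the platform code is not described here); the names `WikiPage`, `build_hierarchy`, `render` and `move_page`, the example page titles, and the cycle check are all assumptions made for illustration. The sketch stores only a parent title per page and regenerates the navigation hierarchy from those pointers on each call.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WikiPage:
    title: str
    # Title of the parent page; None places the page at the topmost level.
    parent: Optional[str] = None


def build_hierarchy(pages: List[WikiPage]) -> Dict[Optional[str], List[str]]:
    """Group page titles by parent title; the key None holds top-level pages."""
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for page in pages:
        children[page.parent].append(page.title)
    for titles in children.values():
        titles.sort()
    return children


def render(children, parent: Optional[str] = None, depth: int = 0) -> List[str]:
    """Return the hierarchy as indented lines, depth first."""
    lines: List[str] = []
    for title in children.get(parent, []):
        lines.append("  " * depth + title)
        lines.extend(render(children, title, depth + 1))
    return lines


def move_page(pages: List[WikiPage], title: str, new_parent: Optional[str]) -> None:
    """Re-parent a page. The cycle check is an assumption, not a documented rule."""
    by_title = {p.title: p for p in pages}
    ancestor = new_parent
    while ancestor is not None:
        if ancestor == title:
            raise ValueError("a page cannot be placed under itself or a descendant")
        ancestor = by_title[ancestor].parent
    by_title[title].parent = new_parent


# Hypothetical page titles, invented for this example.
pages = [
    WikiPage("Field epidemiology"),
    WikiPage("Surveillance", "Field epidemiology"),
    WikiPage("Outbreak investigation", "Field epidemiology"),
    WikiPage("Case definition", "Outbreak investigation"),
]
print("\n".join(render(build_hierarchy(pages))))
```

Because the hierarchy is derived from the pages themselves, a call such as `move_page(pages, "Case definition", "Surveillance")` followed by a fresh `render(build_hierarchy(pages))` shows the updated tree without any separate taxonomy store to keep in sync.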
As in other wiki systems, page histories are stored. While versioning is a typical feature of wikis, it has special importance for the editorial control system, as described in the following section.

3.1 FEMwiki Dual Versioning Editorial Model

While the system aims to encourage any registered user to add to or change a wiki page, it is important to ensure that the content can be trusted. In the FEMwiki framework, this is ensured by assigning an editorial role to senior domain experts, who approve specific pages, and by keeping a dual version system in parallel. Editors are senior epidemiologists assigned by FEMwiki management at ECDC to ensure that approved content is of high quality, up-to-date and strictly evidence based.

The expert-reviewed version of a page (Figure 4) is displayed with a colour-coded green bar at the top of the page containing a link to the latest unapproved version (if one exists). The editor and other contributors to the page content are listed on the right hand side. The latest unapproved version of a page (Figure 4) has a yellow bar at the top, with a link to the expert-reviewed version (if one exists). The editor of the page can approve this version by clicking a button (not visible to any other readers).

The page history is not affected by the dual editorial support. Only the latest version of a page can be edited. At most one expert-reviewed version is permitted at a given time. Thus, if an expert-reviewed version of a page exists together with one or more later versions, only one of these later versions can subsequently be approved as the "new" expert-reviewed version. The stored page history is therefore a sequence of pages, together with the version number of the latest expert-reviewed page (Figure 5).
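The stored state described above, a sequence of versions plus the number of the single expert-reviewed version, can be sketched in a few lines of Python. This is an illustrative model under stated assumptions rather than the platform's actual data model: the class `PageHistory`, its method names, and the rule that only a version later than the current reviewed one may be approved are one reading of the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PageHistory:
    # Version n of the page is stored at versions[n - 1].
    versions: List[str] = field(default_factory=list)
    # Number of the single expert-reviewed version, or None if there is none.
    reviewed: Optional[int] = None

    def edit(self, text: str) -> int:
        """A registered user saves a new latest version; returns its number."""
        self.versions.append(text)
        return len(self.versions)

    def approve(self, version: int) -> None:
        """The page editor marks one version as the expert-reviewed version."""
        if not 1 <= version <= len(self.versions):
            raise ValueError("unknown version")
        if self.reviewed is not None and version <= self.reviewed:
            raise ValueError("only a later version can replace the reviewed one")
        self.reviewed = version

    def banner(self, version: int) -> Tuple[Optional[str], Optional[int]]:
        """Return (bar colour, version linked from the bar) for a displayed version."""
        latest = len(self.versions)
        if version == self.reviewed:
            # Green bar; link to the latest unapproved version if one exists.
            return ("green", latest if latest > version else None)
        if version == latest:
            # Yellow bar; link to the expert-reviewed version if one exists.
            return ("yellow", self.reviewed)
        return (None, None)  # older history entries carry no bar in this sketch


history = PageHistory()
history.approve(history.edit("Initial text written by the chapter author"))
history.edit("Text updated by a registered user")
print(history.banner(1), history.banner(2))
```

Keeping a single integer alongside the ordinary history is what leaves the page history itself untouched: approval changes only `reviewed`, never the list of versions.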

The content on the right hand side of the wiki pages shows the users involved in the editorial process, thus adding an extra layer of transparency and trust that is unusual in other wiki projects.

Figure 2. A schematic diagram of the FEMwiki framework content organisation.

Figure 3. The position of a page in the hierarchy altered during editing.

Figure 4. The expert-reviewed and latest versions of a wiki page.

Figure 5. The page history in the FEMwiki framework, illustrating expert-reviewed and unapproved pages.

3.2 FEMwiki Semantic Navigation Model

One of the main challenges in implementing semantic technologies is the cost of developing and maintaining domain ontologies and taxonomies. This is of particular importance in life critical domains, where the need to keep the ontology up to date is paramount. User-friendliness of ontology editors is another challenge. In the FEMwiki framework, we reused the wiki user interface that users already employ for collaborative editing of the medical content for an entirely different purpose: the wiki page also serves as a user-friendly taxonomy editor, thus offering a seamless experience to users.

Therefore, in order to elicit more edits from users, the entire field epidemiology taxonomy is displayed on the navigation page (rather than just pages with existing content). Colour coding is used to draw user attention to empty pages ("stubs") and to distinguish between various types of content (see Figure 6). The taxonomy editor supports colour coding for the dual versioning of pages: (A) YELLOW: a link to the latest version of the page; (B) GREEN: a link to the expert-reviewed (and approved) page, where clicking on the text "approved version" leads to the reviewed version. Further, (C) QUESTION MARK: pages that do not have an expert-reviewed version (indicated by the question mark icon), where the link leads to the latest version; and (D) GREEN ONLY: pages where the latest version is also the expert-reviewed version, where the link leads to this common version. Any edit to such a page will cause a new latest version to be created. Finally, (E) RED marks, and visually draws attention to, pages tagged as "stubs" where content has not been developed yet.

By simply looking at the colour-coded taxonomy browser, users can see which pages have expert-reviewed versions, and can choose to see either that version or a later unapproved version if one exists (Figure 6).
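The colour-coding rules (A) to (E) amount to a small decision function over a page's version state. The following Python sketch is one possible reading of them; the function name `browser_links`, its parameters, the precedence given to the stub tag, and the choice of the latest version as the target of a stub link are assumptions, since the text does not specify them.

```python
from typing import List, Optional, Tuple


def browser_links(
    latest: int, reviewed: Optional[int], is_stub: bool = False
) -> List[Tuple[str, int]]:
    """Return (marker, linked version number) pairs shown for one taxonomy node."""
    if is_stub:
        # (E) RED: page tagged as a stub; assumed here to override other markers.
        return [("red", latest)]
    if reviewed is None:
        # (C) QUESTION MARK: no expert-reviewed version, link to the latest one.
        return [("question mark", latest)]
    if reviewed == latest:
        # (D) GREEN ONLY: the latest version is also the expert-reviewed version.
        return [("green", latest)]
    # (A) YELLOW link to the latest version and (B) GREEN link to the approved one.
    return [("yellow", latest), ("green", reviewed)]


print(browser_links(latest=5, reviewed=3))
```

A node with a reviewed version 3 and a later user edit at version 5 therefore gets two links, which is the case in which readers choose between the approved and the newer text.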
The user can also see "stub" pages marked in red. This feature is specifically designed to highlight parts of the wiki content that need to be filled in, and to encourage users to start this process.

4 THE FEMWIKI EVOLUTION AND EVALUATION RESULTS

As outlined in the Introduction, the Field Epidemiology Manual Wiki (FEMwiki) [16], funded by the ECDC (European Centre for Disease Prevention and Control), is used by field epidemiologists to maintain a repository of knowledge used for training purposes. It was developed and hosted by City eHealth Research Centre until January 2012, and was subsequently migrated to ECDC. FEMwiki was developed using Telligent Community software (which provides typical wiki functions, such as editing and conflict resolution). FEMwiki primarily serves field epidemiologists in Europe; this community of practice was investigated by Fowler et al [17].

4.1 The evolution of the FEMwiki content

The basis of FEMwiki was a training manual developed by EPIET, a training programme run by ECDC. The manual was organised into 17 chapters. Each chapter was originally written by a trainer/lecturer in the EPIET programme, and the manual was intended to be studied and taught like a textbook, from start to finish. During the process of converting the manual to FEMwiki, an editorial board was appointed to oversee the process of reviewing each chapter and converting it into one or more wiki pages [16]. Where possible, the board members were the original lecturers; otherwise, new senior experts were appointed.

The first version of FEMwiki retained the chapter sequence of the EPIET manual, with a home page for each chapter. The semantic nature of the FEMwiki platform (i.e. the taxonomy browser) was not utilised at this initial stage (see Figure 7-1).

Utilising the FEMwiki framework's potential for semantic taxonomic representation and navigation provided an opportunity to develop a taxonomy of public health and field epidemiology not covered by existing medical taxonomies. In order to organise the content, a taxonomy was developed in consultation with domain experts in Stockholm and London, which attempted to cover the knowledge required to train field epidemiologists (Figure 7-2).

The navigation taxonomy has undergone major development during the project to enhance its simplicity and actively engage users. A noticeable feature of the resulting taxonomy is that it is still somew
