In-Situ Labeling For Augmented Reality Language Learning


Brandon Huynh* (University of California, Santa Barbara), Jason Orlosky† (Osaka University), Tobias Höllerer‡ (University of California, Santa Barbara)

Figure 1: Images showing a) our object registration algorithm, which uses a set of uncertain candidate object positions (in red) to establish consistent labels (in green) of items in the real world, b) a view directly through the HoloLens of resulting labels from our method in a previously unknown environment, and c) a photo of a user wearing the system and the calibrated eye tracker used for label selection.

ABSTRACT

Augmented Reality is a promising interaction paradigm for learning applications. It has the potential to improve learning outcomes by merging educational content with spatial cues and semantically relevant objects within a learner's everyday environment. The impact of such an interface could be comparable to the method of loci, a well-known memory enhancement technique used by memory champions and polyglots. However, using Augmented Reality in this manner is still impractical for a number of reasons. Scalable object recognition and consistent labeling of objects is a significant challenge, and interaction with arbitrary (unmodeled) physical objects in AR scenes has consequently not been well explored. To help address these challenges, we present a framework for in-situ object labeling and selection in Augmented Reality, with a particular focus on language learning applications. Our framework uses a generalized object recognition model to identify objects in the world in real time, integrates eye tracking to facilitate selection and interaction within the interface, and incorporates a personalized learning model that dynamically adapts to a student's growth. We show our current progress in the development of this system, including preliminary tests and benchmarks.
We explore challenges with using such a system in practice, and discuss our vision for the future of AR language learning applications.

Index Terms: Human-centered computing — Mixed and augmented reality; Theory and algorithms for application domains — Semi-supervised learning

*e-mail:
†e-mail: -u.ac.jp
‡e-mail: holl@cs.ucsb.edu

1 INTRODUCTION

For many years, learning new words has often been accomplished by memorization techniques such as flash cards and phone- or tablet-based applications such as Anki [11] and Duolingo [32], which often use temporal spacing algorithms to modulate word presentation frequency. A more effective, albeit time-consuming, method of language learning is to attach notes with words and illustrated concepts to real-world objects in a familiar physical space, taking advantage of the learner's capacity for spatial memory. Learners constantly see a particular object, recall the associated word, and learn that concept more effectively, since the object is in its natural context and is consistently viewed over time. This type of learning is also referred to as the method of loci [4, 23, 33].

Our goal is to replicate this in-situ learning process, but to do so automatically and with the support of augmented reality (AR), as represented in Fig. 1b. In other words, when a user views an object, we want to automatically display the concept(s) associated with that object in the target language and provide a method for both the viewing and selection of a particular term or concept. Deploying such an interface in a real-world, generalized context is still a very challenging task.

As a step towards this goal, we introduce a more practical framework that can function as a cornerstone for improving in-situ learning paradigms. In addition to the process of trial and error to find a more effective and practical approach to designing such a system, our contributions include:

1. a client-server architecture that allows for real-time labelling of objects in an AR device (Microsoft HoloLens),

2.
a description of and solution to the object registration problem resulting from the use of real-time object detectors (Fig. 1a),

3. a practical framework for exploring challenges in the implementation of AR language learning, and a discussion of novel interaction paradigms that our framework enables.

The practical use of this system can enable in-situ learning for languages, physical phenomena, and other new concepts.

2 RELATED WORK

Prior work falls into three primary categories: 1) the implementation of object recognition, semantic modeling, and tracking for in-situ labeling, 2) view management techniques for labeling in AR, and 3) the use of AR and VR to facilitate learning of concepts and language. While these three categories are typically different areas of research, they are each essential for the effective implementation of in-situ AR language learning.

2019 IEEE Conference on Virtual Reality and 3D User Interfaces, 23-27 March, Osaka, Japan. 978-1-7281-1377-7/19/$31.00 ©2019 IEEE

2.1 Object Recognition and Semantic Modeling

Real-time object detection is a fairly new development, and there are not many works discussing the integration of these technologies into an augmented reality system. Current detection approaches utilize object recognition in 2D image frames, using learning representations such as deep and hierarchical CNNs and fully-connected conditional random fields [6, 20], or, for the fastest real-time evaluation performance, just a single neural network applied to the entire image frame [28]. Combined 2D/3D approaches [1, 21] or object detection in 3D point cloud space [7, 27] may become increasingly feasible for real-time approaches in the not-too-far future as more 3D datasets [1, 7] become available, but currently, approaches that apply 2D object detection to the 3D meshes generated by AR devices such as the HoloLens or Magic Leap One yield better performance.

Huang et al. [13] compare the general performance of three popular meta-architectures for real-time object detection. They show that the Single Shot Detector (SSD) family of detectors, which predicts class and bounding boxes directly from image features, has the best performance-to-accuracy tradeoff, compared to approaches which predict bounding box proposals first (Faster R-CNN and R-FCN). We experimented with the performance of both types of detectors and ultimately settled on an implementation of SSD.

The most recent and closest work to our approach is that of Runz et al. [29] in 2018. Using machine learning and an RGBD camera, they were able to segment the 3D shapes of certain objects in real time for use in AR applications. Their approach utilized the Mask R-CNN architecture to predict per-pixel object labels, which comes at a higher performance cost.
In contrast, our approach is implemented directly on an optical see-through HMD (HoloLens) using a client-server architecture, and uses traditional bounding box detectors which can run in true real time (30 fps) with few dropped frames.

Our work links objects that are recognized in real time in 2D frames to positions in the modeled 3D scene, which is akin to projecting and disambiguating 2D hand-drawn annotations into 3D scene space [18].

2.2 View Management for Object Labeling

A body of work in AR research focuses on optimized label placement and appearance modulation. In a similar fashion to how we use 2D bounding boxes of recognized objects in the image plane to determine a 3D label position for that object, several view management approaches optimize the placement of annotations based on the 2D rectangular extent of 3D objects in the image plane [2, 3, 12]. Other approaches allow the adjustment of labels in 3D space [26, 30], a feature that might be gainfully employed in our system to subtly optimize the location of an initially placed label over time as multiple vantage points accumulate. However, this would pose the additional problem of disruptive label movement due to loss of temporal coherence. Since potential mislabeling actions due to occlusions (the main motivation for 3D label adjustment) are automatically resolved by the HoloLens' continuous scene modeling, in which occluders are automatically modeled as occluding phantom objects, we can simply avoid label adjustment after arriving at a good initial placement. Label appearance optimization [9] and assurance of legibility [10, 22] are beyond the scope of this paper.

2.3 Memory and Learning Interfaces

The idea of augmenting human memory or facilitating learning with computers appeared almost simultaneously with the history of modern computing.
For example, early work by Siklossy in 1968 proposed the idea of natural language learning using a computer [31]. Since then, much progress has been made, for example by turning the learning process into a serious game [16]. Though not in an in-situ environment, Liu et al. proposed the use of 2D barcodes for supporting English learning. Though relatively simple, this method helps motivate the use of AR for learning new concepts, as a form of fully contextualized learning [25].

In addition to language learning, some work has been presented that seeks to augment or improve memory in general. For example, the infrastructure proposed by Chang et al. facilitated adaptive learning using mobile phones in outdoor environments [5]. Similarly, Orlosky et al. proposed a system that recorded the location of objects, such as words in books, based on eye gaze, with the purpose of improving access to forgotten items or words [24].

Other studies, like that of Dunleavy et al., found that learning in AR is engaging, but still faces a number of technical and cognitive challenges [8]. Kukulska-Hulme et al. further reviewed the affordances of mobile learning, with similar findings that AR was engaging and fun for the purpose of education, but found that technology limitations like tracking accuracy interfered with learning [17]. One more attempt at facilitating language learning, by Santos et al., tested vocabulary acquisition with a marker-based AR approach on a tablet. In contrast, our approach is automatic, hands-free, and in-situ.

Most recently, Ibrahim et al. examined how well in-situ AR can function as a language learning tool [14]. They studied in-situ object labelling in comparison to a traditional flash card learning approach, and found that those who used AR remembered more words after a 4-day delayed post-test. However, the object labels for this method were set up manually.
In other words, the objects needed to be labelled manually for use with the display in real time. In order to use the display for learning in practice, these labels need to be placed automatically, without manual interaction.

This is the main problem our paper tackles. We have developed the framework necessary to perform this recognition, and at the same time we solve problems like object jitter due to improper bounding boxes. This sets the stage for a more effective implementation of learning via the method of loci, and can even enable reinforcement-type schemes like spacing algorithms [11] that adapt to the pace of the user based on real-world learning.

3 AR LANGUAGE LEARNING FRAMEWORK

As further motivation for this system, we envision a future where Augmented Reality headsets are smaller and more ubiquitous, and are capable of being worn and used on a daily basis much like current smart phones and smart watches. In such an "always-on AR" future, augmented reality has the potential to transform language learning by adapting educational material to the user's own environment, which may improve learning and recall. Learning content may also be presented throughout the day, providing spontaneous learning moments that are more memorable by taking advantage of unique experiences or environmental conditions. Furthermore, an always-on AR device allows us to take into consideration the cognitive state of the user through emerging technologies for vitals sensing. Using this information, we can gain a better understanding of the user's attention and more readily adapt to their needs. To enable research into these interaction paradigms, we propose a practical framework that can be implemented and deployed on current hardware using current sensing techniques.
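To make the idea of adaptive spacing concrete, the sketch below implements a minimal Leitner-style scheduler of the kind used by the spaced-repetition tools cited above. This is purely an illustration under our own assumptions, not the authors' personalized learning model: a correctly recalled word moves up a "box" and its review interval grows, while a miss resets it to the first box.

```python
from datetime import date, timedelta

# Review intervals (in days) per Leitner box; values are illustrative.
INTERVALS = [1, 2, 4, 8, 16]

class SpacedVocabulary:
    """Minimal Leitner-style spaced-repetition scheduler (illustrative)."""

    def __init__(self):
        self.boxes = {}  # word -> current box index
        self.due = {}    # word -> date of next scheduled review

    def review(self, word, correct, today):
        """Record one recall attempt and return the next review date."""
        box = self.boxes.get(word, 0)
        # Promote on success (capped at the last box), demote to box 0 on a miss.
        box = min(box + 1, len(INTERVALS) - 1) if correct else 0
        self.boxes[word] = box
        self.due[word] = today + timedelta(days=INTERVALS[box])
        return self.due[word]

    def due_today(self, today):
        """Words whose scheduled review date has arrived."""
        return [w for w, d in self.due.items() if d <= today]
```

In an AR setting, a "review" event could be triggered whenever the user gazes at a labeled object, letting the real environment drive the schedule rather than a flash-card session.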
We believe the fundamental building blocks for AR language learning include three components:

1. Environment sensing with object-level semantics
2. Attention-aware interaction
3. Personalized learning models

These components provide the necessary set of capabilities required by the AR language learning applications we envision. In the next section, we introduce a system design which implements this framework using existing technologies. Then, we describe the realization of the first component of our framework, through an object-level semantic labeling system. Finally, we discuss our ongoing work regarding the second and third components.

4 SYSTEM DESIGN

In this section, we introduce a client-server architecture composed of several interconnected components, including the hardware used for AR and eye tracking, the object recognition system, the gaze tracking system, and the language learning and reinforcement model. The overall design and information flow between these components is shown in Figure 2.

The combination of these components allows us to detect new objects, robustly localize them in 3D despite jitter, shaking, and occlusion, and label the objects properly despite improper detection. Our current implementation targets English as a Second Language (ESL) students, thus our labels are presented in English, but the label concepts could be translated and adapted to many other languages.

4.1 Hardware

We chose the Microsoft HoloLens for our display, primarily because it provides access to the 3D structure of the environment and can stream the 2D camera image to a server for object recognition. How we project, synchronize, and preserve the 2D recognition points onto their 3D positions in the world will be described later.

The HoloLens is also equipped with a 3D-printed mount that houses two Pupil Labs infrared (IR) eye tracking cameras, as shown in Fig. 1c. These cameras are each equipped with two IR LEDs, and have adjustable arms that allow us to adjust the camera positions for individual users. The eye tracking framework employs a novel drift correction algorithm that can account for shifts on the user's face.

For the server side of our interface, we utilized a VR backpack with an Intel Core i7-7820HK and an Nvidia GeForce GTX 1070 graphics card. Since the backpack is designed for mobile use, this allows both the HoloLens and server to be mobile, as long as they are connected via network.
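As a rough illustration of this client-server link, the sketch below bundles a camera pose with JPEG bytes and checks that the result fits a single UDP payload. The field layout and function names are our assumptions, not the paper's actual wire format, which is described only as a custom encoding.

```python
import struct

# Maximum UDP payload over IPv4 (65535 minus IP and UDP headers).
MAX_UDP_PAYLOAD = 65507

def pack_frame(jpeg_bytes, position, rotation):
    """Serialize a camera pose (xyz position + xyzw quaternion) followed by
    the JPEG-encoded frame, as one datagram-sized blob."""
    header = struct.pack("<7f", *position, *rotation)  # 7 little-endian floats
    packet = header + jpeg_bytes
    if len(packet) > MAX_UDP_PAYLOAD:
        raise ValueError("frame too large for a single UDP packet")
    return packet

def unpack_frame(packet):
    """Inverse of pack_frame: recover pose and JPEG payload on the server."""
    pose = struct.unpack("<7f", packet[:28])
    return pose[:3], pose[3:], packet[28:]
```

Carrying the pose inside the same packet as the frame is what later lets the server echo the original pose back with its detections, so the HoloLens can raycast from the viewpoint the frame was actually captured at.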
To maximize throughput during testing and experimentation, we connected both devices on the same subnet.

4.2 Summary of Data Flow

Our system starts by initializing the Unity world to the same tracking space as the HoloLens. Next, we begin streaming images from the HoloLens forward-facing camera, which are sent to the server-side backpack via custom encoding. Upon reaching the server, they are decoded and input into the object recognition module, which returns a 2D bounding box with an object label. The center of this bounding box is then sent back to the HoloLens and projected into 3D world space by raycasting against the mesh provided by the HoloLens. This projected point is treated as a "candidate point", which is fed into our object registration algorithm. The object registration algorithm looks over the set of candidate points over time to decide where to assign a final object label and position. Once an object and its position have been correctly assigned, the object is synchronized with the Unity space on the server side. Finally, labels on the objects are activated using eye-gaze selection, giving the user a method for interaction. The results from this interaction are fed into a personalized learning model, providing the ability to design content that adapts to the growth of the user.

Figure 2: Diagram of our entire architecture, including hardware in grey, algorithms and systems in blue, and data flow in green. The left-hand block includes all processing done on the HoloLens, and the right-hand block includes all processing done on the VR backpack.

5 IN-SITU LABELING

The success of Convolutional Neural Networks (CNNs) has led to technological breakthroughs in object recognition. However, it is not yet obvious how to integrate these technologies into AR. Three major parts need to be in place for these tools to be used practically. First, they need to be tested in practice (not just on individual image data sets) and provide good enough recognition to label an object correctly over time. Secondly, we need to establish object registration that is resilient to failed recognition frames, jitter, radical changes to display orientation, and objects entering/leaving the display's field of view (FoV). Finally, current AR devices are not powerful enough to run state-of-the-art CNNs, so we need to handle the synchronization and reprojection between streamed frames from the AR device and recognition results from a server with a powerful GPU.

5.1 Object Recognition Module

The first step in the development of our system was finding a scalable object recognition approach that could be used with the forward-facing camera on the HoloLens. Due to the real-time performance constraint, we had to test and refine a variety of approaches before finding one that worked. We finally found the Single Shot MultiBox Detector (SSD) by Liu et al. to be effective [19]. Specifically, we use the implementation provided by the TensorFlow Object Detection API, using the ssd_mobilenet_v1_coco model, which has been pre-trained on MS COCO.

We stream video frames from the built-in HoloLens front-facing camera to a server running on an MSI VR backpack. To keep packet sizes small, we use the lowest available camera resolution of 896x504. Each frame is encoded as JPEG at 50% quality, so that its final size fits into a single UDP packet. We also encode and send the current camera pose along with each frame. On the server side, we place all frames into an input queue. An asynchronous processing thread takes the most recent frame from the input queue and feeds it through the SSD network. The resulting 2D bounding boxes and labels are then sent back to the HoloLens, along with the original camera pose.
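The input-queue behavior described above, where the detection thread always consumes the newest frame so that latency never accumulates behind a slow detector or network, can be sketched as follows. Class and method names are ours, not the paper's.

```python
import threading
from collections import deque

class LatestFrameQueue:
    """A one-slot frame buffer: putting a new frame silently evicts any
    older, unprocessed frame, so the consumer always sees the most
    recent view of the scene (illustrative sketch)."""

    def __init__(self):
        self._buf = deque(maxlen=1)  # only the newest frame survives
        self._lock = threading.Lock()

    def put(self, frame):
        with self._lock:
            self._buf.append(frame)  # deque(maxlen=1) drops the stale frame

    def get(self):
        with self._lock:
            return self._buf.pop() if self._buf else None
```

Dropping stale frames rather than queuing them is the standard choice for live AR input: a detection computed on a half-second-old frame would be raycast from an outdated viewpoint anyway.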
Back on the HoloLens, we project the center point of each 2D bounding box onto the 3D mesh by performing a raycast from the original camera pose.

This particular implementation of SSD takes 30 ms per prediction on the VR backpack, which just barely allows us to achieve 30 fps under ideal network conditions. There is a slight delay due to network latency, as our network has a round-trip time of 150 ms.

SSD and similar CNN-based real-time object recognition architectures are known to perform poorly with small objects [13]. In practice, we found that small objects, such as spoons and forks, experience much higher false positive rates, and predictions are not consistent across frames.

Table 1: Ground truth (GT) and estimation error, in cm, of the Euclidean distance between user-selected center points of each object and a known 3D point in the tracking space. (Columns: Object, GT, User 1, User 2, User 3, Avg.)

Figure 3: Left: Raw points returned from object recognition as projected into 3D space, accumulated over several frames. This shows the variance in predicted positions and false positive label predictions. Right: Scene correctly labeled with object-permanent labels.
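One plausible way to suppress such unstable predictions, in the spirit of the candidate-point registration summarized in Sect. 4.2, is to require several agreeing candidate points before committing a label to a 3D position. The clustering rule and thresholds below are illustrative assumptions, not the authors' exact algorithm.

```python
import math
from collections import defaultdict

class ObjectRegistry:
    """Accumulate noisy 3D candidate points per label and register a label
    only once enough candidates agree within a small radius (sketch)."""

    def __init__(self, radius=0.15, min_support=5):
        self.radius = radius            # cluster radius in meters (assumed)
        self.min_support = min_support  # agreeing candidates needed (assumed)
        self.candidates = defaultdict(list)  # label -> candidate 3D points
        self.registered = {}                 # label -> final 3D position

    def add_candidate(self, label, point):
        """Feed one raycast hit; return the final position once registered."""
        if label in self.registered:
            return self.registered[label]
        self.candidates[label].append(point)
        # Candidates near the newest point form the supporting cluster.
        cluster = [p for p in self.candidates[label]
                   if math.dist(p, point) <= self.radius]
        if len(cluster) >= self.min_support:
            # Commit the label at the centroid of the agreeing candidates.
            centroid = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            self.registered[label] = centroid
            return centroid
        return None
```

Requiring agreement across frames trades a short registration delay for stability: one-off false positives (a "fork" hallucinated in a single frame) never gather enough support to produce a floating label.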

