INTENTION-BASED CORRECTIVE FEEDBACK GENERATION USING CONTEXT-AWARE MODEL

Sungjin Lee, Cheongjae Lee, Jonghoon Lee, Hyungjong Noh, and Gary Geunbae Lee
Pohang University of Science and Technology (POSTECH), Korea
{junion, lcj80, jh21983, nohhj, gblee}@postech.ac.kr

Keywords: Dialog-based Computer Assisted Language Learning, Dialog System, Conversational Tutoring

Abstract: In order to facilitate language acquisition, when language learners speak incomprehensible utterances, a Dialog-based Computer Assisted Language Learning (DB-CALL) system should provide matching fluent utterances by inferring the learner's actual intention both from the utterance itself and from the dialog context, as human tutors do. We propose a hybrid inference model that allows a practical and principled way of separating the utterance model and the dialog context model, so that only the utterance model needs to be adjusted for each fluency level. We also propose a feedback generation method that provides native-like utterances by searching an Example Expression Database using the inferred intention. In experiments, our hybrid model outperformed the utterance-only model. Moreover, from the increased dialog completion rate, we conclude that our method produces appropriate feedback even when the learner's utterances are highly incomprehensible, because the dialog context model effectively confines the candidate intentions within the given context.

1 INTRODUCTION

Second language acquisition (SLA) researchers have claimed that feedback provided during conversational interaction facilitates the acquisition process (Long, 2005; Swain, 1996). Helpful interactional processes include the negotiation of meaning and the provision of recasts, both of which can supply corrective feedback to let learners know that their utterances were problematic. A further interactional process that can result from feedback is known as modified output. For example, consider the interaction in which the system negotiates meaning using a clarification request in response to the learner's unnatural expression (Table 1). The language learner modified the original utterance to convey the intended meaning by referring to the recast provided by the system.

Unfortunately, conversational interaction is one of the most expensive ways to teach a language. Thus, interest in developing Dialog-based Computer Assisted Language Learning (DB-CALL) systems is rapidly increasing. However, simply using conventional dialog systems in a foreign language would not be beneficial, because language learners commit numerous and diverse errors. A DB-CALL system should be able to understand language learners' utterances in spite of these obstacles. It should also offer appropriate feedback.

To achieve this goal, rule-based systems usually anticipate error types and hand-craft a large number of error rules, but this approach makes them sensitive to unexpected errors and diverse error combinations (Schneider and McCoy, 1998; Morton and Jack, 2005; Raux and Eskenazi, 2004). A more serious problem is that just correcting grammatical errors cannot guarantee that the utterance is fluent and meaningful. Therefore, we argue that the proper language tutoring methodology is not to correct specific errors but to provide native-like utterance examples which realize the user's intention.

Table 1: An example dialog in which the DB-CALL system returns feedback recommending use of a native-like utterance

System: "What is the purpose of your trip?" [wh-question(trip-purpose)]
User:   "My purpose business" [inform(trip-purpose)]
System: "Sorry, I don't understand. What did you say?" [clarify(understanding)] (Clarification request)
System: On screen: try this expression "I am here on business" (Recast)
User:   "I am here on business" [inform(trip-purpose)] (Modified output)

To accomplish this purpose, as human tutors do, we first infer the actual learner's intention from the erroneous utterance by taking not only the utterance itself but also the dialog context into consideration, and then generate corrective feedback based on the inferred intention.

The remainder of this paper is structured as follows. Section 2 briefly describes related studies. Section 3 introduces the system architecture and operation. Section 4 presents a detailed description of our hybrid intention recognition model. Section 5 describes the experimental results that assess the method's potential usefulness. Finally, Section 6 gives our conclusion.

2 RELATED WORKS

There are several studies on general dialog systems which have examined incorporating the dialog context into recognizing dialog acts. Due to the difficulties of extracting and employing rich dialog context, most of them included just a few types of context, such as the previous dialog act (Poesio and Mikheev, 1998) or the dialog state in a finite-state model (Bohus and Rudnicky, 2003). Recently, Ai et al. (2007) investigated the effect of using rich dialog context and showed promising results. The ways to incorporate the dialog context mostly involved simply combining all features, both from the utterance and from the context, into one feature set which was then used to train inference models. For DB-CALL, however, such approaches can be problematic, because distinct handling of each fluency level is important in a language learning setting. Given a dialog scenario, the dialog context model is relatively invariant; thus we propose a hybrid model that combines the utterance model and the dialog context model in a factored form. This approach allows us to adjust the hybrid model to a required fluency level by replacing only the utterance model.

3 SYSTEM ARCHITECTURE AND OPERATION

Figure 1: System architecture

The whole system consists of the intention recognizer and the dialog manager (Fig. 1). The intention recognizer is a hybrid model of the dialog state model and one of the utterance models. A specific utterance model is chosen according to a learner's proficiency level. When the learner speaks, the utterance model elicits n-best hypotheses of the learner's intention, which are then re-ranked by the results of the dialog state model. The detailed algorithm is described in the next section.

The role of the dialog manager is to generate system responses according to the learner's intention and also to generate corrective feedback if needed. Corrective feedback generation takes two steps: 1) Example Search: the dialog manager retrieves example expressions by querying the Example Expression Database (EED) using the learner's intention as the search key. 2) Example Selection: the dialog manager selects the best example, i.e., the one that maximizes the similarity to the learner's utterance based on lexico-semantic pattern matching.

If the example expression is not equal to the learner's utterance, the dialog manager shows the example as recast feedback and issues a clarification request to induce the learner to modify the utterance (Table 1). Otherwise, the dialog manager shows one of the retrieved examples as paraphrase feedback so that learners may acquire another expression with the same meaning. Sometimes, students have no idea what to say and cannot continue the dialog. In such a case, a time-out occurs and the utterance model does not generate hypotheses. Hence, the dialog system searches the EED with only the result of the dialog state model and shows the retrieved expression as suggestion feedback so that students can use it to continue the conversation.
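The feedback decision logic above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's actual implementation: the in-memory dictionary standing in for the EED, the token-overlap similarity standing in for lexico-semantic pattern matching, and all function names are hypothetical.

```python
# Minimal sketch of corrective feedback generation (Section 3).
# Assumptions: the EED is a dict mapping an intention string such as
# "inform(trip-purpose)" to native-like example expressions; token overlap
# stands in for the lexico-semantic pattern matching used for Example Selection.

def overlap(a, b):
    """Crude similarity between two utterances (shared-token ratio)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)

def generate_feedback(intention, utterance, eed):
    """Return (feedback_type, example) for the inferred learner intention."""
    examples = eed.get(intention, [])          # step 1: Example Search
    if not examples:
        return ("none", "")
    if utterance is None:                      # time-out: no learner utterance
        return ("suggestion", examples[0])
    best = max(examples, key=lambda ex: overlap(utterance, ex))  # step 2: Example Selection
    if best.lower() != utterance.lower():      # learner's utterance is not native-like
        return ("recast", best)                # shown together with a clarification request
    others = [ex for ex in examples if ex.lower() != utterance.lower()]
    return ("paraphrase", others[0]) if others else ("none", "")

eed = {"inform(trip-purpose)": ["I am here on business", "I am here on vacation"]}
print(generate_feedback("inform(trip-purpose)", "My purpose business", eed))
# -> ('recast', 'I am here on business')
```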

Table 2: Representation of the dialog context and an example for the immigration domain

PREV_SYS_INT: Previous system intention. Ex) PREV_SYS_INT = wh-question(job)
PREV_USR_INT: Previous user intention. Ex) PREV_USR_INT = inform(job)
SYS_INT: Current system intention. Ex) SYS_INT = confirm(job)
INFO_EX_STAT: A list of exchanged information states which is essential to successful task completion; (c) denotes confirmed, (u) unconfirmed. Ex) INFO_EX_STAT = [nationality(c), job(u)]
DB_RES_NUM: Number of database query results. Ex) DB_RES_NUM = 0

4 HYBRID INTENTION RECOGNITION MODEL

Our representation of user intention consists of a dialog act and a type of subtask, as shown in Table 1. For example, the first system utterance "What is the purpose of your trip?" can be abstracted by the intention wh-question(trip-purpose).

The hybrid model merges hypotheses from the utterance model with hypotheses from the dialog context model to find the best overall matching user intention. In the language production process, user intentions are first derived from the dialog context; subsequently the user intentions determine utterances (Carroll, 2003). By using this dependency and the chain rule, the most likely expected user intention I^* given the dialog context D and the utterance U can be stated as follows:

I^* = \arg\max_I P(I \mid D, U)    (1)
    = \arg\max_I \frac{P(I, D, U)}{P(D, U)}    (2)
    = \arg\max_I \frac{P(D) \, P(I \mid D) \, P(U \mid I, D)}{P(D, U)}    (3)

Assuming that, given the intention, the utterance does not depend directly on the dialog context, i.e., P(U \mid I, D) \approx P(U \mid I), and by using Bayes' rule, Eq. (3) can be reformulated as:

    = \arg\max_I \frac{P(D) \, P(I \mid D) \, P(I \mid U) \, P(U)}{P(I) \, P(D, U)}    (4)

P(D), P(U), and P(D, U) can be ignored, because they are constant for all I (Eq. 5):

I^* = \arg\max_I \frac{P(I \mid D) \, P(I \mid U)}{P(I)}    (5)

In this formula, P(I \mid D) represents the dialog context model and P(I \mid U) represents the utterance model. The next two subsections discuss each sub-model in detail.

4.1 Utterance Model

To predict the user intention from the utterance itself, we use a maximum entropy model (Ratnaparkhi, 1998) trained on linguistically motivated features. This model offers a clean way to combine diverse pieces of linguistic information. We use the following linguistic features for the utterance model.

- Lexical word features: lexical tri-grams using the current, previous, and next lexical words. They are important features, but the lexical words appearing in the training data are limited, so a data sparseness problem can arise.
- POS tag features: POS tag tri-grams matching the lexical features. POS tag features provide generalization power over the lexical features.

The objective of this modeling is to find the I that maximizes the conditional probability P(I \mid U) in Eq. (5), which is estimated using Eq. (6):

P(I \mid U) = \frac{1}{Z(U)} \exp\Big( \sum_{i=1}^{n} \lambda_i f_i(I, U) \Big)    (6)

where n is the number of features, f_i denotes the features, \lambda_i the weighted parameters for the features, and Z(U) is a normalization factor ensuring \sum_I P(I \mid U) = 1. We use a limited-memory version of the quasi-Newton method (L-BFGS) to optimize the objective function.
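As a concrete illustration of the utterance model, the sketch below builds lexical and POS trigram features and fits a multinomial logistic regression (a maximum entropy model) with scikit-learn's L-BFGS solver. The feature templates, the toy data, and the intention labels here are simplified assumptions, not the paper's actual feature set or corpus.

```python
# Sketch of the utterance model (Section 4.1): lexical and POS trigram
# features fed to a maximum entropy classifier.  scikit-learn's
# LogisticRegression with the L-BFGS solver stands in for the maxent trainer.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def utterance_features(words, pos_tags):
    """Trigram features over previous/current/next word and POS tag."""
    feats = {}
    for i in range(len(words)):
        prev_w = words[i - 1] if i > 0 else "<s>"
        next_w = words[i + 1] if i < len(words) - 1 else "</s>"
        prev_t = pos_tags[i - 1] if i > 0 else "<s>"
        next_t = pos_tags[i + 1] if i < len(pos_tags) - 1 else "</s>"
        feats[f"lex={prev_w}_{words[i]}_{next_w}"] = 1
        feats[f"pos={prev_t}_{pos_tags[i]}_{next_t}"] = 1
    return feats

# Toy training data: (words, POS tags) -> intention label (illustrative only).
train = [
    ((["i", "am", "here", "on", "business"], ["PRP", "VBP", "RB", "IN", "NN"]),
     "inform(trip-purpose)"),
    ((["i", "will", "stay", "for", "three", "weeks"], ["PRP", "MD", "VB", "IN", "CD", "NNS"]),
     "inform(stay-duration)"),
]
X = [utterance_features(w, t) for (w, t), _ in train]
y = [label for _, label in train]

utterance_model = make_pipeline(DictVectorizer(),
                                LogisticRegression(solver="lbfgs", max_iter=1000))
utterance_model.fit(X, y)

# P(I | U): n-best hypotheses with probabilities for a new (partial) utterance.
probs = utterance_model.predict_proba([utterance_features(["business"], ["NN"])])[0]
nbest = sorted(zip(utterance_model.classes_, probs), key=lambda p: -p[1])
print(nbest)
```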

4.2 Dialog Context Model

Our representation of a dialog context consists of diverse pieces of discourse and subtask information, as shown in Table 2. The task of predicting the probable user intention in a given dialog context can be viewed as searching for dialog contexts that are similar to the current one in the dialog context space and then inferring the expected user intention from the user intentions of the dialog contexts found. Therefore, we can formulate the task as the k-nearest neighbors (KNN) problem (Dasarathy, 1990). We had a number of reasons for choosing an instance-based learning methodology. First, instance-based learning provides high controllability for tuning the model incrementally during operation, which is a practically very desirable property. Second, an elaborate similarity function can be applied. Many other approaches, e.g. the maximum entropy model used in the utterance model, express the similarity between states in the simplest manner, through the features that the states share, losing elaborate regularities between features. For the dialog context model, we can easily predict, using general discourse knowledge, which features become important for measuring similarity conditioned on certain values of other features. For example, if the current system dialog act is "inform", the number of database query results becomes an important feature: if the number of results is greater than one, the most likely expected user intention would be "select"; if the number of results equals one, "ack" would be the most probable intention; otherwise, the users might want to modify their requirements. As another example, if all exchanged pieces of information are confirmed and the current system intention is "wh-question", the current system intention itself becomes the most important feature for determining the next user intention.

However, the conventional KNN model has two drawbacks. First, it no longer considers the degree of similarity after selecting the k nearest contexts; hence, intentions that occur rarely have no chance to be chosen, regardless of how close they are to the given dialog context. The second drawback is that if the dialog contexts with, say, intention A are locally condensed rather than widely distributed, then A is an intention specifically fitted to that local region of the dialog context space, so intention A should be given greater preference than other intentions. To cope with these drawbacks, we introduce a new concept, locality, and take both similarity and locality into account in estimating the probability distribution of the dialog context model (Eqs. 9, 10).

The similarity function is defined as follows:

Sim(D, D') = \sum_{i=1}^{n} \lambda_i f_i(D, D')    (7)

where n is the number of feature functions, f_i denotes the feature functions, and \lambda_i the weighted parameters for the features. Our feature functions first include the simplest tests, whether a feature is shared or not, for each feature of a dialog context (Table 2). For composite features, individual tests are also included for each constituent to alleviate data sparseness problems. For example, we include feature functions not only for the system intention but also for its constituents, the system dialog act and the type of subtask. In addition, we include a number of feature functions which test the elaborate rules illustrated in the examples above.
The weighted parameters are given initial values based on general discourse and task knowledge and are optimized on the development data set with a minimum error rate criterion.

The locality function is the ratio between the number of elements of the set S_{I,D} and the number of elements of the set S_D:

Loc(I, D) = \frac{|S_{I,D}|}{|S_D|}    (8)

where S_D = \{ D' \mid D' \in kNN(D) \}, S_{I,D} = \{ D' \mid D' \in kNN(D), I_{D'} = I \}, and kNN(D) is the set of k nearest neighbors of the given dialog context D.

The score function calculates the score of the intention I based on the set of k nearest dialog contexts, using both similarity and locality:

Score(I, D) = \sum_{D' \in S_{I,D}} Sim(D, D') \cdot Loc(I, D)    (9)

To let the dialog context model be a probability distribution, the score function is divided by a normalization factor:

P(I \mid D) = \frac{Score(I, D)}{\sum_{I'} Score(I', D)}    (10)
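To make Eqs. (5) and (7)-(10) concrete, the following sketch implements the dialog context model as a weighted nearest-neighbor scorer and then combines it with the utterance model's n-best output as in Eq. (5). The feature weights, the toy context memory, the intention prior, the epsilon smoothing, and all function names are illustrative assumptions rather than the system's actual implementation.

```python
# Sketch of the dialog context model (Eqs. 7-10) and the hybrid re-ranking
# of Eq. (5).  Feature weights, the toy memory, and the prior are assumptions.
from collections import Counter

WEIGHTS = {"PREV_SYS_INT": 1.0, "PREV_USR_INT": 0.5, "SYS_INT": 2.0,
           "INFO_EX_STAT": 1.0, "DB_RES_NUM": 1.0}

def sim(d, d_prime):
    """Eq. (7): weighted sum of simple feature-match tests."""
    return sum(w for f, w in WEIGHTS.items()
               if f in d and f in d_prime and d[f] == d_prime[f])

def context_model(d, memory, k=5):
    """Estimate P(I | D) from the k nearest stored (context, intention) pairs."""
    neighbors = sorted(memory, key=lambda m: -sim(d, m[0]))[:k]
    counts = Counter(intention for _, intention in neighbors)
    scores = {}
    for intention in counts:
        locality = counts[intention] / len(neighbors)                    # Eq. (8)
        sim_sum = sum(sim(d, ctx) for ctx, i in neighbors if i == intention)
        scores[intention] = sim_sum * locality                           # Eq. (9)
    z = sum(scores.values()) or 1.0
    return {intention: s / z for intention, s in scores.items()}         # Eq. (10)

def rerank(p_i_given_u, p_i_given_d, prior, eps=1e-6):
    """Eq. (5): rank intentions by P(I | D) * P(I | U) / P(I)."""
    hybrid = {i: p_i_given_d.get(i, eps) * p_u / prior.get(i, eps)
              for i, p_u in p_i_given_u.items()}
    return sorted(hybrid.items(), key=lambda kv: -kv[1])

# Toy usage: the context strongly expects an answer about the trip purpose.
memory = [({"SYS_INT": "wh-question(trip-purpose)"}, "inform(trip-purpose)"),
          ({"SYS_INT": "wh-question(trip-purpose)"}, "inform(trip-purpose)"),
          ({"SYS_INT": "confirm(job)"}, "ack")]
d = {"SYS_INT": "wh-question(trip-purpose)"}
nbest = {"inform(trip-purpose)": 0.55, "inform(job)": 0.45}   # from the utterance model
prior = {"inform(trip-purpose)": 0.3, "inform(job)": 0.3, "ack": 0.4}
print(rerank(nbest, context_model(d, memory, k=3), prior))
```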

5 EXPERIMENTS

To verify the proposed model, we conducted a case study for dialogs in a system devoted to immigration into an English-speaking country. We used the example-based dialog management method (Lee et al., 2009) to build the dialog system. We used a dialog corpus in the immigration domain which consists of 192 dialogs (3,517 utterances, 18.32 utterances per dialog). There are seven types of dialog act and eleven types of subtask. We annotated each utterance in the dialogs in two steps. In the first step, we manually annotated each utterance with the speaker's intention and component slot-values. In the second step, we automatically annotated each utterance with the discourse information (Table 2) using in-house software.

5.1 An example dialog

The following dialog gives a flavor of the system in real action. A brief commentary is given in parentheses at key points in the dialog.

S1: Hi. This is a dialogue system for English learning. May I ask your name for authentication?
U1: My name is Hong Kil Dong
S2: You are a new user. We have four themes such as the immigration, transportation, hotel, and restaurant themes. Which theme do you want to learn?
U2: I'd like the immigrant theme
S3: The immigrant theme is selected. Let's get started
S4: Hello
U3: Hi
S5: Your passport please
U4: Here this
On screen: [Recast] Here you are
  ([Recast] is an example utterance provided by recast feedback. "Here this", the student's utterance, is understandable based on context, but it is not a good expression. The system recommends a better expression so that the student does better in the next dialogue.)
S6: Sorry, I don't understand. What did you say?
U5: Here you are
S7: What's your nationality?
U6: I am a Korean
On screen: [Paraphrase] I come from Korea
  (When the user utterance is correct, the system shows another possible expression as paraphrase feedback.)
S8: What's the purpose of your visit?
U7: Business
On screen: [Paraphrase] I am here on business
  (If the user input is a single word or a phrase, the system shows a complete sentence.)
S9: How long are you staying?
U8: I stay for three weeks
On screen: [Recast] I will stay for three weeks
  (The user utterance is semantically understandable but grammatically wrong. The system shows a grammatically correct sentence on screen, which is obtained by searching the EED.)
S10: Sorry, I don't understand. What did you say?
U9: I will stay for three weeks
S11: Anything to declare?
U10: Pardon me?
S12: Anything to declare?
U11: Nothing
On screen: [Paraphrase] Just my personal things
S13: What kind of job?
U12: [TIMEOUT]
On screen: [Suggestion] I am a student
  ([Suggestion] is an example utterance provided by suggestion feedback. The system knows that it asked the user about a job in the previous question, and it also knows that it has already obtained information about the nationality, purpose of visit, and period through the dialog history. Therefore, by using the dialogue history, the system can find a proper example in the Example Expression DB.)
U13: I am a computer programmer
S14: Ok, no problem
U14: Thank you. Bye
S15: The immigrant theme has been done. Your total score is B. Next time, you had better use more comprehensive and complete expressions, and try to speak more quickly. Would you like to continue learning another theme?
  (The system provides assessment feedback, which includes a total score and specific messages that inform the student which parts they need to improve.)
U15: No.
S16: Study session ends.

Figure 2: An example of a real conversation in the immigration domain

5.2 Simulated language learner

In this experiment, instead of involving real language learners, we simulated them by injecting grammar errors into clean utterances generated using the user simulation method described in (Jung et al., 2009). We did this because we want to freely adjust the fluency level of users for intensive tests in a variety of settings, and because the cost of engaging large numbers of language learners with diverse fluency levels during development is particularly high.
Employing a simulated user greatly accelerates the development cycles.

To simulate various fluency levels, we examined the Korean Student English Essay (KSEE) corpus, which is a collection of essays by Korean university students. The corpus consists of 119 essays (915 sentences). We annotated the corpus with the error tags defined in (Michaud, 2002), and the frequencies of the error types were measured. In total, 65 error types and 2,404 instantiated errors were discovered. We classified the error types into three categories: substitution, insertion, and deletion.

Table 3: Three categories of error types and the top 5 error types in each

Substitution: Spell (71%), Plural Form (14%), Subject Verb Agreement (10%), Incorrect Preposition (3%), Incorrect Determiner (2%)
Deletion: Missing Determiner (62%), Missing Preposition (18%), Missing Conjunction (13%), Missing Verb (4%), Missing Subject (3%)
Insertion (17%): Extra Preposition (36%), Extra Determiner (26%), Extra Conjunction (20%), Extra Verb (15%), Extra Intensifier (3%)

For each category, we listed the five most common error types (Table 3), which account for 73% of the errors. As Foster (2007; 2005) and Lee (2009) generated a treebank of ungrammatical English, we also produced artificial grammar errors systematically. The error generation procedure takes as input a part-of-speech tagged sentence, which is assumed to be well formed, and outputs a part-of-speech tagged ungrammatical sentence. In the first step of the error generation procedure, we set the Grammar Error Rate (GER) between 0% and 100% and determine the number of errors to be produced based on the GER. Then, we distribute the errors among the categories and error types according to the percentages in the error type list (Table 3).

5.3

Figure 3: Comparison between the hybrid model and the utterance-only model

On the contrary to the task-oriented dial
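To illustrate the error generation procedure of Section 5.2, the sketch below injects grammar errors into a well-formed POS-tagged sentence according to a Grammar Error Rate. The corruption rules, the type distribution, and all names here are simplified, hypothetical stand-ins rather than the exact rules or the Table 3 percentages.

```python
# Sketch of grammar-error injection (Section 5.2): given a POS-tagged sentence
# and a Grammar Error Rate (GER), decide how many errors to produce and
# distribute them over error types by relative frequency (cf. Table 3).
# The distribution and corruption rules below are illustrative assumptions.
import random

# (error type, relative frequency) -- illustrative subset, not Table 3's values
ERROR_TYPES = [("missing_determiner", 0.30), ("spell", 0.25),
               ("extra_preposition", 0.25), ("plural_form", 0.20)]

def corrupt(tokens, error_type):
    """Apply one simplified corruption to a list of (word, POS) tokens."""
    if error_type == "missing_determiner":
        return [t for t in tokens if t[1] != "DT"] or tokens
    if error_type == "spell" and tokens:
        i = random.randrange(len(tokens))
        w, p = tokens[i]
        return tokens[:i] + [(w[:-1] or w, p)] + tokens[i + 1:]   # drop last letter
    if error_type == "extra_preposition":
        i = random.randrange(len(tokens) + 1)
        return tokens[:i] + [("of", "IN")] + tokens[i:]
    if error_type == "plural_form":
        return [(w + "s", p) if p == "NN" else (w, p) for w, p in tokens]
    return tokens

def inject_errors(tagged_sentence, ger=0.2):
    """Return an ungrammatical version of a well-formed POS-tagged sentence."""
    n_errors = max(1, round(ger * len(tagged_sentence)))          # error count from GER
    types, weights = zip(*ERROR_TYPES)
    for error_type in random.choices(types, weights=weights, k=n_errors):
        tagged_sentence = corrupt(tagged_sentence, error_type)
    return tagged_sentence

sentence = [("I", "PRP"), ("will", "MD"), ("stay", "VB"),
            ("for", "IN"), ("three", "CD"), ("weeks", "NNS")]
print(inject_errors(sentence, ger=0.3))
```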

