Data Driven Approaches For Spoken Dialog Processing


Data Driven Approaches for Spoken Dialog Processing
Gary Geunbae Lee, Ph.D., Professor
Dept. CSE, POSTECH
Europe-Korea SDS Workshop

Contents
- SLU
- DM
- On-going research

Ubiquitous spoken dialog interface?
Telematics Dialog Interface (POSTECH, LG, DiQuest):
- Car navigation
- Tele-service
- Home networking
- Robot interface

What's hard: ambiguities, ambiguities, ambiguities at all different levels.

"John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there." [from J. Eisner's lecture notes]
- donut: to get a donut (doughnut)? a spare tire for his car?
- donut store: a store where donuts shop? a store run by donuts? a store that looks like a big donut? a store made of donuts?
- from work: well, actually, he stopped there from hunger and exhaustion, not just from work.
- every few hours: is that how often he thought it? or how often the coffee was good?
- it: the particular coffee that was good every few hours? the donut store? the situation?
- too expensive: too expensive for what? what are we supposed to conclude about what John did?

Spoken Dialog System

The pipeline connects four components, each backed by models and rules:
- Automatic Speech Recognition (ASR): user speech -> recognized sentence, e.g. "I need a flight from Washington DC to Denver roundtrip"
- Spoken Language Understanding (SLU): recognized sentence -> semantic meaning, e.g. ORIGIN CITY: WASHINGTON, DESTINATION CITY: DENVER, FLIGHT TYPE: ROUNDTRIP
- Dialog Management (DM): semantic meaning -> system action, e.g. GET DEPARTURE DATE
- Response Generation (RG): system action -> system speech, e.g. "Which date do you want to fly from Washington to Denver?"
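The four-stage loop above can be sketched end to end. Everything below is a toy stand-in: the keyword matching, slot names, and action names are illustrative, not the POSTECH implementation.

```python
# Minimal sketch of the SDS pipeline from the slide: ASR -> SLU -> DM -> RG.
# The ASR stage is represented by its output string.

def slu(text):
    """Toy SLU: fill a semantic frame from surface keywords."""
    frame = {}
    if "from Washington" in text:
        frame["ORIGIN_CITY"] = "WASHINGTON"
    if "to Denver" in text:
        frame["DESTINATION_CITY"] = "DENVER"
    if "roundtrip" in text:
        frame["FLIGHT_TYPE"] = "ROUNDTRIP"
    return frame

def dialog_manager(frame):
    """Toy DM: if no departure date is known yet, ask for one."""
    if "DEPARTURE_DATE" not in frame:
        return "GET_DEPARTURE_DATE"
    return "BOOK_FLIGHT"

def response_generator(action, frame):
    """Toy RG: realize the system action as a sentence."""
    if action == "GET_DEPARTURE_DATE":
        return ("Which date do you want to fly from %s to %s?"
                % (frame["ORIGIN_CITY"].title(), frame["DESTINATION_CITY"].title()))
    return "Booking your flight."

recognized = "I need a flight from Washington DC to Denver roundtrip"
frame = slu(recognized)
action = dialog_manager(frame)
print(action)                              # GET_DEPARTURE_DATE
print(response_generator(action, frame))   # Which date do you want to fly from Washington to Denver?
```

Each stage consumes exactly the previous stage's output, which is what makes errors propagate down the pipeline and motivates the robustness work later in the talk.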

Spoken Language Understanding

SLU

Spoken language understanding (SLU) maps natural-language speech to a frame structure encoding its meaning.

Example frame (ATIS domain):

  <frame domain="ATIS">
    <utt>Show me flights from Denver to New York on Nov. 18th</utt>
    <slot type="DA" name="Show Flight"/>
    <slot type="NE" name="FROM.CITY">Denver</slot>
    <slot type="NE" name="TO.CITY">New York</slot>
    <slot type="NE" name="MONTH">Nov.</slot>
    <slot type="NE" name="DAY NUMBER">18th</slot>
  </frame>

Example frame (EPG domain):

  <frame domain="EPG">
    <utt>I want to watch LOST</utt>
    <slot type="DA" name="Search Program"/>
    <slot type="NE" name="PROGRAM">LOST</slot>
  </frame>

Two model families bridge the ASR output x and the frame:
- Sequence labeling (e.g. HMM, CRFs) for named entities / frame slots
- Classification (e.g. MaxEnt, SVM) for the dialog act / intent
The labeled output (x, y, z) is then passed on to dialog management.
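The slide casts slot filling as sequence labeling; a standard encoding for training such a labeler (HMM or CRF) is BIO tagging. This sketch is not from the talk: it simply derives per-token BIO labels from the ATIS example's slot annotations, which is the form a sequence-labeling model would be trained on.

```python
# Convert phrase-level slot annotations into per-token BIO labels.
# "B-X" opens slot X, "I-X" continues it, "O" is outside any slot.

def bio_tags(tokens, slots):
    """slots: list of (slot_name, phrase) pairs; returns one tag per token."""
    tags = ["O"] * len(tokens)
    for name, phrase in slots:
        words = phrase.split()
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                tags[i] = "B-" + name
                for j in range(1, len(words)):
                    tags[i + j] = "I-" + name
    return tags

tokens = "Show me flights from Denver to New York on Nov. 18th".split()
slots = [("FROM.CITY", "Denver"), ("TO.CITY", "New York"),
         ("MONTH", "Nov."), ("DAY_NUMBER", "18th")]
print(list(zip(tokens, bio_tags(tokens, slots))))
```

A CRF would then predict this tag sequence from lexical and contextual features of each token, and the tags are trivially converted back into the frame's slot fillers.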

Non-local Features

[Figure: long-distance dependencies and trigger features over an utterance ("... to chicago on dec. ..."), with candidate trigger pairs such as (return -> dec.), (to -> dec.), (want -> dec.), (I -> dec.), and an outline of the trigger-selection algorithm.]

Non-local Features: Learning Time and Precision-Recall Curves on Communicator Data

[Table: inducer type vs. training time, number of features, and F1. Approx. 1 uses only non-local features; Approx. 2 is ME-based. The approximate inducers cut training time sharply while improving the scores (roughly 88-96 / 67-73 across settings).]

Joint SLU

[Figure: search space and factor graph of the triangular-chain CRF, which jointly models the slot/NE label sequence y and the utterance-level variable z (dialog act) over the input sequence x.]

Joint SLU: Results

Description of the four dialog data sets:

  Data set     # of utt   DA   NE
  Air-Travel   1,178      12   54
  Robot-Café   626         8   20
  Telebank     2,239      25   17
  TV-EPG       1,917      16    7

[Table: DA / NE accuracies per data set for five model configurations; the best configuration averages about 94.29 (DA) / 92.55 (NE) across data sets. Plus log-likelihood curves for DA, NE, and joint optimization.]

Contents
- SLU
- DM
- On-going research

Example-based Dialog Management

Introduction

Pipeline architecture for an SDS:
- Input recognition: speech input (speech recognition) or keyboard input
- Natural language understanding: natural language processing, dialog act recognition, named entity recognition
- Dialog management: discourse analysis, database query, system action prediction
- Natural language generation: information presentation, utterance realization

Dialog management is the central component that selects correct system actions based on observed evidence and inferred beliefs.

Goal: dialog modeling for practical deployment of multi-domain dialog systems.

Introduction: Related Work

Knowledge-based approach: regularized language and grammar; hand-crafted rules and finite-state automata.
Pros and cons:
- Has been deployed in many practical applications (+)
- Can be controlled by developers (+)
- Not good for domain portability and flexibility (-)

Data-driven approach: stochastic modeling using reinforcement learning [Levin et al., 2000], formalized as fully or partially observable Markov decision processes.
Pros and cons [Paek, 2006]:
- Training is done automatically (+)
- Theoretically formalized (+)
- Time-consuming corpus collection and annotation (-)
- High complexity; hardly practical for deployment (-)

Example-Based Dialog Modeling: Concept

The idea comes from Example-Based Machine Translation (EBMT), introduced by Nagao (1984): source sentence processing -> finding similar sentences -> retrieval of the target sentence.

The same idea can be extended to determine the next system actions by finding the most similar dialog example in a corpus. In Example-Based Dialog Modeling (EBDM) [Lee et al., 2006; Lee et al., 2007], the system action is selected by matching the user utterance against similar utterances in the dialog corpus: user utterance processing -> finding similar utterances -> retrieval of the system action.

The examples live in a Dialog Example Database (DEDB), indexed by state variables chosen by the system designer.

Example-Based Dialog Modeling: Indexing and Querying

The DEDB is semantically indexed and queried to generalize the data. From a goal-oriented dialog corpus (navigation domain):

#1
  User: Where is the Korean restaurants?
    [Dialog Act: wh-question] [Main Goal: search-location] [LOC_TYPE: Korean restaurant]
  System: There are A, B, and C in D, and E and F in G.
    [System Action: inform(name, address)]

  DEDB entry:
    User Utterance: Where is the LOC_TYPE?
    Domain: navigation
    Dialog Act: wh-question
    Main Goal: search-location
    LOC_TYPE: 1 (filled)   LOC_ADDRESS: 0 (unfilled)   LOC_NAME: 0   ROUTE_TYPE: 0
    Previous Dialog Act: <s>   Previous Main Goal: <s>
    Discourse History Vector: [1, 0, 0, 0]
    System Action: inform(name, address)

#2
  User: Let me go the A in D.
    [Dialog Act: request] [Main Goal: guide] [LOC_NAME: A] [LOC_ADDRESS: D]
  System: OK. You selected the A in D.
    [System Action: select(name, address)]
  System: Choose the route type of the fastest or the easiest path.
    [System Action: specify(route_type)]

  DEDB entry:
    User Utterance: Let me go the LOC_NAME in LOC_ADDRESS.
    Domain: navigation
    Dialog Act: request
    Main Goal: guide
    LOC_TYPE: 0   LOC_NAME: 1   LOC_ADDRESS: 1   ROUTE_TYPE: 0
    Previous Dialog Act: wh-question   Previous Main Goal: search-location
    Discourse History Vector: [1, 1, 1, 0]
    System Action: select(name, address); specify(route_type)

* Discourse History Vector = [LOC_TYPE, LOC_ADDRESS, LOC_NAME, ROUTE_TYPE]

Example-Based Dialog Modeling: Relaxation and Utterance Similarity

Relaxation: when no example is retrieved, dialog experts apply relaxation strategies suited to the genre and domain of the dialog. The query state can be approximated by relaxing particular state variables, which avoids data sparseness.

Utterance similarity: select the best one among the retrieved dialog examples, considering lexical and discourse-history information.

  Current user utterance: Where is the LOC_TYPE?      Discourse History Vector: [1, 0, 0, 0]
  Retrieved examples:
    Where is the LOC_TYPE in LOC_ADDRESS?             Discourse History Vector: [1, 1, 0, 0]
    Let me know where the LOC_TYPE is?                Discourse History Vector: [1, 0, 0, 0]

  Lexico-semantic similarity: by edit distance. Discourse-history similarity: by cosine measure.
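The two measures named above (edit distance for lexico-semantic similarity, cosine for discourse-history similarity) can be combined into one example score. The equal 0.5/0.5 weighting below is an assumption for illustration, not the weighting used in the talk.

```python
# Score retrieved EBDM examples against the current user utterance.
import math

def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]

def lexical_sim(u, v):
    """Lexico-semantic similarity: 1 - normalized word edit distance."""
    u, v = u.lower().split(), v.lower().split()
    return 1.0 - edit_distance(u, v) / max(len(u), len(v))

def cosine(x, y):
    """Discourse-history similarity between two slot-filled indicator vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def score(utt, hist, example):
    # Equal weighting is an illustrative assumption.
    return 0.5 * lexical_sim(utt, example["utt"]) + 0.5 * cosine(hist, example["hist"])

current = "Where is the LOC_TYPE ?"
hist = [1, 0, 0, 0]
examples = [
    {"utt": "Where is the LOC_TYPE in LOC_ADDRESS ?", "hist": [1, 1, 0, 0]},
    {"utt": "Let me know where the LOC_TYPE is ?", "hist": [1, 0, 0, 0]},
]
best = max(examples, key=lambda e: score(current, hist, e))
print(best["utt"])
```

With this weighting the lexically closer template wins even though the other example's discourse history matches exactly; shifting the weights changes that trade-off, which is exactly the tuning knob the slide's tie-breaking step addresses.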

Example-Based Dialog Modeling: Architecture

[Figure: the dialogue corpus is automatically indexed into the dialogue example DB. At run time, the user's semantic frame and discourse history drive query generation; retrieved dialogue examples are scored by utterance similarity (lexico-semantic similarity and discourse-history similarity), with tie-breaking to select the best dialogue example.]

Domain Spotter: Feature Extraction

  User: When do the KBS dramas start?

  Linguistic analysis: When/WRB do/VBP the/DT KBS/NNP drama/NN start/VB
    Last Word: start   Last Marker: ?   Last Tag: VB
    Last Verb: start   Last Noun: drama   First Noun: KBS
  Semantic analysis:
    Dialog Act: WH-QUESTION   Main Goal: SEARCH
  Agent spotter: Task Agent
  Keyword analysis:
    Best Keyword: drama   Second Keyword: start
    Best Class: EPG   Second Class: Navigation
  Domain spotter: EPG domain
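The feature extraction above can be sketched as follows. The keyword lexicon and the voting "classifier" are invented stand-ins: the real spotter is a trained classifier over these features.

```python
# Toy domain-spotter feature extraction and keyword-class voting.
from collections import Counter

# Illustrative keyword-to-class lexicon (not from the talk).
keyword_classes = {"drama": "EPG", "channel": "EPG",
                   "restaurant": "Navigation", "route": "Navigation"}

def extract_features(tokens, dialog_act, main_goal):
    """Lexical cues (last word/marker), semantic cues, and keyword classes."""
    words = [t for t in tokens if t.isalnum()]
    keywords = [t for t in tokens if t in keyword_classes]
    return {"last_word": words[-1], "last_marker": tokens[-1],
            "dialog_act": dialog_act, "main_goal": main_goal,
            "keyword_classes": [keyword_classes[t] for t in keywords]}

def spot_domain(features):
    """Majority vote over keyword classes; a trained model would do better."""
    votes = Counter(features["keyword_classes"])
    return votes.most_common(1)[0][0] if votes else "unknown"

feats = extract_features("when do the KBS drama start ?".split(),
                         "WH-QUESTION", "SEARCH")
print(spot_domain(feats))  # EPG
```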

Experimental Results: Spotter Evaluation

[Tables: evaluation of the domain spotter and the agent spotter.]

Experimental Results: Dialog Modeling Evaluation

[Tables: success turn rate and task completion rate.]

Error Recovery System

[Figure: noisy input, possibly containing errors, is checked for errors against the example DB. If no error is detected the system proceeds normally; if an error is detected, the system selects S = {# of Examples, # of Contents, # of Slots}, chooses A = {HelpType, Content, ...}, and realizes a help message through a system template and NLG.]

Error Detection in EBDM

No Example: no dialog example is retrieved by either exact or partial matching. OOV, OOU, and unexpected input can all be out-of-example. This is a potential error, because an out-of-example input is hard to understand and manage.

No Content: no information is retrieved using the slot values of the current dialog frame. In this case, the user does not know the slot values of interest, or an unexpected error occurred in the SLU module.

No Slot: the understanding module cannot extract any slot value from the user utterance. An utterance without slot information is likely erroneous when No Example is also detected, because in goal-oriented dialogs most utterances carry slot information used to search contents.

Error Recovery Strategy

Goal: the system should give help messages so the user learns what to say and how to say it.

- UtterHelp: a help message giving a possible utterance template. Triggered by No Example detection; the system gives an example template of what the user could say in this situation.
- InfoHelp: a help message about the domain content database. Triggered by No Content detection; the system recommends relevant contents that can be retrieved with the current dialog frame.
- UsageHelp: a longer help message about system usage. Triggered by all three detections (No Example / No Content / No Slot); the system provides guidelines for using the system.
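The detection conditions and their mapping to help strategies can be sketched directly. Reading "triggered by all" as requiring all three detections at once is one interpretation of the slide; the data structures are illustrative.

```python
# Toy EBDM error detection and recovery-strategy selection.

def detect_errors(examples, contents, slots):
    """Flag the three EBDM error conditions from retrieval/SLU results."""
    errors = []
    if not examples:
        errors.append("NoExample")   # no dialog example retrieved
    if not contents:
        errors.append("NoContent")   # DB query with current slots returned nothing
    if not slots:
        errors.append("NoSlot")      # SLU extracted no slot values
    return errors

def recovery_strategy(errors):
    """Map detected errors to a help strategy (one reading of the slide)."""
    if set(errors) == {"NoExample", "NoContent", "NoSlot"}:
        return "UsageHelp"           # all three: explain how to use the system
    if "NoExample" in errors:
        return "UtterHelp"           # offer an utterance template
    if "NoContent" in errors:
        return "InfoHelp"            # recommend retrievable contents
    return None                      # no recovery needed

errors = detect_errors([], ["restaurant A"], {"FOOD-TYPE": "Korean food"})
print(recovery_strategy(errors))  # UtterHelp
```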

Recovery Example

UtterHelp:
  User: Please inform me a category of the restaurants that serves Korean food.
  ASR output: Please me a car of a restaurant that Korean food
  SLU output: [REQUEST, GUIDE_LOC, FOOD-TYPE = Korean food]
  [Error Detection: No Example]
  System: You can say "Please give me a category of the restaurants that serve [FOOD-TYPE]" to search restaurants of [FOOD-TYPE].

InfoHelp:
  User: Please inform me a category of the restaurants that serves Korean food.
  ASR output: Please inform me a category of restaurant that Korean fish
  SLU output: [REQUEST, SEARCH_LOC, FOOD-TYPE = Korean fish]
  [Error Detection: No Content]
  System: There is no content on "Korean fish". You can say "Korean food", "Korean beef", and "Japanese fish" as the slot of [FOOD-TYPE].

Experimental Result: User Evaluation [Lee et al., 2007]

- Car navigation domain with 4-slot-filling dialogs: supports finding and selecting the desired destination in Pohang city, covering about 160 place names (restaurants, hospitals, etc.).
- 10 novice users with prior knowledge about the system; evaluation set of 50 dialogs (5 dialogs per user).
- Measures:
  TCR: task completion rate
  #AvgTurn: # of user utterances / # of dialogs
  #DetErr: # of detected errors / # of dialogs

Demos: Dialog for EPG (POSTECH); Unified Chatting and Goal-oriented Dialog (POSTECH)

Contents
- SLU
- DM
- On-going research

Context-sensitive Chatbot

Advanced ChatBot: Discourse-Sensitive ChatBot

Most chatbots cannot reflect the discourse context of previous user turns [ALICE, 2002; B. Shawar, 2002]. Which is better?

Context-free chatbot (obscure answering):
  USER: Did you watch a Korean team's game yesterday?
  SYSTEM: What game?
  USER: How come you didn't know that?
  SYSTEM: Well, I don't know.

Context-sensitive chatbot (more intelligent answering):
  USER: Did you watch a Korean team's game yesterday?
  SYSTEM: What game?
  USER: How come you didn't know that?
  SYSTEM: I am not interested in sports.

Discourse History in the ChatBot

Example dialog (chat agent, friend domain):
  Utterance 1 -- USER: I had a date last week.
    [Dialog Act: statement-non-opinion] [Main Goal: perform] [event: date] [date: last week]
  SYSTEM: Did you like her?
  Utterance 2 -- USER: Well, I think so, but ...
    [Dialog Act: statement-opinion] [Main Goal: judge]
  SYSTEM: But, what?
  Utterance 3 -- USER: I don't know her real feeling.
    [Dialog Act: statement-opinion] [Main Goal: know] [theme: real feeling]
  SYSTEM: Why? What happened?

A context-free index key for utterance 2 holds only the current turn: DA = statement-opinion, MA = judge, DOMAIN = friend.

A discourse-sensitive index key adds an abstraction of the previous user turn. For utterance 2: Previous Semantics = "statement-non-opinion, perform", Previous Keyword = "date". For utterance 3: Previous Semantics = "statement-opinion, judge", Previous Keyword = NULL.

Discourse coherence is modeled as P(DA_t, MA_t | DA_{t-1}, MA_{t-1}), e.g. P(statement-opinion, judge | statement-non-opinion, perform).
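The coherence term P(DA_t, MA_t | DA_{t-1}, MA_{t-1}) can be estimated by bigram counting over (dialog act, main goal) pairs. The tiny corpus below is invented for illustration; a real estimate would come from the annotated chat corpus.

```python
# Maximum-likelihood estimate of discourse coherence from (DA, MA) bigrams.
from collections import Counter

corpus = [  # each dialog: a sequence of (dialog_act, main_goal) per user turn
    [("statement-non-opinion", "perform"), ("statement-opinion", "judge"),
     ("statement-opinion", "know")],
    [("statement-non-opinion", "perform"), ("statement-opinion", "judge")],
]

bigrams = Counter()
unigrams = Counter()
for dialog in corpus:
    for prev, cur in zip(dialog, dialog[1:]):
        bigrams[(prev, cur)] += 1
        unigrams[prev] += 1

def coherence(prev, cur):
    """P(cur | prev) estimated by relative frequency (no smoothing)."""
    return bigrams[(prev, cur)] / unigrams[prev] if unigrams[prev] else 0.0

p = coherence(("statement-non-opinion", "perform"), ("statement-opinion", "judge"))
print(p)  # 1.0 on this toy corpus
```

In practice this score would be smoothed and used alongside the index-key match to prefer responses whose (DA, MA) transition is plausible given the previous user turn.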

POSTECH Chatbot Demo

Multimodal Dialog Management

Multi-Modal Dialog

A multi-modal dialog system supports human-computer interaction over multiple different input and/or output modes.
- Input: voice, pen, gesture, facial expression, etc.
- Output: voice, graphical output, etc.
Applications: GPS, information guide systems, smart home control, etc.

Task performance and user preference favor multi-modal over speech-only interfaces [Oviatt et al., 1997]:
- 10% faster task completion
- 23% fewer words
- 35% fewer task errors
- 35% fewer spoken disfluencies

Hard to represent using only a single modality: "What is a decent Japanese restaurant near here?"

Multi-Modal Dialog System Architecture

Components of a multi-modal dialog system [Chai et al., 2002]:
- Uni-modal understanding: speech -> ASR -> SLU; gesture -> gesture recognition/understanding; facial expression analysis
- Multimodal integration (fusion) and reference analysis: combines uni-modal interpretations into a multi-modal interpretation frame
- Dialog management
- Multimodal generation (fission): splits the system response into speech and graphical rendering

N-best Re-ranking for Improving Speech Recognition Performance

Re-rank the ASR n-best list using multi-modal understanding features [Kim et al., 2007]. Example:
  User (with pen gesture): "bring this to here"
  ASR top hypothesis (error): "bring his to here"
  SLU output: Speech Act: request; Main Goal: move; Component Slots: Target.Loc = here
  The slot Source.item = this is missing, which signals that a lower-ranked hypothesis may be correct.
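A minimal sketch of the re-ranking idea: score each hypothesis by a weighted mix of its ASR score and SLU-frame completeness, so hypotheses that lose a required slot are demoted. The toy parser and the 0.5/0.5 weights are assumptions; [Kim et al., 2007] use trained weights over richer features, including multi-modal reference resolution.

```python
# Re-rank an ASR n-best list with an SLU-completeness feature.

def slu_parse(text):
    """Toy SLU for a 'move' request: find the target location and source item."""
    tokens = text.split()
    return {"Target.Loc": "here" if "here" in tokens else None,
            "Source.item": "this" if "this" in tokens else None}

def rerank(nbest):
    """nbest: list of (hypothesis, asr_score). Returns the best hypothesis."""
    def score(item):
        text, asr = item
        frame = slu_parse(text)
        filled = sum(v is not None for v in frame.values())
        # Illustrative equal weighting of ASR score and slot-fill ratio.
        return 0.5 * asr + 0.5 * filled / len(frame)
    return max(nbest, key=score)[0]

nbest = [("bring his to here", 0.62),    # top ASR hypothesis, but drops a slot
         ("bring this to here", 0.58)]   # lower ASR score, complete frame
print(rerank(nbest))  # bring this to here
```

The second hypothesis wins despite its lower acoustic score because its frame fills both slots, which is exactly the signal the missing Source.item provides in the slide's example.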

Experimental Result [Kim et al., 2007]

  Features                                       WER (%)   CER (%)
  baseline                                       17.74     14.28
  + speech recognizer features                   17.38     13.81
  + SLU features                                 16.43     13.11
  + multi-modal reference resolution features    16.33     12.83

- Word error rate: relative error reduction of 7.95%; the re-ranking model has a significantly smaller WER than the baseline (p < 0.001).
- Concept error rate: relative error reduction of 10.13%; the re-ranking model has a significantly smaller CER than the baseline (p < 0.01).

Multi-Modal Dialog Management Using a Hidden Information State Manager

Hidden Information State (HIS) dialog manager for a multi-modal dialog system [Kim et al., 2007]:
- POMDP-based dialog manager: a framework with uncertainty built in, maintaining a probability distribution over dialog states.
- Scaling POMDPs for dialog systems: the state space of a practical dialog system is very large. E.g., in a 3-city tourist domain, n(b(s_u, a_u, s_d)) = 6 user goals x 18 user acts x 18 dialog states = 1,944 states.
- The HIS dialog manager groups equivalent states into partitions and generates system-action hypotheses per partition.
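The partition idea can be sketched as follows: keep one probability mass per set of indistinguishable user goals, and split a partition only when an observation distinguishes its members. The toy city domain and the 0.8 observation confidence are invented for illustration.

```python
# HIS-style belief maintenance over partitions of user goals.

# Start: one partition covering all user goals, holding all the belief mass.
partitions = [({"Seoul", "Busan", "Pohang"}, 1.0)]

def split(partitions, predicate, p_match):
    """Split each partition on a predicate and redistribute its mass."""
    out = []
    for goals, mass in partitions:
        yes = {g for g in goals if predicate(g)}
        no = goals - yes
        if yes:
            out.append((yes, mass * p_match))
        if no:
            out.append((no, mass * (1 - p_match)))
    total = sum(m for _, m in out)          # renormalize the belief
    return [(g, m / total) for g, m in out]

# Observation: the user (probably) said "Pohang"; assume p = 0.8 it matches.
partitions = split(partitions, lambda g: g == "Pohang", 0.8)
for goals, mass in partitions:
    print(sorted(goals), round(mass, 2))
# ['Pohang'] 0.8
# ['Busan', 'Seoul'] 0.2
```

The point is that the belief is tracked over 2 partitions instead of 1,944 enumerated states; further observations keep splitting only the partitions they can tell apart.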

POSTECH Multimodal Dialog System Demo

Dialog Studio

Introduction to Dialog Studio

Motivation: dialog system development and maintenance involves
- System tutoring
- Model adaptation
- Model synchronization across ASR, SLU, and DM
- Reducing human effort and time

Dialog simulation supports
- Automatic dialog evaluation
- Automatic massive corpus building
- Finding flaws in the dialog system

Dialog Studio Architecture

[Figure: external components (dialog knowledge structure configuration, file storage, dialog utterance pool) feed a knowledge importer; corpus synchronization and training build the models for the ASR, SLU, and dialog manager; a dialog simulator, log storage, and evaluator support the running step.]

Experiments: EPG Domain

[Bar chart: time spent preparing a new EPG-domain system (new SLU corpus annotation, dialog example models, SLU tuning, DM) with and without Dialog Studio; the tool sharply reduces the time for each step.]

Dialog Simulation for SDS

[Figure: a simulated user interacting with the real system.]

Where can we use dialog simulation?

- Strategy learning: state exploration for reinforcement learning (e.g. POMDPs)
- Evaluation: strategy evaluation and DM performance evaluation
- Corpus expansion: user-side expansion (SLU corpus) and user+system-side expansion (dialog corpus), growing a small corpus into a large one

Dialog Simulator Architecture

[Figure: the user simulator stacks an intention simulator, an utterance simulator, and a surface simulator; noise is added to the simulated surface form before it reaches the dialog manager, which replies with the system intention.]

Intention/Utterance Simulation

From discourse information, the intention simulator produces a user intention as a semantic frame, e.g. Dialog Act: request, Main Goal: move-channel, Component.[Genre]. The utterance simulator then generates surface forms for that frame (examples translated from Korean):
- "Let's just watch a [genre] program"
- "Hey, turn on some [genre]"
- "Turn on some [genre]"
- "Let's just watch a [genre] program then"
- "Come on, show me a [genre] program"

Linear CRF Model for Intention Modeling

[Figure: a linear-chain structure over turns, with a user-intention (UI) node and a discourse-information (DI) node per turn.]

Assumption: a user utterance has only one intention.
- UI: user intention
- DI: discourse information, i.e. the previous system response and the discourse history
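A full CRF is heavy for a sketch, so the simulation loop below replaces it with a hand-written conditional distribution P(user intention | previous system action). The action and intention names and all probabilities are invented; the talk learns this distribution from corpus data with a linear-chain CRF.

```python
# Toy intention simulator: sample the next user intention conditioned on
# the previous system action (one intention per utterance, as assumed above).
import random

model = {
    "greet":          [("request_search", 0.7), ("request_guide", 0.3)],
    "inform_results": [("request_guide", 0.6), ("request_search", 0.4)],
    "specify_route":  [("choose_route", 1.0)],
}

def sample_intention(system_action, rng):
    """Draw one intention from the conditional distribution."""
    dist = model[system_action]
    r = rng.random()
    acc = 0.0
    for intention, p in dist:
        acc += p
        if r < acc:
            return intention
    return dist[-1][0]  # guard against floating-point rounding

rng = random.Random(0)
print(sample_intention("specify_route", rng))  # choose_route
```

Replacing the table lookup with CRF inference (conditioned additionally on the discourse history) yields the talk's simulator; the surrounding sampling loop stays the same.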

Utterance Generation

The generator conditions on (Dialog Act x Main Goal), producing a structure-tag sequence that is then filled with words.
- Structure tags: component slot names and part-of-speech tags
- S: member of the structure-tag space
- W: member of the vocabulary space

Naturalness: BLEU (n-gram precision)
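BLEU here is the usual geometric mean of modified n-gram precisions with a brevity penalty. This minimal single-reference sketch shows the computation; real evaluations typically use up to 4-grams and multiple references.

```python
# Minimal single-reference BLEU: clipped n-gram precision + brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    c, r = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(c, n), ngrams(r, n)
        overlap = sum(min(cnt, ref[g]) for g, cnt in cand.items())  # clipping
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0
        precisions.append(overlap / total)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(c) > len(r) else math.exp(1 - len(r) / len(c))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(bleu("turn on the drama channel", "turn on the drama channel"), 2))  # 1.0
```

Comparing simulated utterances against held-out human utterances with this score gives the naturalness number the slide refers to.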

Intention Simulation Result (translated from Korean)

DIALOG ID 1 (score: 0.88914)
  USER: What's the phone number of 'Gajokjung'?
  SYS: The phone number of 'Gajokjung', located in 'Daeidong', is 2757543.
  USER: OK. Let's go to 'Daeidong'.
  SYS: You chose 'Gajokjung' in 'Daeidong'. Please select the path-finding method: 'shortest path' or 'optimal path'.
  USER: Shortest path.
  SYS: You selected 'shortest path'. May I guide you to 'Gajokjung' in 'Daeidong' by the shortest path?
  USER: OK. Thank you.

POSTECH Dialog Studio Demo

English Tutoring Dialog Management

Dialogue System for English Conversation Tutoring

Motivation: there is huge demand for English-conversation education, but one-to-one human tutoring is very expensive; a spoken dialogue system can be used instead at low cost.

Research issues:
- Speech recognition for non-native speakers
- Language understanding for non-native speakers
- Flexible dialogue modeling for diverse conversation themes
- Feedback generation about student performance

IUI 2007 tutorial

Research Issues

Language understanding for non-native speakers:
- Improve robustness by adopting statistical methods
- Use features designed fo

