LINGUIST List 19.2459
Fri Aug 08 2008
Diss: Comp Ling: Rieser: 'Bootstrapping Reinforcement ...'
Editor for this issue: Evelyn Richter
<evelyn linguistlist.org>
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
Directory
1. Verena Rieser, Bootstrapping Reinforcement Learning-Based Dialogue Strategies from Wizard-of-Oz Data
Message 1: Bootstrapping Reinforcement Learning-Based Dialogue Strategies from Wizard-of-Oz Data
Date: 08-Aug-2008
From: Verena Rieser <vrieser inf.ed.ac.uk>
Subject: Bootstrapping Reinforcement Learning-Based Dialogue Strategies from Wizard-of-Oz Data
Institution: Saarland University
Program: Department of Computational Linguistics and Phonetics
Dissertation Status: Completed
Degree Date: 2008
Author: Verena Rieser
Dissertation Title: Bootstrapping Reinforcement Learning-Based Dialogue Strategies from Wizard-of-Oz Data
Dissertation URL: http://homepages.inf.ed.ac.uk/vrieser/thesis.html
Linguistic Field(s):
Computational Linguistics
Dissertation Director:
Oliver Lemon
Manfred Pinkal
Dissertation Abstract:
In my PhD thesis, I develop a framework to optimise multimodal dialogue strategies from small amounts of Wizard-of-Oz (WOZ) data. Designing a spoken dialogue system can be a time-consuming and challenging process. To facilitate strategy development, recent research has investigated Reinforcement Learning (RL) methods for automatically optimising dialogue strategies from real data. For new application domains where a system is designed from scratch, however, there is often no suitable in-domain data available, leaving the developer with a classic chicken-and-egg problem. This thesis proposes to learn dialogue strategies by simulation-based RL, where the simulated environment is learned from small amounts of WOZ data. Using WOZ data rather than data from real Human-Computer Interaction allows us to learn optimal strategies for new application areas beyond the scope of existing dialogue systems. Optimised learned strategies are then available from the first moment of online operation, and tedious handcrafting of dialogue strategies is avoided entirely. We call this method 'bootstrapping'. Our results show that a dialogue policy constructed using this framework significantly outperforms a non-optimised data-driven policy (constructed via Supervised Learning) in terms of subjective user ratings and objective dialogue performance measures. For example, RL leads to an almost 50% increase in perceived Task Ease and an almost 20% increase in Future Use. The technical contributions of this thesis are new methods and techniques for learning a simulated learning environment from small amounts of WOZ data. For example, the thesis introduces a new method to learn and evaluate user simulations, as well as non-linear reward functions. The overall contribution is an end-to-end data-driven framework to design and evaluate RL-based dialogue strategies, from data collection to user testing.
