Title: Bootstrapping Reinforcement Learning-Based Dialogue Strategies from Wizard-of-Oz Data
Author: Verena Rieser
Homepage: http://homepages.inf.ed.ac.uk/vrieser/
Degree Awarded: Saarland University, Department of Computational Linguistics and Phonetics
Degree Date: 2008
Linguistic Subfield(s): Computational Linguistics
Director(s): Oliver Lemon, Manfred Pinkal

Abstract:
In my PhD thesis, I develop a framework to optimise multimodal dialogue
strategies from small amounts of Wizard-of-Oz (WOZ) data.
Designing a spoken dialogue system can be a time-consuming and challenging
process. To facilitate strategy development, recent research investigates
the use of Reinforcement Learning (RL) methods applied to automatic
dialogue strategy optimisation from real data. For new application domains
where a system is designed from scratch, however, there is often no
suitable in-domain data available, leaving the developer with a classic
chicken-and-egg problem.
This thesis proposes to learn dialogue strategies by simulation-based RL,
where the simulated environment is learned from small amounts of
Wizard-of-Oz data. Using WOZ data rather than data from real Human-Computer
Interaction allows us to learn optimal strategies for new application areas
beyond the scope of existing dialogue systems. Optimised learned strategies
are then available from the first moment of online operation, and tedious
handcrafting of dialogue strategies is avoided entirely. We call this method
'bootstrapping'.
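
The idea of simulation-based RL on top of a small WOZ corpus can be illustrated with a minimal sketch. It is not the thesis's actual method: the two-slot toy domain, the `WOZ_DATA` observations, the function names, the reward values, add-one smoothing for the user simulation, and tabular Q-learning as the learner are all assumptions chosen only to keep the example short.

```python
import random
from collections import Counter, defaultdict

# Hypothetical WOZ observations: (wizard action, user response).
# Invented for illustration; not data from the thesis.
WOZ_DATA = [
    ("ask", "provide"), ("ask", "provide"), ("ask", "provide"),
    ("ask", "silence"), ("ask", "provide"), ("ask", "silence"),
]
RESPONSES = ["provide", "silence"]
ACTIONS = ["ask", "present"]
N_SLOTS = 2  # assumed toy task: fill two slots, then present results


def learn_user_simulation(data):
    """Estimate P(response | action) from counts, with add-one smoothing."""
    counts = defaultdict(Counter)
    for action, response in data:
        counts[action][response] += 1
    model = {}
    for action, c in counts.items():
        total = sum(c.values()) + len(RESPONSES)
        model[action] = {r: (c[r] + 1) / total for r in RESPONSES}
    return model


def step(state, action, user_model, rng):
    """Simulated environment: return (next_state, reward, done)."""
    if action == "present":
        # Presenting ends the dialogue; it only pays off with all slots filled.
        return state, (10.0 if state == N_SLOTS else -10.0), True
    probs = user_model["ask"]
    response = rng.choices(RESPONSES, weights=[probs[r] for r in RESPONSES])[0]
    if response == "provide" and state < N_SLOTS:
        state += 1
    return state, -1.0, False  # every question costs one turn


def train(user_model, episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning against the learned user simulation."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_SLOTS + 1) for a in ACTIONS}
    for _ in range(episodes):
        state, done, turns = 0, False, 0
        while not done and turns < 20:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action, user_model, rng)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            target = reward if done else reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state, turns = nxt, turns + 1
    return q


def greedy_policy(q):
    """Map each state (number of filled slots) to its highest-valued action."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_SLOTS + 1)}
```

Calling `greedy_policy(train(learn_user_simulation(WOZ_DATA)))` yields a strategy learned entirely in simulation. In this toy setting it should ask until both slots are filled and only then present. A realistic system would need a far richer state space, user model, and reward function.
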
Our results show that a dialogue policy constructed using this framework
significantly outperforms a non-optimised data-driven policy (constructed
via Supervised Learning) in terms of subjective user ratings and
objective dialogue performance measures. For example, RL leads to an almost
50% increase in perceived Task Ease and an almost 20% increase in Future Use.
The technical contributions of this thesis are new methods and techniques
introduced to learn a simulated learning environment from small amounts of
WOZ data. For example, we introduce a new method to learn and evaluate user
simulations, as well as non-linear reward functions. The overall contribution is
an end-to-end data-driven framework to design and evaluate RL-based
dialogue strategies - from data collection to user testing.
Page Updated: 25-Nov-2009
