LINGUIST List 10.440

Wed Mar 24 1999

Calls: Discourse Tagging, NLP Methods

Editor for this issue: Karen Milligan <karenlinguistlist.org>


As a matter of policy, LINGUIST discourages the use of abbreviations or acronyms in conference announcements unless they are explained in the text.

Directory

  1. Morena Danieli, ACL-99 Workshop on discourse tagging
  2. Liu Xiaohu, Methods in Natural Language Processing

Message 1: ACL-99 Workshop on discourse tagging

Date: Wed, 24 Mar 1999 16:07:17 +0100
From: Morena Danieli <danielicselt.it>
Subject: ACL-99 Workshop on discourse tagging



 TOWARDS STANDARDS AND TOOLS FOR DISCOURSE TAGGING
 (ACL-99 Workshop)
 June 22, 1999
 University of Maryland
 College Park, MD, USA

 URL: http://www.mri.mq.edu.au/conf/acl99/

DESCRIPTION

Discourse tagging assigns labels from a tag set to discourse units in
texts or dialogues. The discourse units range from words and phrases,
such as referring expressions, to multi-utterance units identified by
criteria such as speaker intention or initiative. Just as the
availability of syntactically annotated corpora has resulted in major
advances in sentence-level natural language processing, we expect that
corpora tagged for discourse features will lead to similar advances in
discourse processing.

Work on discourse tagging has gained momentum in the last 3-4 years.
Three major initiatives in this area are: the Discourse Resource
Initiative (http://www.georgetown.edu/luperfoy/Discourse-Treebank/),
which has organized yearly international workshops addressing the
standardization of discourse tagging schemes for coreference, for
dialogue acts, and for higher level discourse structures; MATE
(http://mate.mip.ou.dk/), a project co-funded by the European Union,
whose aim is to develop tools and standards for tagging spoken
dialogue corpora at different levels, including the discourse level;
and the Global Document Annotation initiative
(http://www.etl.go.jp/etl/nl/GDA),
which aims to have Internet authors annotate their documents with a
common standard tag set that allows machines to recognize the
semantic and pragmatic structures of documents.

Despite the progress made by these three initiatives, there is still
much work to be done before there are widely accepted (standardized)
discourse tagging schemes suitable for sharing and distribution across
sites and projects. Moreover, there has not yet been an open forum in
which researchers working in this area could participate. This
workshop will provide such a forum.

Submissions are invited on, but not limited to, the following topics
and issues:

1. How can standardization for discourse tagging concretely be
achieved? By developing a single coding scheme, or a set of coding
schemes, one for each phenomenon of interest? Or rather, by developing
some specification guidelines and mappings from one scheme to another?
In some other way?

2. Cross-level coding: All of the initiatives mentioned above promote
an approach in which coding schemes are developed at different levels,
rather than an approach in which a monolithic scheme addresses all
phenomena. Given this methodology, the issue of cross-level coding
arises, namely, how can coding schemes for different levels take
advantage of each other and allow coding of cross-level relationships?
Is it possible to use corpus annotations at different annotation
levels to examine the interdependence of linguistic phenomena?

3. Coding schemes and theories of discourse: Is it possible to develop
coding schemes that faithfully reflect a discourse theory? If yes,
is it desirable? Conversely, can corpora coded for discourse issues
help advance our theoretical understanding of discourse phenomena?

4. Coding schemes and applications: Is it possible to design discourse
coding schemes independently from the applications that the tagged
corpora may be used to inform (e.g., to train a speech act
classifier)?

5. Coding schemes and reliability: Thus far, experience in developing
schemes for discourse phenomena that can be coded reliably has been
mixed. Whatever the reason (e.g., lack of an overarching theory for
discourse, genuine ambiguity and misunderstandings in real dialogue
reflected in the coding, etc.), how can we devise reliable coding
schemes? What reliability measures should be used: are widely used
measures (Kappa, Alpha, precision and recall) and the corresponding
standards appropriate for discourse tagging? If not, what other
measures can we use? Is reliability affected by whether naive or
expert coders are used?

6. Tools for discourse tagging: What specific features of a tool does
discourse tagging require? Can we just extend tools developed for
other purposes, e.g. for syntactic tagging? Do we need to develop new
tools?

7. Some paradigms for evaluating dialogue systems take advantage of
the use of tagged corpora: How are discourse tagging and tagging for
evaluation purposes related? Are there some discourse tags that may be
used as evaluation tags or is it advisable to introduce another
dimension of tagging?
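
As an illustration of the reliability measures named in topic 5, the
following is a minimal sketch of the Kappa statistic for two coders. It
assumes the two-coder (Cohen) variant, which the call does not specify;
the function name and the tag labels are hypothetical and are not taken
from any tagging scheme mentioned above.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who tagged the same discourse units."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of units given the same tag by both coders.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each coder's own tag distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used the same single tag throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical dialogue-act tags assigned by two coders to eight utterances.
coder_1 = ["statement", "question", "statement", "answer",
           "statement", "question", "answer", "statement"]
coder_2 = ["statement", "question", "statement", "answer",
           "question", "question", "statement", "statement"]

print(round(cohen_kappa(coder_1, coder_2), 3))  # prints 0.6
```

Here the coders agree on 6 of 8 units (0.75), chance agreement is 0.375,
and Kappa is therefore (0.75 - 0.375) / (1 - 0.375) = 0.6. Whether such a
value counts as acceptable for discourse tagging is exactly the question
the topic raises.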

In addition to papers, prospective participants may be asked to do a
small coding exercise before the workshop, in order to test out
various tagging schemes. Prospective participants who have developed
tools are welcome to bring a demo with them.


FORMAT FOR SUBMISSION

Authors are requested to submit an electronic version of their
papers. Send your electronic submission to both Marilyn Walker
(walkerresearch.att.com) and Morena Danieli (danielicselt.it). If
electronic submission is impossible, please contact the organizers to
arrange for hardcopy submission (four hardcopies will be required).
Maximum length is 6 pages including figures and references.

Please conform to the traditional two-column ACL Proceedings
format. Style files can be downloaded from
ftp://ftp.cs.columbia.edu/acl-l/Styfiles/Proceedings/


IMPORTANT DATES

Paper submission deadline: March 26
Notification of acceptance: April 16
Camera ready papers due: April 30

ORGANIZING COMMITTEE

Marilyn Walker (Contact Person)
AT&T Labs - Research
180 Park Ave
Rm. E-103
Florham Park, N.J. 07932, USA
walkerresearch.att.com
+1-973-360-8956

Morena Danieli (Contact Person)
CSELT-Centro Studi E Laboratori Telecomunicazioni
CF/VR
Via Reiss-Romoli, 274
I-10148 Torino, Italia
Morena.Danielicselt.it
+39-011-2286247

Johanna D. Moore
University of Edinburgh
Human Communication Research Centre
2, Buccleuch Place
Edinburgh EH8 9LW, UK
jmoorecogsci.ed.ac.uk
+44-131-6511336

Barbara Di Eugenio
Department of Electrical Engineering and Computer Science
Science and Engineering Offices
851 South Morgan Street (M/C 154)
Chicago, Illinois 60607-7053, USA
bdieugeneecs.uic.edu
+1-312-996-3422


PROGRAM COMMITTEE

Jean Carletta - HCRC, University of Edinburgh
Laila Dybkjaer - MIP, Odense University
Julia Hirschberg - AT&T
Diane Litman - AT&T
Masato Ishizaki - JAIST
David Novick - EURISCO
Silvia Quazza - CSELT
Daniel Jurafsky - University of Colorado

Message 2: Methods in Natural Language Processing

Date: Thu, 25 Mar 1999 01:15:46 +0800
From: Liu Xiaohu <lxiaohucs.ust.hk>
Subject: Methods in Natural Language Processing

 
 Final Call For Papers

 (EMNLP/VLC-99) JOINT SIGDAT CONFERENCE ON
 EMPIRICAL METHODS IN NATURAL LANGUAGE
 PROCESSING AND
 VERY LARGE CORPORA

 Sponsored by SIGDAT (ACL's Special Interest Group
 for Linguistic Data and Corpus-based Approaches to NLP)

 June 21-22, 1999
 University of Maryland

 In conjunction with
 ACL'99: the 37th Annual Meeting of the
 Association for Computational Linguistics

This SIGDAT-sponsored joint conference will continue to provide a forum
for new research in corpus-based and/or empirical methods in NLP. In
addition to providing a general forum, the conference has a theme this
year:

"Corpus-based and/or Empirical Methods in NLP for Speech, MT, IR, and
other Applied Systems" 

A large number of systems in automatic speech recognition (ASR) and
synthesis, machine translation (MT), information retrieval (IR), optical
character recognition (OCR) and handwriting recognition have become
commercially available in the last decade. Many of these systems use
NLP technologies as an important component. Corpus-based and empirical
methods in NLP have been a major trend in recent years. How useful are
these techniques when applied to real systems, especially when compared
to rule-based methods? Are there any new techniques to be developed in
EMNLP and from VLC in order to improve the state of the art of ASR, MT,
IR, OCR, and other applied systems? Are there new ways to combine
corpus-based and empirical methods with rule-based systems?

This two-day conference aims to bring together academic researchers and
industrial practitioners to discuss the above issues through technical
paper sessions, invited talks, and panel discussions. The goal of the
conference is to raise awareness of what kinds of new EMNLP techniques
need to be developed in order to bring about the next breakthrough in
speech recognition and synthesis, machine translation, information
retrieval and other applied systems.


Scope 

The conference solicits paper submissions in (but not limited to) the
following areas:

1) Original work in one of the following technologies and its relevance
to speech, MT, or IR: 
 (a) word sense disambiguation 
 (b) word and term segmentation and extraction 
 (c) alignment 
 (d) bilingual lexicon extraction 
 (e) POS tagging 
 (f) statistical parsing 
 (g) dialog models 
 (h) others (please specify) 

2) Proposals of new EMNLP technologies for speech, MT, IR, OCR, or other
applied systems (please specify).

3) Comparative evaluation of the performance of EMNLP technologies in
one of the areas in (1) and that of its rule-based or knowledge-based
counterpart in a speech, MT, IR, OCR or other applied system.
 


Submission Requirements 

Submissions should be limited to original, evaluated work. All papers
should include a background survey and/or reference to previous work.
Authors should provide an explicit explanation when there is no
evaluation in their work. We encourage paper submissions related to the
conference theme. In particular, we encourage authors to include in
their papers proposals and discussions of the relevance of their work
to the theme. However, there will be a special session in the
conference to include corpus-based and/or empirical work in all areas
of natural language processing.



Submission Format 

Only hard-copy submissions will be accepted. Reviewing of papers will
not be blind. The submission format and word limit are the same as
those for ACL this year. We strongly recommend the use of ACL-standard
LaTeX (plus bibstyle and trivial example) or Word style files for the
preparation of submissions. Paper ID is not required. Please leave it
blank. Six copies of the full-length paper (not to exceed 3200 words
exclusive of references) should be received at the following address
on or before March 31, 1999.

EMNLP/VLC-99 Program Committee 
c/o Pascale Fung 
Department of Electrical and Electronic Engineering 
University of Science and Technology (HKUST) 
Clear Water Bay, Kowloon 
Hong Kong 



Important Dates 

March 31 Submission of full-length paper 
April 30 Acceptance notice 
May 20 Camera-ready paper due 
June 21-22 Conference date 



Program Chair

Pascale Fung 
Human Language Technology Center 
Department of Electrical and Electronic Engineering 
University of Science and Technology (HKUST) 
Clear Water Bay, Kowloon 
Hong Kong 
Tel: (+852) 2358 8537 
Fax: (+852) 2358 1485 
Email: pascaleee.ust.hk 
 
Program Co-Chair 
Joe Zhou 
LEXIS-NEXIS, a Division of Reed Elsevier 
9555 Springboro Pike 
Dayton, OH 45342 
USA 
Email: joezlexis-nexis.com

Program Committee 

Jiang-Shin Chang (Behavior Design Corp.) 
Ken Church (AT&T Labs--Research) 
Ido Dagan (Bar-Ilan University) 
Marti Hearst (UC-Berkeley) 
Huang, Changning (Tsinghua University) 
Pierre Isabelle (Xerox Research Europe) 
Lillian Lee (Cornell University) 
David Lewis (AT&T Research) 
Dan Melamed (West Group) 
Mehryar Mohri (AT&T Labs--Research) 
Masaaki Nagata (NTT) 
Richard Sproat (AT&T Labs--Research) 
Andreas Stolcke (SRI) 
Ralph Weischedel (BBN) 
Dekai Wu (Hong Kong University of Science & Technology) 
David Yarowsky (Johns Hopkins University)