Editor for this issue: Karen Milligan <karen@linguistlist.org>
TOWARDS STANDARDS AND TOOLS FOR DISCOURSE TAGGING (ACL-99 Workshop)

June 22, 1999
University of Maryland
College Park, MD, USA
URL: http://www.mri.mq.edu.au/conf/acl99/

DESCRIPTION

Discourse tagging assigns labels from a tag set to discourse units in texts or dialogues. The discourse units range from words and phrases, such as referring expressions, to multi-utterance units identified by criteria such as speaker intention or initiative. Just as the availability of syntactically annotated corpora has resulted in major advances in sentence-level natural language processing, we expect that corpora tagged for discourse features will lead to similar advances in discourse processing.

Work on discourse tagging has gained momentum in the last 3-4 years. Three major initiatives in this area are:

  the Discourse Resource Initiative (http://www.georgetown.edu/luperfoy/Discourse-Treebank/), which has organized yearly international workshops addressing the standardization of discourse tagging schemes for coreference, for dialogue acts, and for higher level discourse structures;

  MATE (http://mate.mip.ou.dk/), a project co-funded by the European Union, whose aim is to develop tools and standards for tagging spoken dialogue corpora at different levels, including the discourse level;

  the Global Document Annotation initiative (http://www.etl.go.jp/etl/nl/GDA), which aims to have Internet authors annotate their documents with a common standard tag set that allows machines to recognize the semantic and pragmatic structures of documents.

Despite the progress made by these three initiatives, there is still much work to be done before there are widely accepted (standardized) discourse tagging schemes suitable for sharing and distribution across sites and projects. Moreover, there has not yet been an open forum in which researchers working in this area could participate. This workshop will provide such a forum.

Submissions are invited on, but not limited to, the following topics and issues:

1. How can standardization for discourse tagging concretely be achieved? By developing a single coding scheme, or a set of coding schemes, one for each phenomenon of interest? Or rather, by developing some specification guidelines and mappings from one scheme to another? In some other way?

2. Cross-level coding: All of the initiatives mentioned above promote an approach in which coding schemes are developed at different levels, rather than an approach in which a monolithic scheme addresses all phenomena. Given this methodology, the issue of cross-level coding arises, namely, how can coding schemes for different levels take advantage of each other and allow coding of cross-level relationships? Is it possible to use corpus annotations at different annotation levels to examine the interdependence of linguistic phenomena?

3. Coding schemes and theories of discourse: Is it possible to develop coding schemes that faithfully reflect a discourse theory? If yes, is it desirable? Conversely, can corpora coded for discourse issues help advance our theoretical understanding of discourse phenomena?

4. Coding schemes and applications: Is it possible to design discourse coding schemes independently from the applications that the tagged corpora may be used to inform (e.g., to train a speech act classifier)?

5. Coding schemes and reliability: Thus far, experience in developing schemes for discourse phenomena that can be coded reliably has been mixed. Whatever the reason (e.g., lack of an overarching theory for discourse, genuine ambiguity and misunderstandings in real dialogue reflected in the coding, etc.), how can we devise reliable coding schemes? What reliability measures should be used: are widely used measures (Kappa, Alpha, precision and recall) and the corresponding standards appropriate for discourse tagging? If not, what other measures can we use?
Is reliability affected by whether naive or expert coders are used?

6. Tools for discourse tagging: What specific features of a tool does discourse tagging require? Can we just extend tools developed for other purposes, e.g. for syntactic tagging? Do we need to develop new tools?

7. Some paradigms for evaluating dialogue systems take advantage of the use of tagged corpora: How are discourse tagging and tagging for evaluation purposes related? Are there some discourse tags that may be used as evaluation tags or is it advisable to introduce another dimension of tagging?

In addition to papers, prospective participants may be asked to do a small coding exercise before the workshop, in order to test out various tagging schemes. Prospective participants who have developed tools are welcome to bring a demo with them.

FORMAT FOR SUBMISSION

Authors are requested to submit an electronic version of their papers. Send your electronic submission to both Marilyn Walker (walker@research.att.com) and Morena Danieli (danieli@cselt.it). If electronic submission is impossible, please contact the organizers to arrange for hardcopy submission (four hardcopies will be required). Maximum length is 6 pages including figures and references. Please conform with the traditional two-column ACL Proceedings format. Style files can be downloaded from ftp://ftp.cs.columbia.edu/acl-l/Styfiles/Proceedings/

IMPORTANT DATES

Paper submission deadline: March 26
Notification of acceptance: April 16
Camera ready papers due: April 30

ORGANIZING COMMITTEE

Marilyn Walker (Contact Person)
ATT Labs - Research
180 Park Ave Rm. E-103
Florham Park, N.J. 07932, USA
walker@research.att.com
+1-973-360-8956

Morena Danieli (Contact Person)
CSELT-Centro Studi E Laboratori Telecomunicazioni
CF/VR
Via Reiss-Romoli, 274
I-10148 Torino, Italia
Morena.Danieli@cselt.it
+39-011-2286247

Johanna D. Moore
University of Edinburgh
Human Communication Research Centre
2, Buccleuch Place
Edinburgh EH8 9LW, UK
jmoore@cogsci.ed.ac.uk
+44-131-6511336

Barbara Di Eugenio
Department of Electrical Engineering and Computer Science
Science and Engineering Offices
851 South Morgan Street (M/C 154)
Chicago, Illinois 60607-7053, USA
bdieugen@eecs.uic.edu
+1-312-996-3422

PROGRAM COMMITTEE

Jean Carletta - HCRC, University of Edinburgh
Laila Dybkjaer - MIP, Odense University
Julia Hirschberg - AT&T
Diane Litman - AT&T
Masato Ishizaki - JAIST
David Novick - EURISCO
Silvia Quazza - CSELT
Daniel Jurafsky - University of Colorado

Final Call For Papers (EMNLP/VLC-99)

JOINT SIGDAT CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND VERY LARGE CORPORA

Sponsored by SIGDAT (ACL's Special Interest Group for Linguistic Data and Corpus-based Approaches to NLP)

June 21-22, 1999
University of Maryland

In conjunction with ACL'99: the 37th Annual Meeting of the Association for Computational Linguistics

This SIGDAT-sponsored joint conference will continue to provide a forum for new research in corpus-based and/or empirical methods in NLP. In addition to providing a general forum, the theme for this year is

"Corpus-based and/or Empirical Methods in NLP for Speech, MT, IR, and other Applied Systems"

A large number of systems in automatic speech recognition (ASR) and synthesis, machine translation (MT), information retrieval (IR), optical character recognition (OCR) and handwriting recognition have become commercially available in the last decade. Many of these systems use NLP technologies as an important component. Corpus-based and empirical methods in NLP have been a major trend in recent years. How useful are these techniques when applied to real systems, especially when compared to rule-based methods? Are there any new techniques to be developed in EMNLP and from VLC in order to improve the state of the art of ASR, MT, IR, OCR, and other applied systems? Are there new ways to combine corpus-based and empirical methods with rule-based systems?

This two-day conference aims to bring together academic researchers and industrial practitioners to discuss the above issues, through technical paper sessions, invited talks, and panel discussions. The goal of the conference is to raise awareness of what kinds of new EMNLP techniques need to be developed in order to bring about the next breakthrough in speech recognition and synthesis, machine translation, information retrieval and other applied systems.

Scope

The conference solicits paper submissions in (and not limited to) the following areas:

1) Original work in one of the following technologies and its relevance to speech, MT, or IR:
   (a) word sense disambiguation
   (b) word and term segmentation and extraction
   (c) alignment
   (d) bilingual lexicon extraction
   (e) POS tagging
   (f) statistical parsing
   (g) dialog models
   (h) others (please specify)

2) Proposals of new EMNLP technologies for speech, MT, IR, OCR, or other applied systems (please specify).

3) Comparative evaluation of the performance of EMNLP technologies in one of the areas in (1) and that of its rule-based or knowledge-based counterpart in a speech, MT, IR, OCR or other applied system.

Submission Requirements

Submissions should be limited to original, evaluated work. All papers should include a background survey and/or reference to previous work. The authors should provide an explicit explanation when there is no evaluation in their work. We encourage paper submissions related to the conference theme. In particular, we encourage the authors to include in their papers proposals and discussions of the relevance of their work to the theme. However, there will be a special session in the conference to include corpus-based and/or empirical work in all areas of natural language processing.

Submission Format

Only hard-copy submissions will be accepted. Reviewing of papers will not be blind. The submission format and word limit are the same as those for ACL this year. We strongly recommend the use of ACL-standard LaTeX (plus bibstyle and trivial example) or Word style files for the preparation of submissions. Paper ID is not required. Please leave it blank. Six copies of the full-length paper (not to exceed 3200 words exclusive of references) should be received at the following address on or before March 31, 1999.

EMNLP/VLC-99 Program Committee
c/o Pascale Fung
Department of Electrical and Electronic Engineering
University of Science and Technology (HKUST)
Clear Water Bay, Kowloon
Hong Kong

Important Dates

March 31      Submission of full-length paper
April 30      Acceptance notice
May 20        Camera-ready paper due
June 21-22    Conference date

Program Chair

Pascale Fung
Human Language Technology Center
Department of Electrical and Electronic Engineering
University of Science and Technology (HKUST)
Clear Water Bay, Kowloon
Hong Kong
Tel: (+852) 2358 8537
Fax: (+852) 2358 1485
Email: pascale@ee.ust.hk

Program Co-Chair

Joe Zhou
LEXIS-NEXIS, a Division of Reed Elsevier
9555 Springboro Pike
Dayton, OH 45342 USA
Email: joez@lexis-nexis.com

Program Committee

Jiang-Shin Chang (Behavior Design Corp.)
Ken Church (AT&T Labs--Research)
Ido Dagan (Bar-Ilan University)
Marti Hearst (UC-Berkeley)
Huang, Changning (Tsinghua University)
Pierre Isabelle (Xerox Research Europe)
Lillian Lee (Cornell University)
David Lewis (AT&T Research)
Dan Melamed (West Group)
Mehryar Mohri (AT&T Labs--Research)
Masaaki Nagata (NTT)
Richard Sproat (AT&T Labs--Research)
Andreas Stolcke (SRI)
Ralph Weischedel (BBN)
Dekai Wu (Hong Kong University of Science & Technology)
David Yarowsky (Johns Hopkins University)