LINGUIST List 11.842

Wed Apr 12 2000

Confs: Large Corpora & Annotation Standards-ANLP/NAACL

Editor for this issue: Jody Huellmantel

Please keep conference announcements as short as you can; LINGUIST will not post conference announcements which in our opinion are excessively long.


  1. Priscilla Rasmussen, Large Corpora & Annotation Standards at ANLP/NAACL2000

Message 1: Large Corpora & Annotation Standards at ANLP/NAACL2000

Date: Tue, 11 Apr 2000 17:54:28 EDT
From: Priscilla Rasmussen
Subject: Large Corpora & Annotation Standards at ANLP/NAACL2000

 Large Corpora and Annotation Standards

 Held in conjunction with ANLP/NAACL'00
 Applied Natural Language Processing and the
 North American Chapter of the Association 
 for Computational Linguistics

 Seattle, Washington
 4 May 2000 1-6pm

 This meeting is intended to bring together researchers and
 developers from a variety of domains in text, speech,
 video, etc., to look broadly at the technical issues that
 bear on the development of software systems and standards
 for the annotation and exploitation of linguistic
 resources. The goal is to lay the groundwork for the
 definition of a data and system architecture to support
 corpus annotation and exploitation that can be widely
 adopted within the community.

 Among the issues to be addressed are:

 - layered data architectures
 - system architectures for distributed databases
 - support for plurality of annotation schemes
 - impact and use of XML/XSL
 - support for multimedia, including speech and video
 - tools for creation, annotation, query and access
   of corpora
 - mechanisms for linkage of annotation and primary data
 - applicability of semi-structured data models,
   search and query systems, etc.
 - evaluation/validation of systems and annotations

 The motivation for this meeting is the American National
 Corpus (ANC) effort, which should begin corpus creation
 within the year. We anticipate that the ANC will provide a
 significant resource for natural language processing, and
 we therefore seek to identify state-of-the-art methods for
 its creation, annotation, and exploitation. Also, as a
 national and freely available resource, the data and
 system architecture of the ANC is likely to become a de
 facto standard. We therefore hope to draw together leading
 researchers and developers to establish a basis for the
 design of a system to support the creation and use of the ANC.

 Provisional Program

 Overview of the American National Corpus Effort
 Nancy Ide and Catherine Macleod

 Searching Linguistically Annotated Corpora
 Chris Brew

 Considerations for Large Corpus Annotation:
 Intercoder Reliability
 Rebecca Bruce and Janyce Wiebe

 The XML Framework and Its Implications for Large
 Corpus Access
 Nancy Ide

 The ATLAS System
 John Henderson

 Annotation Standards and Their Impact on Large
 Corpus Development
 Nicoletta Calzolari

 A Framework for Multi-level Linguistic Annotation
 Patrice Lopez and Laurent Romary

 Discussion: Requirements for the ANC

 A related workshop will be held at the LREC conference on
 May 29-30, 2000. See


 Nancy Ide
 Professor and Chair
 Department of Computer Science
 Vassar College
 Poughkeepsie, NY 12604-0520 USA
 Tel: +1 914 437-5988 Fax: +1 914 437-7498