LINGUIST List 11.727

Thu Mar 30 2000

Confs: Large Corpora and Annotation Standards

Editor for this issue: Jody Huellmantel

Please keep conference announcements as short as you can; LINGUIST will not post conference announcements that, in our opinion, are excessively long.


  1. Nancy M. Ide, Large Corpora and Annotation Standards

Message 1: Large Corpora and Annotation Standards

Date: Thu, 30 Mar 2000 17:27:47 -0500
From: Nancy M. Ide
Subject: Large Corpora and Annotation Standards

 Large Corpora and Annotation Standards

 Held in conjunction with ANLP/NAACL'00
 Seattle, Washington
 4 May 2000 1-6pm 

 This meeting is intended to bring together researchers and
 developers from a variety of domains in text, speech,
 video, etc., to look broadly at the technical issues that
 bear on the development of software systems and standards
 for the annotation and exploitation of linguistic
 resources. The goal is to lay the groundwork for the
 definition of a data and system architecture to support
 corpus annotation and exploitation that can be widely
 adopted within the community.

 Among the issues to be addressed are: 

 - layered data architectures 
 - system architectures for distributed databases 
 - support for plurality of annotation schemes 
 - impact and use of XML/XSL 
 - support for multimedia, including speech and video 
 - tools for creation, annotation, query and access of
 - mechanisms for linkage of annotation and primary
 - applicability of semi-structured data models, search
 and query systems, etc. 
 - evaluation/validation of systems and annotations 

 The motivation for this meeting is the American National
 Corpus (ANC) effort, which should begin corpus creation
 within the year. We anticipate that the ANC will provide a
 significant resource for natural language processing, and
 we therefore seek to identify state-of-the-art methods for
 its creation, annotation, and exploitation. Also, as a
 national and freely available resource, the data and system
 architecture of the ANC is likely to become a de facto
 standard. We therefore hope to draw together leading
 researchers and developers to establish a basis for the
 design of a system to support the creation and use of the

 Provisional Program 

 Overview of the American National Corpus Effort 
 Nancy Ide and Catherine Macleod 

 Searching Linguistically Annotated Corpora 
 Chris Brew 

 Considerations for Large Corpus Annotation:
 Intercoder Reliability 
 Rebecca Bruce and Janyce Wiebe 

 The XML Framework and Its Implications for Large
 Corpus Access 
 Nancy Ide 

 The ATLAS System 
 John Henderson 

 Annotation Standards and Their Impact on Large
 Corpus Development 
 Nicoletta Calzolari 

 A Framework for Multi-level Linguistic Annotation 
 Patrice Lopez and Laurent Romary 

 Discussion: Requirements for the ANC

 A related workshop will be held at the LREC conference on
 May 29-30, 2000.


 Nancy Ide 
 Professor and Chair
 Department of Computer Science 
 Vassar College 
 Poughkeepsie, NY 12604-0520 USA 
 Tel: +1 914 437-5988 Fax: +1 914 437-7498