Editor for this issue: Jody Huellmantel <jody@linguistlist.org>
Large Corpora and Annotation Standards
http://www.cs.vassar.edu/~ide/ANLP-NAACL2000.html

Held in conjunction with ANLP/NAACL'00 (Applied Natural Language Processing and the North American Chapter of the Association for Computational Linguistics)
Seattle, Washington
4 May 2000, 1-6 pm

This meeting is intended to bring together researchers and developers from a variety of domains (text, speech, video, etc.) to look broadly at the technical issues that bear on the development of software systems and standards for the annotation and exploitation of linguistic resources. The goal is to lay the groundwork for defining a data and system architecture for corpus annotation and exploitation that can be widely adopted within the community.

Among the issues to be addressed are:

- layered data architectures
- system architectures for distributed databases
- support for a plurality of annotation schemes
- impact and use of XML/XSL
- support for multimedia, including speech and video
- tools for the creation, annotation, query, and access of corpora
- mechanisms for linking annotation and primary data
- applicability of semi-structured data models, search and query systems, etc.
- evaluation/validation of systems and annotations

The motivation for this meeting is the American National Corpus (ANC) effort, which is expected to begin corpus creation within the year. We anticipate that the ANC will be a significant resource for natural language processing, and we therefore seek to identify state-of-the-art methods for its creation, annotation, and exploitation. In addition, because the ANC will be a national and freely available resource, its data and system architecture are likely to become a de facto standard. We therefore hope to bring together leading researchers and developers to establish a basis for the design of a system that supports the creation and use of the ANC.
Provisional Program

- Overview of the American National Corpus Effort
  Nancy Ide and Catherine Macleod
- Searching Linguistically Annotated Corpora
  Chris Brew
- Considerations for Large Corpus Annotation: Intercoder Reliability
  Rebecca Bruce and Janyce Wiebe
- The XML Framework and Its Implications for Large Corpus Access
  Nancy Ide
- The ATLAS System
  John Henderson
- Annotation Standards and Their Impact on Large Corpus Development
  Nicoletta Calzolari
- A Framework for Multi-level Linguistic Annotation
  Patrice Lopez and Laurent Romary
- Discussion: Requirements for the ANC

A related workshop will be held at the LREC conference on May 29-30, 2000. See http://www.cs.vassar.edu/~ide/anc/lrec.html

Organizer:
Nancy Ide
Professor and Chair
Department of Computer Science
Vassar College
Poughkeepsie, NY 12604-0520
USA
Tel: +1 914 437-5988
Fax: +1 914 437-7498
Email: ide@cs.vassar.edu