LINGUIST List 10.336

Thu Mar 4 1999

FYI: Ling Annotation, SECOL Program, Lang Engineers

Editor for this issue: Scott Fults


  1. Steven Bird, Technical Report: A Formal Foundation for Linguistic Annotation
  2. JANET BING, SECOL Conference Program
  3. CRL SSLE, Computing Research Laboratory Summer School

Message 1: Technical Report: A Formal Foundation for Linguistic Annotation

Date: Wed, 3 Mar 1999 11:22:27 -0500 (EST)
From: Steven Bird
Subject: Technical Report: A Formal Foundation for Linguistic Annotation

Announcing a technical report on linguistic annotation. See below
for download information. Our apologies for any duplicate messages.

A Formal Framework for Linguistic Annotation
Steven Bird & Mark Liberman


`Linguistic annotation' covers any descriptive or analytic notations
applied to raw language data. The basic data may be in the form of
time functions - audio, video and/or physiological recordings - or it
may be textual. The added notations may include transcriptions of all
sorts (from phonetic features to discourse structures), part-of-speech
and sense tagging, syntactic analysis, `named entity' identification,
co-reference annotation, and so on. While there are several ongoing
efforts to provide formats and tools for such annotations and to
publish annotated linguistic databases, the lack of widely accepted
standards is becoming a critical problem. Proposed standards, to the
extent they exist, have focussed on file formats. This paper focuses
instead on the logical structure of linguistic annotations. We survey
a wide variety of existing annotation formats and demonstrate a common
conceptual core, the annotation graph. This provides a formal
framework for constructing, maintaining and searching linguistic
annotations, while remaining consistent with many alternative data
structures and file formats.
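
As a rough illustration of the kind of structure the abstract refers to, here
is a minimal Python sketch. The abstract does not spell out the definition, so
this assumes an annotation graph is a set of labelled arcs between nodes that
may carry time offsets into the signal. The class names, the "tier" field and
the sample words, tags and times are all hypothetical and are not taken from
the report.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Node:
    """A point in the signal; the time offset (seconds) may be unknown."""
    ident: int
    time: Optional[float] = None


@dataclass(frozen=True)
class Arc:
    """A labelled span between two nodes, e.g. a word or a tag."""
    start: Node
    end: Node
    tier: str   # hypothetical tier name such as "word" or "pos"
    label: str


@dataclass
class AnnotationGraph:
    arcs: list = field(default_factory=list)

    def add(self, start, end, tier, label):
        """Construct an arc and store it in the graph."""
        arc = Arc(start, end, tier, label)
        self.arcs.append(arc)
        return arc

    def search(self, tier=None, label=None):
        """Return arcs matching the given tier and/or label."""
        return [a for a in self.arcs
                if (tier is None or a.tier == tier)
                and (label is None or a.label == label)]

    def overlapping(self, t0, t1):
        """Return time-anchored arcs whose span intersects (t0, t1)."""
        return [a for a in self.arcs
                if a.start.time is not None and a.end.time is not None
                and a.start.time < t1 and a.end.time > t0]


# Two annotation layers sharing the same nodes (invented sample data).
n0, n1, n2 = Node(0, 0.00), Node(1, 0.32), Node(2, 0.71)
graph = AnnotationGraph()
graph.add(n0, n1, "word", "she")
graph.add(n1, n2, "word", "had")
graph.add(n0, n1, "pos", "PRP")
graph.add(n1, n2, "pos", "VBD")

words = [a.label for a in graph.search(tier="word")]
early = [(a.tier, a.label) for a in graph.overlapping(0.0, 0.3)]
```

Because both layers hang off the same nodes, a search by time or by label
works the same way whatever file format the annotations were read from, which
is the sense in which such a structure could sit beneath many formats.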

49pp, download from: []
Formats: PDF (336kb), Postscript (161kb), DVI (134kb), LaTeX (112kb)

For an online survey and extensive links, visit the
Linguistic Annotations Page: []

 author={Steven Bird and Mark Liberman},
 title={A Formal Framework for Linguistic Annotation},
 institution={Department of Computer and Information Science,
 University of Pennsylvania},

Please send comments to:,

Steven Bird & Mark Liberman

Assoc Director, LDC; Adj Assoc Prof, CIS & Linguistics
Linguistic Data Consortium, University of Pennsylvania
3615 Market St, Suite 200, Philadelphia, PA 19104-2608

Message 2: SECOL Conference Program

Date: Wed, 3 Mar 1999 14:08:41 EST
Subject: SECOL Conference Program

The program for the April 1999 SECOL Conference in Norfolk, Virginia
is available at: 

Janet Bing 
Dept. of English
Old Dominion University
Norfolk, VA 23529-0078
(757) 683-4030
FAX (757) 683-3241

Message 3: Computing Research Laboratory Summer School

Date: Wed, 03 Mar 1999 15:08:08 -0700
From: CRL SSLE
Subject: Computing Research Laboratory Summer School

NMSU's Computing Research Laboratory Presents the
1999 Summer School in Language Engineering
June 28-July 9

The 1999 Summer School in Language Engineering is designed for the
practical computational linguist or natural language processing

The program of the school stresses practical needs of application system
builders in such areas as machine translation, information retrieval and
extraction or text summarization. It stresses the broad range of
multilingual aspects of today's language engineering, from the support
for the various writing systems to acquisition of linguistic knowledge
for applications to languages that have not yet been widely studied.

The summer school is organized by the Computing Research Laboratory
(CRL) of New Mexico State University. The instructors, both members of
CRL staff and visiting professors, are all leaders in their respective
areas of expertise.

The school will feature two full weeks of instruction and hands-on
practical studies. The number of students in the school will be small,
keeping a high instructor-to-student ratio. Registrants will be accepted
on a first-come, first-served basis. Preregistration and fees must be
received no later than June 1.

For more information,
please visit our web site at:

Course Descriptions
"Ecological" Issues in Language
This course will cover issues related to writing systems, encodings,
input and output methods; treatment of punctuation, special characters
and symbols, including mark-up; processing of dates and numbers; and a
variety of issues connected with managing large multilingual collections
of documents featuring different mark-up styles. A number of
computational tools will be introduced and used in practical exercises.

Approaches to Computational Morphology
After a presentation of several approaches to computational morphology,
with example systems for such widely different languages as Spanish,
Persian, Russian and Turkish, this course will concentrate on the
engineering of state-of-the-art morphological analysis and generation
systems, especially for languages other than English. Students will get
hands-on experience using sophisticated development and testing tools,
by building a morphological analyzer.

Lexicon Acquisition for NLP I: Morphology and Syntax
This course will describe the process of design and acquisition of
several types of lexicons for NLP systems: lexicons supporting
morphological and syntactic analysis of texts in a language, transfer
lexicons for machine translation and multilingual onomastica (lexicons
of proper names). A number of acquisition interfaces will be used in
practical exercises.

Lexicon Acquisition for NLP II: Ontological Semantics
This course will present the design and acquisition of static knowledge
sources to support analysis of meaning in natural language texts. In
particular, it will cover designing and building ontologies, or world
models, for NLP and lexicons for the support of semantic analysis of
particular languages. Practical exercises will be supported by
interactive acquisition interfaces.

Knowledge Elicitation from Informants
This course will present an environment for eliciting grammatical and
lexical knowledge about a language from a user who knows that language
and English but is not a trained linguist. This kind of environment is a
realistic alternative to experimenting with automatic elicitation of
language knowledge. It combines corpus-based, expectation-based and
failure-driven acquisition of declarative knowledge about a language and
is most useful for the languages for which few computational resources
are available. The design of the acquisition process and system will be
discussed, and the interface, Boas, will be used in practical exercises.

Survey of Language Engineering Applications
This course will introduce language engineering applications such as
machine translation, information retrieval and extraction, text
summarization and language instruction. The tasks and techniques learned
in the other courses will be put in their context and further
illustrated. The following systems will be presented and available for
laboratory work: the Corelli machine translation environment, the MINDS
information retrieval and summarization system, the URSA cross-language
information retrieval engine, the Oleada language instruction
environment and translator's tool set, the Mikrokosmos machine
translation system, and the Expedition environment for configuring
machine translation systems for low-density languages.
