Editor for this issue: Scott Fults <scott@linguistlist.org>
Announcing a technical report on linguistic annotation. See below for download information. Our apologies for any duplicate messages.

A Formal Framework for Linguistic Annotation
Steven Bird & Mark Liberman

Abstract

`Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions - audio, video and/or physiological recordings - or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, `named entity' identification, co-reference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.

49pp, download from: [http://xxx.lanl.gov/abs/cs.CL/9903003]
Formats: PDF (336kb), Postscript (161kb), DVI (134kb), LaTeX (112kb)

For an online survey and extensive links, visit the Linguistic Annotations Page: [http://www.ldc.upenn.edu/annotation]

@TechReport{BirdLiberman99,
  author={Steven Bird and Mark Liberman},
  title={A Formal Framework for Linguistic Annotation},
  institution={Department of Computer and Information Science, University of Pennsylvania},
  year=1999,
  number={MS-CIS-99-01},
  note={[xxx.lanl.gov/abs/cs.CL/9903003]}
}

Please send comments to: sb@ldc.upenn.edu, myl@ldc.upenn.edu

Regards,
Steven Bird & Mark Liberman

--
Steven.Bird@ldc.upenn.edu
http://www.ldc.upenn.edu/sb
Assoc Director, LDC; Adj Assoc Prof, CIS & Linguistics
Linguistic Data Consortium, University of Pennsylvania
3615 Market St, Suite 200, Philadelphia, PA 19104-2608

The program for the April 1999 SECOL Conference in Norfolk, Virginia is available at: http://www.odu.edu/~jpb/secol60.html

Janet Bing
Dept. of English
Old Dominion University
Norfolk, VA 23529-0078
(757) 683-4030
FAX (757) 683-3241

NMSU's Computing Research Laboratory Presents
the 1999 Summer School in Language Engineering
June 28-July 9

The 1999 Summer School in Language Engineering is designed for the practical computational linguist or natural language processing specialist. The program stresses the practical needs of application system builders in such areas as machine translation, information retrieval and extraction, and text summarization. It also stresses the broad range of multilingual aspects of today's language engineering, from support for various writing systems to the acquisition of linguistic knowledge for applications in languages that have not yet been widely studied.

The summer school is organized by the Computing Research Laboratory (CRL) of New Mexico State University. The instructors, both members of CRL staff and visiting professors, are all leaders in their respective areas of expertise. The school will feature two full weeks of instruction and hands-on practical studies. The number of students will be small, keeping a high instructor-to-student ratio.

Registrants will be accepted on a first-come, first-served basis. Preregistration and fees must be received no later than June 1. For more information, please visit our web site at: http://crl.nmsu.edu/summerschool

Course Descriptions

"Ecological" Issues in Language

This course will cover issues related to writing systems, encodings, and input and output methods; treatment of punctuation, special characters and symbols, including mark-up; processing of dates and numbers; and a variety of issues connected with managing large multilingual collections of documents featuring different mark-up styles. A number of computational tools will be introduced and used in practical exercises.

Approaches to Computational Morphology

After a presentation of several approaches to computational morphology, with example systems for such widely different languages as Spanish, Persian, Russian and Turkish, this course will concentrate on the engineering of state-of-the-art morphological analysis and generation systems, especially for languages other than English. Students will get hands-on experience with sophisticated development and testing tools by building a morphological analyzer.

Lexicon Acquisition for NLP I: Morphology and Syntax

This course will describe the design and acquisition of several types of lexicons for NLP systems: lexicons supporting morphological and syntactic analysis of texts in a language, transfer lexicons for machine translation, and multilingual onomastica (lexicons of proper names). A number of acquisition interfaces will be used in practical exercises.

Lexicon Acquisition for NLP II: Ontological Semantics

This course will present the design and acquisition of static knowledge sources to support analysis of meaning in natural language texts. In particular, it will cover designing and building ontologies, or world models, for NLP, and lexicons to support semantic analysis of particular languages. Practical exercises will be supported by interactive acquisition interfaces.

Knowledge Elicitation from Informants

This course will present an environment for eliciting grammatical and lexical knowledge about a language from a user who knows that language and English but is not a trained linguist. This kind of environment is a realistic alternative to experimenting with automatic elicitation of language knowledge. It combines corpus-based, expectation-based and failure-driven acquisition of declarative knowledge about a language, and it is most useful for languages for which few computational resources are available. The design of the acquisition process and system will be discussed, and the interface, Boas, will be used in practical exercises.

Survey of Language Engineering Applications

This course will introduce language engineering applications such as machine translation, information retrieval and extraction, text summarization and language instruction. The tasks and techniques learned in the other courses will be put in context and further illustrated. The following systems will be presented and available for laboratory work: the Corelli machine translation environment, the MINDS information retrieval and summarization system, the URSA cross-language information retrieval engine, the Oleada language instruction environment and translator's tool set, the Mikrokosmos machine translation system, and the Expedition environment for configuring machine translation systems for low-density languages.

http://crl.nmsu.edu/summerschool