Editor for this issue: Jody Huellmantel <jody@linguistlist.org>
IRCS WORKSHOP ON LINGUISTIC DATABASES

University of Pennsylvania
Philadelphia, USA
11-13 December 2001

http://www.ldc.upenn.edu/annotation/database/

Sponsored by the National Science Foundation and the Institute for Research in Cognitive Science

Organized by: Steven Bird, Peter Buneman and Mark Liberman
Department of Computer and Information Science, Department of Linguistics, and the Linguistic Data Consortium
University of Pennsylvania

Linguistic databases are digital repositories of structured information intended to document natural language and natural communicative interaction. Over the last decade, linguistic databases have come to stand at the center of empirical research in the language sciences, and in the development of new human language technologies. Like genomic databases, linguistic databases are complex, evolving and richly annotated repositories, and pose interesting challenges for efficient representation, indexing and query. And like most scientific databases, linguistic databases have made little use of standard database technology.

The goals of the workshop are to take stock of existing research in linguistic databases, to identify the key problems, and to explore applications of current database research to these problems. More broadly, the workshop will help define the research questions of a new "linguistic database community" and initiate the ongoing interchange of relevant problems and results between this community and the database community at large.

The workshop will address a selection of the following topics:

MODELS:

* models for text databases, speech databases, multimodal databases, typological databases, geographical databases (language maps), and metadata repositories
* relational, object-oriented and semi-structured models for representing linguistic annotations
* representations for specific linguistic datatypes (e.g. databases of aligned parallel text)
* modelling temporal and (geo)spatial structure
* critical analysis of existing linguistic databases

LANGUAGES:

* query of multilayer annotations
* linguistic applications/extensions of XML query languages
* analysis of existing ad hoc query languages
* queries over temporal and (geo)spatial structure

OTHER TOPICS:

* database support (e.g. what standard database technology has proven worthwhile for linguistic databases?)
* appropriate indexing methods for linguistic strings and structures
* archiving and preservation
* metadata standards serving as finding aids for linguistic databases
* data provenance / data lineage
* annotation servers

Provisional Timetable

Call for papers: posted in May
Extended abstracts: due in August
Final papers: due in November

Website and Mailing List

Subsequent announcements will be posted to this list, and on the workshop website:

http://www.ldc.upenn.edu/annotation/database/

Steven Bird, Peter Buneman and Mark Liberman

--
Steven Bird     http://www.ldc.upenn.edu/sb/      sb@ldc.upenn.edu
Peter Buneman   http://www.cis.upenn.edu/~peter/  peter@cis.upenn.edu
Mark Liberman   http://www.ldc.upenn.edu/myl/     myl@unagi.cis.upenn.edu