LINGUIST List 25.2982

Mon Jul 21 2014

Confs: Computational Linguistics, Text/Corpus Linguistics/USA

Editor for this issue: Anna White

Date: 18-Jul-2014
From: Francisco Ordóñez
Subject: Workshop on Databases and Corpora in Linguistics

Workshop on Databases and Corpora in Linguistics

Date: 17-Oct-2014 - 17-Oct-2014
Location: Stony Brook, New York, USA
Contact: Lori Repetti

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics

Meeting Description:

Workshop on Databases and Corpora in Linguistics
Friday, October 17, 2014
Stony Brook University

Databases and corpora are tools of increasing importance in linguistics. Hundreds of electronic resources are now being developed, spanning historical linguistics, the documentation of endangered languages, language acquisition, lexicography, comparative grammar, and other areas. They make millions of pieces of linguistic information available to researchers for comparative study, in-depth analysis, frequency investigations, and more.

As with all tools, databases and corpora are successful only if people know how to use them and make good use of them. The design and functionality of the many available linguistic databases and corpora vary greatly. Some are user-friendly, while others require considerable training before users can access their wealth of information. With this proliferation of digital resources, the time is right to share our experiences with each other.

We are planning a 'Workshop on Databases and Corpora in Linguistics' to be held at Stony Brook University on Friday, October 17, 2014, on 'best practices' in design and use of linguistic databases and corpora. The day-long event will consist of demonstrations and critical assessments of existing databases.

Invited Speakers:

- Prof. Pilar Prieto of the Universitat Pompeu Fabra has developed the Interactive Atlas of Spanish Intonation and the Interactive Atlas of Romance Intonation, which provide phonological (and also syntactic) information on different intonational patterns in Romance languages.

- Prof. Christina Tortora, CUNY (College of Staten Island and the Graduate Center), is creating (in collaboration with others) the Audio-Aligned and Parsed Corpus of Appalachian English, a publicly available corpus that will include digitized versions of existing oral histories and searchable time-aligned transcripts annotated with grammatical information.


Workshop on Databases and Corpora in Linguistics
Stony Brook University
SAC 305

Friday, October 17, 2014

Welcome and breakfast

Invited speaker: Pilar Prieto (Universitat Pompeu Fabra)
“Comparative Romance Intonation and the Corpora of the Interactive Atlases of Intonation”

Jiwon Yun (Stony Brook University)
“Korean Speech Corpus and Tools for Linguistic Research”

Barbara Bullock, Jacqueline Toribio, Jacqueline Serigos (University of Texas, Austin)
“The Spanish in Texas Corpus: An open source approach to a contact variety”

Lori Repetti and Francisco Ordóñez (Stony Brook University)
“Clitics of Romance Languages Database”

Christina Hagedorn (USC)
“USC–TIMIT: A multimodal speech production database”

12:00–1:30 Lunch

Invited speaker: Christina Tortora (College of Staten Island and The Graduate Center, City University of New York)
“The Audio–Aligned and Parsed Corpus of Appalachian English: Design and Use”

Richard Zimmermann (University of Geneva)
“New Approaches to Syntactic Annotation: Constructing a Parsed Corpus of Early High German”

Ken Safir (Rutgers University)
“The Afranaph Project online database for the study of African Languages”

Ángel J. Gallego (Universitat Autònoma de Barcelona), Francisco Ordóñez (Stony Brook University), Francesc Roca (Universitat de Girona)
“Towards a Syntactic Atlas of Spanish and its Dialects”

Rachael Tatman (University of Washington)
“The SLAY Database: A Meta–Analytic Database of Sign Language Grammars”

4:30–5:00 Coffee break

Panel discussion

7:00 Dinner

Page Updated: 21-Jul-2014