LINGUIST List 26.1006

Thu Feb 19 2015

Calls: Text/Corpus Linguistics/UK

Editor for this issue: Anna White <awhite@linguistlist.org>


Date: 17-Feb-2015
From: Piotr Banski <banski@ids-mannheim.de>
Subject: 3rd Meeting of the Workshop on Challenges in the Management of Large Corpora

Full Title: 3rd Meeting of the Workshop on Challenges in the Management of Large Corpora
Short Title: CMLC-3

Date: 20-Jul-2015 - 20-Jul-2015
Location: Lancaster, United Kingdom
Contact Person: Piotr Banski
Web Site: http://corpora.ids-mannheim.de/cmlc.html

Linguistic Field(s): Text/Corpus Linguistics

Call Deadline: 22-Mar-2015

Meeting Description:

This half-day workshop will gather leading researchers in Language Resource creation and Corpus Linguistics to provide a platform for an intensive exchange of expertise, results and ideas on the maintenance, curation, development and efficient use of large, structured, annotated corpus resources.

Call for Papers:

The third edition of CMLC will accompany Corpus Linguistics 2015 in Lancaster and will be held on 20 July 2015. This half-day workshop will gather leading researchers in Language Resource creation and Corpus Linguistics to provide a platform for an intensive exchange of expertise, results and ideas, in particular concerning the following topics:

- Recent developments in ongoing web-as-corpus initiatives, national corpora, reference corpora, and other very large corpora
- Evaluation and investigation of the properties of large corpora
- Extraction, representation, and management of metadata
- Virtualization / techniques for drawing and accessing stratified virtual corpora
- Increasing the coverage of underrepresented strata
- Legal issues including license models and license management
- Acquisition and curation of large text archives from third parties
- Legal and technological issues of corpora physically distributed over different locations
- System- and database architectures for very large semi-structured data sets
- Heavily annotated corpora
- Use of annotation standards for large data sets
- Issues of interoperability and tool chaining
- Interfaces for user-provided annotations
- Quality control of annotations in large data sets
- Dealing with efficient and scalable user interfaces
- Effective querying of large corpora with multiple annotation layers
- Effective techniques for analyzing corpus data
- Strategies and techniques for maximizing recall and coping with large numbers of false positives
- Visualization and other techniques that facilitate the linking between quantitative investigations and qualitative interpretations
- “Put the computation near the data” as a strategy for dealing with IPR restrictions
- Open-source software and open-data corpora strategies
- Other issues that arise in the context of management of large datasets

We invite extended abstracts (up to 4 pages standard size, references excluded, exclusively as PDF) addressing some of the topics listed above.

Submission deadline: 22 March 2015, midnight GMT
Submission address: http://linguistlist.org/easyabs/cmlc-2015

A volume of proceedings is planned.

The home page of CMLC events is located at http://corpora.ids-mannheim.de/cmlc.html

Organizing Committee:

- Piotr Bański, Marc Kupietz, Harald Lüngen, Andreas Witt (Institut für Deutsche Sprache, Mannheim)
- Hanno Biber, Evelyn Breiteneder (Institute for Corpus Linguistics and Text Technology, Vienna)

Programme Committee:

(This is a list of the colleagues who have confirmed their participation so far.)

- Damir Ćavar (Indiana University, Bloomington)
- Isabella Chiari (Sapienza University of Rome)
- Dan Cristea ("Alexandru Ioan Cuza" University of Iasi)
- Václav Cvrček (Charles University Prague)
- Mark Davies (Brigham Young University)
- Tomaž Erjavec (Jožef Stefan Institute)
- Alexander Geyken (Berlin-Brandenburgische Akademie der Wissenschaften)
- Andrew Hardie (Lancaster University)
- Serge Heiden (ENS de Lyon)
- Nancy Ide (Vassar College)
- Miloš Jakubíček (Lexical Computing Ltd.)
- Adam Kilgarriff (Lexical Computing Ltd.)
- Krister Lindén (University of Helsinki)
- Martin Mueller (Northwestern University)
- Nelleke Oostdijk (Radboud University Nijmegen)
- Christian-Emil Smith Ore (University of Oslo)
- Piotr Pęzik (University of Łódź)
- Uwe Quasthoff (Leipzig University)
- Paul Rayson (Lancaster University)
- Laurent Romary (INRIA, DARIAH)
- Roland Schäfer (FU Berlin)
- Serge Sharoff (University of Leeds)
- Mária Simková (Slovak Academy of Sciences)
- Jörg Tiedemann (Uppsala University)
- Dan Tufiş (Romanian Academy, Bucharest)
- Tamás Váradi (Research Institute for Linguistics, Hungarian Academy of Sciences)

Please see http://corpora.ids-mannheim.de/cmlc.html for more information and general updates.



Page Updated: 19-Feb-2015