LINGUIST List 29.4174

Fri Oct 26 2018

Calls: Comp Ling, Historical Ling, Text/Corpus Ling/Germany

Editor for this issue: Everett Green

Date: 24-Oct-2018
From: Eva Zehentner
Subject: (Semi-)automatic Retrieval of Data from Historical Corpora

Full Title: (Semi-)automatic Retrieval of Data from Historical Corpora

Date: 21-Aug-2019 - 24-Aug-2019
Location: Leipzig, Germany
Contact Person: Eva Zehentner

Linguistic Field(s): Computational Linguistics; Historical Linguistics; Text/Corpus Linguistics

Language Family(ies): Germanic

Call Deadline: 12-Nov-2018

Meeting Description:

(Session of 52nd Annual Meeting of the Societas Linguistica Europaea)

Convenors: Marianne Hundt, Melanie Röthlisberger, Gerold Schneider and Eva Zehentner

Developments in historical corpus linguistics have followed a route similar to that of corpus-based research on present-day languages: from the creation of small reference corpora to increasingly large databases, and from text-only to richly annotated resources. However, historical data have always posed particular challenges for the development of corpus resources, their annotation, and their analysis. Corpus representativeness and balance, for instance, have been impaired by the limited availability of texts, particularly for the very earliest stages of written attestation. Additionally, the highly variable orthography typical of earlier texts means that tools developed for more uniform data cannot be applied straightforwardly to historical corpora. For smaller corpora, this has resulted in grammatical annotation done manually or through manual post-editing. For increasingly large resources, however, manual annotation is tedious, and researchers have developed pre-processing tools such as spelling normalisation (Baron and Rayson 2008) and lemmatisation (Burns 2013) to enable automatic tagging and parsing. Matters are further complicated by the fact that a range of different annotated resources exists (Penn Treebank, Penn Parsed Corpora, Universal Dependency Treebanks) and that different parsing tools (e.g. Schneider 2012) have been applied to historical corpora. These are likely to require different retrieval strategies, which in turn makes comparisons across corpora difficult. While the list of syntactic parsers is long (e.g. Schneider (2008) for English, Sennrich et al. (2009) for German, van Noord (2006) for Dutch, Alberti et al. (2017) for Universal Dependency parsing), few have been used on, or adapted to, historical texts.

The aim of this workshop is to focus on the challenges that (semi-)automatic retrieval of data from historical corpora poses for the study of grammatical change, specifically in English, German, and Dutch. In particular, we invite contributions on topics such as (but not limited to) the following:

- mapping of different annotation schemes
- evaluation of bottom-up approaches to data retrieval for language change
- issues of precision and recall in historical corpora

Ultimately, this workshop seeks to provide a platform for researchers working in these areas to exchange ideas and to jointly address the challenges (and opportunities) we face.

Call for Papers:

We invite researchers to submit an anonymised abstract of 300 words (excluding references) to the workshop organisers by November 12, 2018. Talks will be 20 minutes each, with 5 minutes for discussion and 5 minutes for speaker change. The workshop will open with an introduction by the organisers, who will summarise previous research, the research questions addressed in the workshop, and the scope of the papers to be presented. The workshop will conclude with a final discussion.

The workshop proposal to be submitted to the SLE organisers will include all participants’ abstracts. The SLE will notify us of the acceptance or rejection of the workshop proposal by 15 December 2018. If our workshop proposal is accepted, we will invite all preliminary workshop participants to submit their full abstracts via the general call for papers by 15 January 2019, where they will undergo review.

Page Updated: 26-Oct-2018