|Full Title:||Computational Historical Linguistics|
|Start Date:||22-May-2013 - 22-May-2013|
|Meeting Description:||Recent years have seen a surge of interest in the application of computational methods to problems in historical linguistics. To date, much of this work has been based on the application of simple similarity measures to short lists of lexical items or grammatical features for achieving large-scale genetic grouping of languages. While highly publicized and demonstrably useful, such approaches are inherently limited both by the narrow range of linguistic features examined and the low-level processing methods used.
At the same time, language technology for dealing with modern languages has developed apace, with automatic language tools now achieving a degree of accuracy that has enabled both popular online services such as Google Translate and the rapid accumulation of linguistically annotated monolingual and multilingual corpora for many languages. Much less has been done on historical texts: there is little commercial interest in these language varieties, there are often limited amounts of data (making purely data-driven annotation approaches unfeasible), and the texts are less well-behaved than modern print corpora, due to a lack of standardization at all linguistic levels, starting with orthography. Digitized older texts also often suffer from OCR errors.
The basic premise of the workshop is that historical linguistics can benefit greatly from having access to historical and diachronic corpora with rich linguistic annotations, but this is a field where researchers have barely scratched the surface of what is possible. However, because of the nature of the material and of the research questions, interesting questions of theory and method arise in connection with this work, which are often relevant to work on modern data as well (e.g., linguistic variation in spoken language or in web genres). The workshop aims to provide a forum where these questions can be discussed. The target audience of the workshop is researchers, both linguists and computational linguists, involved in the creation and utilization of richly annotated historical and diachronic text corpora, in the context of historical-comparative (diachronic, genetic) linguistic research.
|Linguistic Subfield:||Computational Linguistics; Historical Linguistics|