|Full Title:||Annotation of Corpora for Research in the Humanities|
|Start Date:||29-Nov-2012 - 29-Nov-2012|
|Meeting Description:||The second edition of the workshop on ‘Annotation of Corpora for Research in the Humanities’ (ACRH-2) will be held on November 29, 2012 at the University of Lisbon (Portugal) (http://alfclul.clul.ul.pt/crpc/acrh2/index.html).
The workshop will be co-located with the 11th International Workshop on Treebanks and Linguistic Theories (TLT-11), which will be held on November 30 - December 1, 2012 (http://tlt11.clul.ul.pt/).
As in its first edition (held in Heidelberg on 5 January 2012; proceedings available at http://www.jlcl.org/index.php?modus=aktuelle_ausgabe&language=en), the ACRH workshop aims to build a tighter collaboration between people working in various areas of the Humanities (such as literature, philology and history) and the research community involved in developing, using and making accessible annotated corpora.
Addressing topics related to annotated corpora for research in the Humanities is an interdisciplinary task, which involves corpus and computational linguists (mostly those working in literary computing), philologists, scholars in the Humanities and computer scientists. However, this interdisciplinarity has not yet been fully realised. Indeed, philologists and scholars are not used to exploiting NLP tools and language resources such as annotated corpora; in turn, computational linguists tend to develop language resources for NLP purposes only.
For instance, although many corpora that are relevant to research in the Humanities are now available in digital format (theatrical plays, contemporary novels, critical literature, literary reviews etc.), only a few of them are linguistically tagged; most still lack linguistic annotation altogether. Historical corpora are also a case of special interest, since their creation demands a strong interplay between computational linguistics and more traditional scholarship. Over the past few years a number of historical annotated corpora have been started, among which are treebanks for Middle, Early Modern and Old English, Early New High German, Medieval Portuguese, Ugaritic, Latin, Ancient Greek and several translations of the New Testament into Indo-European languages. The experience of this ever-growing group of projects can provide many suggestions on both the methodology and the practice of interaction between literary studies, philology and corpus linguistics.
We believe that a tighter collaboration between people working in the Humanities and the research community involved in developing annotated corpora is now needed because, while annotating a corpus from scratch remains a labor-intensive and time-consuming task, it can today be simplified by drawing extensively on prior experience in the field. Yet such a collaboration is still quite far from being achieved, as a gap remains between computational linguists (who sometimes do not involve humanists in developing and exploiting annotated corpora for the Humanities) and humanists (who are sometimes simply unaware that such corpora exist and that automatic methods and standards to build them are now available).
Martin Wynne (University of Oxford, UK)
|Linguistic Subfield:||Ling & Literature; Text/Corpus Linguistics; Computational Linguistics|