LINGUIST List 17.2660

Mon Sep 18 2006

FYI: Call for Collaboration: Latin Treebank

Editor for this issue: Hunter Lockwood <hunter@linguistlist.org>


Directory         1.    David Bamman, Call for Collaboration: Latin Treebank


Message 1: Call for Collaboration: Latin Treebank
Date: 18-Sep-2006
From: David Bamman <David.Bamman@tufts.edu>
Subject: Call for Collaboration: Latin Treebank


Call for Collaboration: Latin Treebank

The Perseus Project has recently received a planning grant from the NSF to investigate the costs and labor involved in constructing a multimillion-word Latin treebank (a large collection of syntactically parsed sentences), along with its potential value for the linguistics and Classics community. While our initial efforts under this grant will focus on syntactically annotating excerpts from Golden Age authors (Caesar, Cicero, Vergil) and the Vulgate, a future multimillion-word corpus would be comprised of writings from the pre-Classical period up through the Early Modern era. To date we've annotated a total of 12,000 words in a style that's predominantly informed by two sources: the dependency grammar used by the Prague Dependency Treebank (itself based on Mel'cuk 1988), and the Latin grammar of Pinkster 1990.

While treebanks provide valuable training data for computational tasks such as grammar induction and automatic syntactic parsing, they also have the potential to be used in traditional research areas. Large collections of syntactically parsed sentences have the potential to revolutionize lexicography and philology, as they provide the immediate context for a word's use along with its typical syntactic arguments (this lets us chart, for example, how the meaning of a verb changes as its predominant arguments change). Treebanks enable large-scale research into structurally based rhetorical devices particularly of interest to Classicists (such as hyperbaton), and they provide the raw data for research in historical linguistics (such as the move in Latin from classical SOV word order to Romance SVO).

The eventual Latin treebank will be openly available to the public; we should, therefore, come to a consensus on how it should be built. To that end we encourage input from the linguistics and Classics community on the treebank design (including the syntactic representation of Latin) and welcome contributions by annotators (for which limited funding is available). Interested collaborators should contact David Bamman (David.Bamman@tufts.edu) at the Perseus Project.

Linguistic Field(s): Historical Linguistics; Syntax; Text/Corpus Linguistics