LINGUIST List 29.3469

Mon Sep 10 2018

Books: From Lexical Functional Grammar to Enhanced Universal Dependencies: Patejuk, Przepiórkowski

Editor for this issue: Jeremy Coburn

Date: 07-Sep-2018
From: Adam Przepiórkowski
Subject: From Lexical Functional Grammar to Enhanced Universal Dependencies: Patejuk, Przepiórkowski

Title: From Lexical Functional Grammar to Enhanced Universal Dependencies
Subtitle: Linguistically informed treebanks of Polish
Published: 2018
Publisher: Institute of Computer Science, Polish Academy of Sciences

Author: Agnieszka Patejuk
Author: Adam Przepiórkowski
Electronic: ISBN: 9788363159269 Pages: 263 Price: U.K. £ 0 Comment: Open Access (CC BY-NC-SA 4.0)

Syntactically annotated corpora, or ‘treebanks’, are among the most heterogeneous kinds of linguistic resources. They differ not only in the general approach they adopt (constituency or dependency), but also in the number of representation levels they assume (often one, but sometimes two or more) and in the extent to which they follow an established linguistic theory (if at all). Moreover, even within one approach, the representation of a particular phenomenon may differ widely between treebanks.

In treebank development, there is a clear tension between theoretical accuracy within a treebank and utilitarian consistency between treebanks of the same or different languages. On the one hand, utterances should be annotated with linguistically accurate and precise descriptions, and one way to achieve this is by following a specific linguistic theory, one with well-defined terminology, a solid formal background and a body of carefully justified analyses of many phenomena in typologically diverse languages. An example of such a theory is Lexical Functional Grammar (LFG). However, LFG is not the only theory of this kind, and even within one theory, similar phenomena may receive very different representations, reflecting different traditions or different weights assigned to pieces of evidence supporting one analysis or another. So this theoretically oriented approach to treebank development inevitably leads to the creation of treebanks with very diverse annotation schemes, which are often comprehensible only to a limited number of followers of a given linguistic theory.

On the other hand, especially in the context of multilingual natural language processing (NLP), treebanks should ideally follow a common annotation scheme, one that is intelligible to a much broader group of treebank consumers than professional linguists working within a given theory. Moreover, similar phenomena and constructions should receive analogous representations, even if there are subtle – from the point of view of practical applications – differences suggesting dissimilar analyses. A recent attempt at such a comprehensive syntactic annotation scheme is Universal Dependencies (UD). As a practical solution, UD aims at providing a maximally simple syntactic representation, one that is useful for various NLP applications, even if at the cost of linguistic precision.

This monograph presents two treebanks of Polish which follow the two approaches, as well as the procedure of converting one to the other. Part I presents an LFG treebank, part II describes the procedure of converting this LFG structure bank to a UD treebank, and part III offers a stand-alone presentation of the resulting UD treebank of Polish.

Linguistic Field(s): Linguistic Theories
                            Text/Corpus Linguistics

Subject Language(s): Polish (pol)
Language Family(ies): West Slavic

Written In: English (eng)

Page Updated: 10-Sep-2018