LINGUIST List 29.3686

Tue Sep 25 2018

Diss: Syntax; Text/Corpus Linguistics: William Dyer: "Minimizing Integration Cost: A General Theory of Constituent Order"

Editor for this issue: Sarah Robinson

Date: 25-Sep-2018
From: William Dyer
Subject: Minimizing Integration Cost: A General Theory of Constituent Order

Institution: University of California, Davis
Program: Department of Linguistics
Dissertation Status: Completed
Degree Date: 2017

Author: William Dyer

Dissertation Title: Minimizing Integration Cost: A General Theory of Constituent Order

Dissertation URL:

Linguistic Field(s): Syntax
                            Text/Corpus Linguistics

Dissertation Director:
John A. Hawkins
Raul Aranovich
Santiago Barreda

Dissertation Abstract:

A major question in linguistics is why words and phrases (more generally, constituents) have preferred orders cross-linguistically, where ‘preferred’ covers both possible and probable. Why does "the good Italian wine" sound better than "the Italian good wine" or "good Italian the wine"? Disparate theories have been advanced to address different types of constituents, from stacked attributive adjectives to postverbal prepositional phrases, though many of these suffer from inaccuracy, lack of testability, or questionable explanatory power.

This dissertation aims to address these shortcomings by advancing a general theory of constituent ordering. Relying on a quantitative syntactic analysis within a dependency-grammar framework, the study begins with the notion that the cost of integrating dependents with their heads derives from two sources: the complexity of the integration and the distance between dependent and head (cf. Gibson, 1998; Gibson, 2000). While the distance-based part has been convincingly addressed in the literature (cf. H. Liu, Xu, and Liang, 2017, for an overview), the complexity part remains unresolved.

It is proposed that the complexity of the dependency relation can be measured by the entropy of the distribution created by the probabilities of each dependent’s heads. Aggregated across all constituents in a sequence, and thereby capturing the distance cost of the integration as well, this combined metric of Aggregate Complexity (AC) serves as a measure of processing difficulty—and therefore integration cost—for a given constituent order. It is hypothesized that the linearization which minimizes AC tends to represent the preferred order of the underlying dependency structure.
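
As a rough illustration of the entropy step only, the following Python sketch computes the Shannon entropy of a dependent's head distribution from co-occurrence counts. The function name `head_entropy`, the use of base-2 logarithms, and the toy counts are assumptions made for illustration and are not taken from the dissertation. How the per-dependent entropies are aggregated with distance into AC is not specified in this abstract, so that step is not modeled here.

```python
import math


def head_entropy(head_counts):
    """Shannon entropy (in bits) of the distribution over a dependent's heads.

    head_counts maps each head to the number of times the dependent was
    observed attached to it. Hypothetical helper, for illustration only.
    """
    total = sum(head_counts.values())
    if total <= 0:
        raise ValueError("head_counts must contain at least one observation")
    entropy = 0.0
    for count in head_counts.values():
        if count > 0:  # zero-count heads contribute nothing to the entropy
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Toy counts (invented, not corpus data): one dependent whose attachments are
# spread evenly over four heads, and one that always attaches to the same head.
spread_counts = {"head_a": 2, "head_b": 2, "head_c": 2, "head_d": 2}
peaked_counts = {"head_a": 8}

spread_entropy = head_entropy(spread_counts)
peaked_entropy = head_entropy(peaked_counts)
```

Under these toy counts the evenly spread dependent has the higher entropy, since its head is harder to predict, while the dependent with a single possible head has zero entropy.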

To test the hypothesis, corpus data from 51 languages from the Universal Dependencies project is analyzed in order to show that Aggregate Complexity Minimization (ACM) effectively explains cross-linguistic regularities of attested orders. The dissertation demonstrates how previous models can be subsumed by or extended into ACM, as well as how compression links constituent-order preferences with efficiency and ultimately more universal principles.

Page Updated: 25-Sep-2018