Review of Language Complexity

Reviewer: Peter M. Arkadiev
Book Title: Language Complexity
Book Author: Matti Miestamo, Kaius Sinnemäki, Fred Karlsson
Publisher: John Benjamins
Linguistic Field(s): Sociolinguistics
Issue Number: 20.495

EDITORS: Miestamo, Matti; Sinnemäki, Kaius; Karlsson, Fred
TITLE: Language Complexity
SUBTITLE: Typology, Contact, Change
SERIES TITLE: Studies in Language Companion Series 94
PUBLISHER: John Benjamins
YEAR: 2008

Peter M. Arkadiev, Institute of Slavic Studies, Russian Academy of Sciences, Moscow

The book under review is a collection of 16 papers from the conference
''Approaches to Complexity in Language'' held in Helsinki in August 2005. The
topic of language complexity has lately attracted much attention from linguists,
and several important contributions to the field have been published, e.g. Dahl
(2004), Hawkins (2004), McWhorter (2005, 2007). The contributions to the volume
discuss such topics as the ways of defining and measuring linguistic complexity,
the relations between the more abstract notion of complexity and the
user-oriented idea of greater difficulty of learning a language, the alleged
equal complexity of all languages and possible complexity trade-offs between
different levels of language, and the effects of language contact, especially of
pidginization and creolization, on linguistic complexity. Among the
contributions to the volume there are both more theory-oriented and more
data-oriented articles, which are grouped into three blocks called ''Typology and
theory'', ''Contact and change'' and ''Creoles and pidgins''.

The volume opens with an Introduction by the editors (''The problems of language
complexity'', pp. vii-xiv), which contains a brief introduction to the topic of
linguistic complexity and a useful summary of the contributions to the volume.

Part I ''Typology and theory''
Wouter Kusters (''Complexity in linguistic theory, language learning and language
change'', pp. 3-22) draws a distinction between 'absolutivist' and 'relativist'
approaches to linguistic complexity, and argues for the latter view, according to which
complexity crucially hinges upon the difficulty a learner may experience in
acquiring the language. Kusters defines complexity as ''the amount of
effort a generalized outsider has to make to become acquainted with the language
in question'' (p. 9), and identifies three dimensions of complexity in the domain
of inflectional morphology, viz. ''economy'' (the number of categories
expressible), ''transparency'' in the expression of the categories, and
''isomorphy'' of the semantico-syntactic and morphological facets of the
categories. Finally, Kusters applies his theory to the verbal morphology of several
Quechuan varieties and hypothesizes that considerable differences between them
in the degree of complexity may be due to various sociolinguistic factors
operative in the history of these languages.

Matti Miestamo (''Grammatical complexity in a cross-linguistic perspective'', pp.
23-41) also addresses the issue of ''absolute'' vs. ''relative'' complexity, and
argues, contrary to Kusters, that ''complexity should be approached from the
absolute point of view in cross-linguistic studies'' (p. 28). Another important
question raised by Miestamo concerns the problems with assessing the ''global''
complexity of a language. He argues that, since metrics of global complexity
cannot be fully representative of the linguistic data, and their outcomes
cannot be fully compared because different criteria may give conflicting
results, ''the cross-linguistic study of grammatical complexity
should primarily focus on specific areas of grammar, i.e., on local complexity''
(p. 31).

Gertraud Fenk-Oczlon and August Fenk (''Complexity trade-offs between the
subsystems of language'', pp. 43-65) investigate possible correlations between
complexity values in different domains of language, viz. phonology, morphology
and syntax. Drawing upon their earlier work (Fenk and Fenk-Oczlon 1993), in which
they proposed that, e.g., an inverse correlation exists between the number
of syllables per word and the number of words per clause, they test the
following hypotheses (p. 49):

I. The higher a language's number of syllable types, the higher its number of
monosyllabic words.
II. The higher a language's syllable complexity, the higher its number of
syllable types.
III. The higher a language's syllable complexity, the higher its number of
monosyllabic words.
IV. The bigger a language's phonemic inventory, the higher its syllable complexity.
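Each of hypotheses I-IV asserts a monotonic relation between two language-level measures (''the higher X, the higher Y''), so a rank correlation is a natural way to check them. The following is a minimal sketch of Spearman's rank correlation, offered only as an illustration: the review does not say which statistic the authors actually used, and the function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

Given, for each language in a sample, its number of syllable types and its number of monosyllabic words, a `spearman(...)` value close to +1 would support hypothesis I, while a value near 0 would not; a significance test for small samples is a separate step not shown here. On Python 3.12 and later, `statistics.correlation(x, y, method="ranked")` computes the same coefficient.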

These hypotheses are tested on data from several European languages drawn
from Menzerath (1954), and are shown to hold. The authors further hypothesize that
high phonological complexity may result in high syntactic and semantic
complexity, i.e. in rigid word order and a propensity for idiomatic
expressions, though they add the caveat that rigid word order may in itself be an
indicator of low rather than high syntactic complexity. The main conclusion of
the article is that though there exist significant complexity trade-offs between
different areas of language and there is no equal overall complexity in natural

Kaius Sinnemäki (''Complexity trade-offs in core argument marking'', pp. 67-88)
presents the results of a typological study of the functional load of different
strategies of core argument encoding, viz. dependent marking, head marking and
word order, based on a stratified sample of 50 languages. He discovers
significant negative correlations between the functional loads of dependent
marking and word order, but shows that there is no correlation between these
parameters and the functional load of head marking. However, Sinnemäki shows
that the combined functional load of word order and dependent marking inversely
correlates with the occurrence of the cross-referencing of the Patient argument
on the verb. Sinnemäki concludes that though complexity ''trade-offs as an
all-encompassing principle in languages'' (p. 84) must be rejected, local
trade-offs between different encoding strategies in a single subdomain of
grammar do exist.

Patrick Juola (''Assessing linguistic complexity'', pp. 89-108) pursues an
information-theoretic approach to language complexity, based on the assumption
that complexity is, in fact, redundant for the communicative function of
language: ''a language is 'complex' if sending the message in that language
requires much more bandwidth than the information content of the message'' (p.
91). Juola compares several complexity metrics, such as Kolmogorov complexity
(Li and Vitányi 1997), linear complexity (Massey 1969), and Ziv-Lempel
complexity (Lempel and Ziv 1976), and argues that the last of these is the most
adequate to linguistic data. Juola conducts an experiment in which text files are
compressed with the Ziv-Lempel algorithm, and shows (i) that the original text is
significantly shorter than its translations into various languages (the Bible and
several other texts are used to test this hypothesis); (ii) that distorting the
original text in various systematic ways, so that distortions apply at the
levels of phonology, morphology, syntax, and pragmatics, allows one to measure
the complexity of the respective domains.
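To make the idea concrete, here is a minimal Python sketch. `lz76_complexity` counts the phrases of the exhaustive-history parsing defined by Lempel and Ziv (1976), the measure Juola favors; `compressed_size` uses the standard `zlib` module (DEFLATE, a later LZ77-family compressor), which is only a stand-in for whatever compressor Juola actually used; and `shuffle_words` is a hypothetical example of a syntax-level distortion, not one of Juola's documented procedures.

```python
import random
import zlib


def lz76_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        j = i + 1
        # Grow the phrase s[i:j] while it can still be copied from the text
        # preceding its last symbol (the copy source may overlap the phrase).
        while j <= n and s[i:j] in s[: j - 1]:
            j += 1
        phrases += 1  # phrase ends with a novel symbol, or at end of input
        i = j
    return phrases


def compressed_size(text: str) -> int:
    """Size in bytes of the UTF-8 text after zlib (DEFLATE) compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def shuffle_words(text: str, seed: int = 0) -> str:
    """Hypothetical syntax-level distortion: randomly permute the words."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


if __name__ == "__main__":
    # Worked example: parsed as 0 | 001 | 10 | 100 | 1000 | 101, i.e. 6 phrases.
    print(lz76_complexity("0001101001000101"))  # 6
```

If word order carries much of a text's structure, one would expect `compressed_size(shuffle_words(t)) / compressed_size(t)` to exceed 1; comparing such ratios across translations of the same text is the spirit of point (ii). The repeated substring search makes `lz76_complexity` slow on long inputs, which is acceptable for illustration but not for whole books.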

David Gil (''How complex are isolating languages?'', pp. 109-131) casts doubt on
the position held by many linguists that all languages show an equal overall
degree of complexity, which implies that languages ''compensate for lesser
complexity in one area with greater complexity in another'' (p. 110). In
particular, Gil argues that some isolating languages do not ''compensate'' for the
lack of morphology with greater complexity in their syntax and syntax-semantics
interface, but, on the contrary, show a lesser degree of semantic complexity. Gil
investigates the degree to which languages possess what he calls ''associational
semantics'', i.e. the following relation between the form of complex expressions
and their meanings
(p. 113):

The Association Operator A:
Given a set of meanings M1...Mn, the Association Operator A derives a meaning
A(M1...Mn), or 'entity associated with M1 and ... and Mn'

In its pure form, associational semantics may be observed in the language of
pictograms, but, as Gil argues, the Association Operator in fact constitutes the
basis of the semantics of natural language. Though in many languages it is
supplemented by more specific rules of compositional semantics referring to various
morphosyntactically encoded features, Gil claims it is possible to observe it in
some isolating languages. Gil presents the results of an experiment conducted
with speakers of several languages, both non-isolating (English and Hebrew)
and isolating, including creoles and languages from West Africa, Southeast Asia
and Western Indonesia. It turns out that the speakers of at least some isolating
languages allow associational interpretations of sentences to a significantly
higher degree than the speakers of the non-isolating ones. Gil concludes that
''isolating languages are actually simpler than their non-isolating counterparts
with respect to their compositional semantics'' (p. 129).

In contrast to Gil, Elizabeth M. Riddle (''Complexity in isolating languages:
Lexical elaboration versus grammatical economy'', pp. 133-151) supports the more
traditional view that isolating languages are just as grammatically complex as
synthetic languages, the difference lying in the domain where complexity
resides. Drawing on evidence from Hmong, Mandarin, and Thai, Riddle shows that in
these languages complexity is to be found in the more or less grammaticalized
features of the lexicon, such as rich classifier systems, abundant verb
serialization, productive compounding, and the existence of special conventionalized
elaborate expressions, such as Hmong _cua daj cua dub_ 'storm' (lit. ''wind
yellow wind black'', p. 144). Riddle argues that the aforementioned properties of
isolating languages make them just as difficult to master for a second language
learner as the more synthetic languages.

Östen Dahl (''Grammatical resources and linguistic complexity. Sirionó as a
language without NP coordination'', pp. 153-164) presents a case study based on a
corpus of texts in Sirionó, a Tupí-Guaraní language, and shows that this
language has no grammatical device similar to noun phrase coordination, and
that in order to express the relevant meanings it uses other, less
grammaticalized strategies, ultimately boiling down to adding a new separate
unit to the discourse. Dahl concludes that this property of Sirionó may be
considered evidence of lesser structural complexity.

Part II ''Contact and change''
John McWhorter (''Why does a language undress? Strange cases in Indonesia'', pp.
167-190) proposes that the only possible cause for a language to substantially
reduce its overall complexity is widespread non-native acquisition in a contact
situation. More specifically, McWhorter claims that ''this is true not only in
the extreme case of creoles, but to a lesser but robust extent in many languages
of the world'' (p. 169), such as English (McWhorter 2002).
McWhorter presents several case-studies from Indonesia and shows that the
specific degree of complexity reduction observed in such languages as Riau
Indonesian (Sumatra) and Tetun Terik (Timor) may be explained by documented or
highly probable non-native acquisition in their recent past. Turning to Keo,
Rongga, and Ngadha (Flores), which, though not as ''stripped'' as Riau Indonesian,
show a much higher degree of analyticity in comparison to other Indonesian
languages, McWhorter suggests that despite the lack of any evidence of contact
situations leading to non-native acquisition in this area, such historical events
may be inferred on the basis of linguistic data, since, in his view, there is no
other plausible explanation for the loss of synthetic morphology in the Flores

Casper de Groot (''Morphological complexity as a parameter of linguistic
typology: Hungarian as a contact language'', pp. 191-215) discusses the
differences in morphological makeup between the Hungarian varieties outside
Hungary and standard Hungarian. De Groot shows that Hungarian outside Hungary
displays a greater degree of analyticity in several grammatical domains, such as
the expression of modality, reflexivity and causativity, and tends to replace compounds
and complex derivatives by phrasal expressions. Though this may be an indicator
of a decrease in system complexity, de Groot argues that in fact the analytical
mode of expression gives rise to a type of complexity different from that
observed in the more synthetic varieties of Hungarian.

Eva Lindström (''Language complexity and interlinguistic difficulty'', pp. 217-242)
explores the relation between language complexity and the difficulty a learner
may experience in acquiring a language, basing her study on the non-Austronesian
language Kuot and its three Austronesian neighbors spoken in New Ireland (Papua
New Guinea). Lindström measures complexity as the number of choices a learner
has to make in order to produce a grammatical sentence in a language. She shows
that though all the languages in question possess various categories
obligatorily or optionally expressed in the clause, Kuot is justly considered to
be extremely difficult by the Austronesian speakers because of a much greater
number of morphosyntactic features and greater morphological elaboration (e.g.,
Kuot has gender and object marking on the verb sensitive to more or less
arbitrary verb classes, all of which are lacking in its Austronesian neighbors).
Lindström argues that learner difficulty may increase due to such factors as (i)
mismatches between the native and the second language in the organization of
similar categories, (ii) various co-occurrence restrictions on the expression of
categories, (iii) non-transparency of morphological expressions, and (iv)
elaboration of the lexicon.

Antje Dammel and Sebastian Kürschner (''Complexity in nominal plural allomorphy:
A contrastive survey of ten Germanic languages'', pp. 243-262) investigate
different factors affecting complexity in the domain of expression of nominal
number in Germanic languages. The following qualitative criteria are employed to
measure complexity: (1) number of allomorphs; (2) degree of stem involvement
(from umlaut as the sole exponence of number to purely phonetic sandhi); (3)
redundant marking (stem alternation combined with an overt affix); (4) zero and
subtractive marking; (5) fusion of number and case; (6) degree of phonetic or
semantic motivation of allomorph choice; (7) amount of regularity of formal
techniques. The authors show that these parameters largely correlate with each
other and allow one to represent the Germanic languages as a cline from the
least complex system of plural marking found in English to the most complex
systems (Icelandic and Faroese), with other languages being situated either
closer to the simpler pole (Afrikaans, Dutch, Frisian, Danish) or to the complex
pole (Swedish, German and Luxembourgish). Some data on the possible burden
different features may impose on the language learner are also discussed.

Part III ''Creoles and pidgins''
Mikael Parkvall (''The simplicity of creoles in a cross-linguistic perspective'',
pp. 265-285) argues that it is necessary to carefully distinguish between such
properties of language as expressiveness (its ability ''to encode human
experience'', p. 265) and (structural) complexity, and points out that while it
is not to be doubted that all languages are equally expressive, this by no means
implies that they must be equally complex. Parkvall proposes a method to measure
linguistic complexity by selecting ca. 50 features from phonology, morphology
and syntax and counting their values as presented in Dryer et al. (eds.) 2005.
Notably, Parkvall excludes the parameter of syntheticity vs. analyticity from
his complexity metric, since, in his view, synthetic and analytic expressions of
the same grammatical feature do not differ in complexity per se, and also
because inclusion of this parameter would considerably bias the results of the
measurement. Parkvall's counts show that creoles and pidgins are indeed the
least complex among the languages of the world, sharing a special typological
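
One plausible reading of Parkvall's procedure, under assumptions the review does not spell out, is the following: each selected feature value receives a complexity score, scores are normalized so that every feature weighs equally, and a language's index is the mean over the features for which data exist (WALS has many gaps). The feature names and maximum scores below are hypothetical placeholders, not Parkvall's actual coding; in line with his choice, no synthetic-vs-analytic feature is included.

```python
from typing import Dict, Optional

# Hypothetical features and their maximum complexity scores (placeholders,
# NOT Parkvall's list). Following Parkvall, no feature scores synthesis itself.
FEATURE_MAX: Dict[str, int] = {
    "consonant_inventory": 4,
    "tone": 2,
    "gender_system": 3,
    "case_inventory": 4,
}


def complexity_index(scores: Dict[str, Optional[int]]) -> Optional[float]:
    """Mean of normalized (0..1) feature scores, skipping features lacking data."""
    normalized = [
        scores[feature] / max_score
        for feature, max_score in FEATURE_MAX.items()
        if scores.get(feature) is not None
    ]
    if not normalized:
        return None  # no data at all: report no index rather than a misleading 0
    return sum(normalized) / len(normalized)
```

Languages can then be ranked by this index. Averaging over the available features rather than summing them keeps poorly documented languages from looking artificially simple.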

Harald Hammarström (''Complexity in numeral systems with an investigation into
pidgins and creoles'', pp. 286-304) discusses several parameters relevant to
complexity in numeral systems (transparency of formation and (ir)regularity) and
investigates the values of these parameters in different pidgin and creole
languages. The study shows that, though ''pidgins/creoles have slightly less
complex numerals relative to their lexifiers'', their numeral systems
nevertheless ''are on the average more complex than the world average'', which may
be ''easily explained by the fact that well-documented pidgins/creoles have a set
of lexifiers which is non-representative of the (documented) languages of the
world as a whole'' (p. 300).

Angela Bartens and Niclas Sandström (''Explaining Kabuverdianu nominal plural
formation'', pp. 305-320) discuss the patterns of plural marking in the
Portuguese-based Cape Verdean Creole (Kabuverdianu) in the framework of the 4-M
theory (Myers-Scotton 1993), which distinguishes between four types of morphemes
based on their semantic content and grammatical function. The authors show that
in Kabuverdianu plural marking is usually restricted to the first constituent of
a noun phrase and is optional in the sense that it usually does not appear
when the intended interpretation can be inferred from the context.
The authors conclude that though Standard Portuguese plural marking is in itself
not very complex, Kabuverdianu ''stands very close to the lower end of the
complexity scale'' (p. 318).

Päivi Juvonen (''Complexity and simplicity in minimal lexica. The lexicon of
Chinook Jargon'', pp. 321-340) examines the lexicon of a Pacific Northwest pidgin
called Chinook Jargon, as used in a contemporary work of fiction. She shows that
despite the very restricted size of its lexicon, Chinook Jargon achieves high
expressive efficiency by using multiword constructions involving
semi-grammaticalized lexical items (e.g. the 'verbalizer' _mamook_), and by
allowing a much greater degree of multifunctionality of lexical items than,
e.g., English. Juvonen concludes that the lexicon of Chinook Jargon ''can indeed
be said to be simple from an information-theoretic point of view'' (p. 337).

This volume is undoubtedly a very interesting and valuable collection of papers
on an important and widely discussed topic in current linguistics. The
contributions to the volume cover a large variety of questions pertaining to
language complexity and use impressive cross-linguistic data. Here I would like
to single out those ideas in the book which, in my opinion, deserve
special attention.

First of all, one of the most important points made in several contributions to
the book (e.g. Miestamo, Sinnemäki, Dahl) is that it is necessary to distinguish
between the 'global' complexity of a language and 'local' complexity of a
particular subsystem or category. Moreover, as Miestamo argues, the task of
assessing 'global' complexity faces several problems which make it very hard to
achieve a cross-linguistically valid, representative and unbiased metric of
'global' complexity. Thus, studying complexity of individual grammatical or
lexical domains in and across languages seems much more promising and fruitful
than trying to develop holistic scales of complexity or to show that all
languages are equally complex. Nevertheless, among the several attempts to
construct a typology of 'global' complexity of languages presented in this
volume, I'd like to point to the one proposed by Mikael Parkvall. This approach
seems promising because of its predominantly functional orientation: it is based
not on the way languages encode things (cf. the explicit exclusion of syntheticity from
the factors adding to complexity), but on the array of things they encode (such
as grammatical categories, special semantic distinctions etc.).

Another important point made in the book pertains to the role of language
contact in language complexity. This issue is approached from different points
of view: John McWhorter makes a very strong claim concerning the role of
non-native acquisition in language ''simplification''; Casper de Groot argues,
using the example of Hungarian, that language contact may induce a decrease in
morphological complexity with a possible concomitant increase in syntactic
complexity; Eva Lindström shows that the complexity of a language may play a
role in the degree to which it is used as a second language in polyethnic
societies. Studies of 'local' complexity in contact situations seem very
promising, as does the recognition that the more abstract 'absolute'
structural complexity must be distinguished from the 'relative' difficulty of
particular aspects of the language for a non-native acquirer.

The third point I'd like to make is that the volume has clearly shown that at
the current state of linguistics it is premature to try to seriously assess the
'relative' complexity of a language, i.e. the difficulty which may be
experienced in acquiring it as a second language. A telling example of
this is the contribution by Kusters, who, though explicitly rejecting the notion of
'absolute' complexity in favor of the 'relative' complexity, defines the latter
with respect to a ''generalized outsider'', a character too abstract to be
considered a plausible model of a language learner. As Lindström writes,
''difficulty ... depends on the individual we take as our starting point. If I am
Swedish and learning Estonian, it is very difficult as the two languages are
very different; if I am Finnish it is a whole lot easier as many words and
structures are closely related, quite independently on the complexity of the
systems involved'' (p. 221). Similarly, for a native speaker of Adyghe,
undoubtedly one of the most complex languages of the world by whatever metric,
its closest relative Kabardian is obviously much easier to acquire than such a
relatively less complex language as English. Thus, one might think that
'difficulty' has much less to do with complexity than with, for instance, the
degree of similarity of the target language to the native tongue of the learner.
And, again, just as complexity is not a holistic property of a language but is
rather differentially localized in its various subsystems, so different 'parts'
of a language may be more or less 'difficult' for a learner.

The overall impression from the volume is that linguists working on the
topic of language complexity agree little on both general notions,
such as the definition of complexity and ways to measure it, and more
particular details, such as whether syntheticity adds to complexity or not. Some
authors explicitly and sometimes even passionately (e.g. Parkvall) argue against
the idea that all languages are of equal complexity and, specifically, against
the idea of 'compensation' and trade-off. Others (e.g. Riddle) defend this idea,
and give quite compelling arguments that lack of complexity in one domain (e.g.
morphology) may be 'compensated' for by complexity in another domain (e.g.
lexicon). Positions differ as to whether analytic languages are less complex
than the synthetic ones (Gil) or whether they are just as complex but in a
different respect (Riddle). Such discrepancies between different authors may
certainly be attributed not only to differing ideological or methodological
positions, but also to differences in the languages they base their analyses on.
Moreover, Fenk-Oczlon and Fenk are not even sure whether rigid word order is a
sign of greater or of lesser complexity, and provide arguments for both
positions. All this only shows that the topic of linguistic complexity is a
very promising one, and that further research in this domain should be encouraged.

Instead of a conclusion, I would like to make a critical remark which, minor
though it may seem, I consider symptomatic of a rather alarming state of current
typological research. My remark concerns errors in the data. In Parkvall's
article, on p. 266 ex. (3a) contains a short sentence in Adyghe; though the
transcription is correct, the glosses are a complete mess. In Hammarström's
article, on p. 290 Table 1 contains ten Russian numerals, out of which four are
incorrect. If the only examples in the whole volume from the two languages I am
familiar with contain errors, how can I be expected to rely on the correctness
of the examples from the languages I don't know? Unfortunately, cases
where data are misprinted, misglossed, mistranslated, and misinterpreted are not
at all rare in the typological literature, huge and widely used databases such
as Dryer et al. (eds.) 2005 not being an exception, and this inevitably leads to
a decrease in the reliability of the whole field of typology, since conclusions
drawn from erroneous data cannot be fully correct.

Dahl, Östen (2004). _The Growth and Maintenance of Linguistic Complexity_.
Amsterdam: Benjamins.

Dryer, Matthew, Bernard Comrie, Martin Haspelmath & David Gil (eds.) (2005).
_The World Atlas of Language Structures_. Oxford: Oxford University Press.

Fenk, August & Gertraud Fenk-Oczlon (1993). Menzerath's Law and the constant
flow of linguistic information. In _Contributions to Quantitative Linguistics_,
ed. by R. Köhler & B. Rieger, 11-31. Dordrecht: Kluwer.

Hawkins, John (2004). _Efficiency and Complexity in Grammars_. Oxford: Oxford
University Press.

Lempel, Abraham & Jakob Ziv (1976). On the complexity of finite sequences.
_IEEE Transactions on Information Theory_ IT-22(1), 75-81.

Li, Ming & Paul Vitányi (1997). _An Introduction to Kolmogorov Complexity and
Its Applications_. 2nd ed. New York: Springer.

Massey, James L. (1969). Shift-register synthesis and BCH decoding. _IEEE
Transactions on Information Theory_ IT-15(1), 122-127.

Menzerath, Paul (1954). _Die Architektonik des deutschen Wortschatzes_.
Hannover: Dümmler.

McWhorter, John (2002). What happened to English? _Diachronica_ 19, 217-272.

McWhorter, John (2005). _Defining Creole_. Oxford: Oxford University Press.

McWhorter, John (2007). _Language Interrupted: Signs of Non-native Acquisition
in Standard Language Grammars_. New York: Oxford University Press.

Myers-Scotton, Carol (1993). _Duelling Languages. Grammatical Structure in
Codeswitching_. Oxford: Clarendon Press.

Peter M. Arkadiev, PhD in linguistics (2006), is a research fellow at the
Department of Typology and Comparative Linguistics of the Institute of Slavic
Studies of the Russian Academy of Sciences, Moscow. His main interests are
linguistic typology with a focus on event and argument structure and its formal
realization, tense-aspect-modality and case marking. He works mainly on
Lithuanian and Adyghe.