From: Peter Arkadiev <alpgurevgmail.com>
Subject: Language Complexity
Announced at http://linguistlist.org/issues/19/19-936.html
EDITORS: Miestamo, Matti; Sinnemäki, Kaius; Karlsson, Fred
TITLE: Language Complexity
SUBTITLE: Typology, Contact, Change
SERIES TITLE: Studies in Language Companion Series 94
PUBLISHER: John Benjamins
YEAR: 2008
Peter M. Arkadiev, Institute of Slavic Studies, Russian Academy of Sciences, Moscow
INTRODUCTION
The book under review is a collection of 16 papers from the conference ''Approaches to Complexity in Language'' held in Helsinki in August 2005. The topic of language complexity has lately attracted much attention from linguists, and several important contributions to the field have been published, e.g. Dahl (2004), Hawkins (2004), McWhorter (2005, 2007). The contributions to the volume discuss such topics as the ways of defining and measuring linguistic complexity, the relations between the more abstract notion of complexity and the user-oriented idea of greater difficulty of learning a language, the alleged equal complexity of all languages and possible complexity trade-offs between different levels of language, and the effects of language contact, especially of pidginization and creolization, on linguistic complexity. Among the contributions to the volume there are both more theory-oriented and more data-oriented articles, which are grouped into three blocks called ''Typology and theory'', ''Contact and change'' and ''Creoles and pidgins''.
SUMMARY
The volume opens with an Introduction by the editors (''The problems of language complexity'', pp. vii-xiv), which contains a brief introduction to the topic of linguistic complexity and a useful summary of the contributions to the volume.
I. Typology and theory
Wouter Kusters (''Complexity in linguistic theory, language learning and language change'', pp. 3-22) draws a distinction between 'absolutivist' and 'relativist' approaches to linguistic complexity, and argues for the latter view, by which complexity crucially hinges upon the notion of the difficulty a learner may experience acquiring the language. Kusters defines complexity as ''the amount of effort a generalized outsider has to make to become acquainted with the language in question'' (p. 9), and identifies three dimensions of complexity in the domain of inflectional morphology, viz. ''economy'' (the number of categories expressible), ''transparency'' in the expression of the categories, and ''isomorphy'' of the semantico-syntactic and morphological facets of the categories. Finally, Kusters applies his theory to verbal morphology of several Quechuan varieties and hypothesizes that considerable differences between them in the degree of complexity may be due to various sociolinguistic factors operative in the history of these languages.
Matti Miestamo (''Grammatical complexity in a cross-linguistic perspective'', pp. 23-41) also addresses the issue of ''absolute'' vs. ''relative'' complexity, and argues, contrary to Kusters, that ''complexity should be approached from the absolute point of view in cross-linguistic studies'' (p. 28). Another important question raised by Miestamo concerns the problems with assessing the ''global'' complexity of a language. He argues that given the fact that the metrics of global complexity cannot be fully representative of the linguistic data, and that their outcomes cannot be fully comparable because different criteria may give conflicting results, ''the cross-linguistic study of grammatical complexity should primarily focus on specific areas of grammar, i.e., on local complexity'' (p. 31).
Gertraud Fenk-Oczlon and August Fenk (''Complexity trade-offs between the subsystems of language'', pp. 43-65) investigate possible correlations between complexity values in different domains of language, viz. phonology, morphology and syntax. Drawing upon their earlier work (Fenk and Fenk-Oczlon 1993), where they proposed that, e.g., an inverse correlation exists between the number of syllables per word and the number of words per clause, they test the following hypotheses (p. 49):
I. The higher a language's number of syllable types, the higher its number of monosyllabic words.
II. The higher a language's syllable complexity, the higher its number of syllable types.
III. The higher a language's syllable complexity, the higher its number of monosyllabic words.
IV. The bigger a language's phonemic inventory, the higher its syllable complexity.
These hypotheses are tested on the data of several European languages coming from Menzerath 1954, and are shown to hold. The authors further hypothesize that high phonological complexity may result in high syntactic and semantic complexity, i.e. in a rigid word order and a propensity for idiomatic expressions, though they add a caveat that rigid word order may in itself be an indicator of low rather than high syntactic complexity. The main conclusion of the article is that, though there exist significant complexity trade-offs between different areas of language, there is no equal overall complexity in natural languages.
Kaius Sinnemäki (''Complexity trade-offs in core argument marking'', pp. 67-88) presents the results of a typological study of the functional load of different strategies of core argument encoding, viz. dependent marking, head marking and word order, based on a stratified sample of 50 languages. He discovers a significant negative correlation between the functional loads of dependent marking and word order, but shows that there is no correlation between these parameters and the functional load of head marking. However, Sinnemäki shows that the combined functional load of word order and dependent marking inversely correlates with the occurrence of the cross-referencing of the Patient argument on the verb. Sinnemäki concludes that though complexity ''trade-offs as an all-encompassing principle in languages'' (p. 84) must be rejected, local trade-offs between different encoding strategies in a single subdomain of grammar do exist.
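To make the notion of a negative correlation between functional loads concrete, here is a minimal sketch. The review does not say which statistic Sinnemäki uses, so Spearman's rank correlation is assumed purely for illustration, and the five pairs of functional-load values are invented toy numbers, not data from the volume.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical functional loads (share of clauses disambiguated by each strategy)
# for five imaginary languages; these numbers are made up for the example.
dependent_marking = [0.9, 0.7, 0.5, 0.2, 0.0]
word_order = [0.1, 0.3, 0.4, 0.8, 1.0]

rho = spearman(dependent_marking, word_order)
print(rho)
```

A value of rho close to -1 would correspond to the kind of local trade-off described above: the more work one strategy does, the less the other does. A real study would also need a significance test and a properly stratified sample.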
Patrick Juola (''Assessing linguistic complexity'', pp. 89-108) pursues an information-theoretic approach to language complexity, based on the assumption that complexity is, in fact, redundant for the communicative function of language: ''a language is 'complex' if sending the message in that language requires much more bandwidth than the information content of the message'' (p. 91). Juola compares several complexity metrics, such as Kolmogorov complexity (Li and Vitányi 1997), linear complexity (Massey 1969), and Ziv-Lempel complexity (Lempel and Ziv 1976), and argues that the last of these is best suited to the linguistic data. Juola conducts an experiment using compression of text files with the Ziv-Lempel algorithm, and shows (i) that the original text is significantly shorter than its translation into various languages (the Bible and several other texts are used to test this hypothesis); (ii) that distorting the original text in various systematic ways, so that distortions apply at the levels of phonology, morphology, syntax, and pragmatics, allows one to measure the complexity of the respective domains.
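The compression-based idea summarized above can be sketched in a few lines. This is only an illustration under stated assumptions: Python's zlib (DEFLATE, a compressor from the Ziv-Lempel family) stands in for whatever implementation Juola actually used, the helper names are hypothetical, and the sample text is invented rather than taken from his corpus. The word-order shuffle imitates a distortion at the level of syntax.

```python
import random
import zlib


def compressed_size(text: str) -> int:
    """Bytes needed for the UTF-8 text after DEFLATE compression (level 9)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; lower means more redundancy."""
    raw_size = len(text.encode("utf-8"))
    if raw_size == 0:
        raise ValueError("cannot compute a ratio for empty text")
    return compressed_size(text) / raw_size


def shuffle_words(text: str, seed: int = 0) -> str:
    """Distort the text at the level of word order only, keeping the same words."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


# Invented toy text, not data from the volume.
sample = "the dog sees the cat and the cat sees the dog " * 100
distorted = shuffle_words(sample)

# The growth in compressed size after shuffling estimates how much
# regularity the compressor was able to exploit in the word order.
order_information = compressed_size(distorted) - compressed_size(sample)
print(compression_ratio(sample), compression_ratio(distorted), order_information)
```

Comparable distortions could in principle target other levels, for instance deleting affixes to probe morphology, but how to design them so that they isolate a single level is exactly the kind of methodological choice the chapter has to argue for.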
David Gil (''How complex are isolating languages?'', pp. 109-131) casts doubt on the position held by many linguists that all languages show an equal overall degree of complexity, which implies that languages ''compensate for lesser complexity in one area with greater complexity in another'' (p. 110). In particular, Gil argues that some isolating languages do not ''compensate'' for the lack of morphology by complexity of their syntax and syntax-semantics interface, but on the contrary show a lesser degree of semantic complexity. Gil investigates the degree to which languages possess what he calls ''associational semantics'', i.e. the following relation between the form of complex expressions and their meanings (p. 113):
The Association Operator A:
Given a set of meanings M1...Mn, the Association Operator A derives a meaning A(M1...Mn), or 'entity associated with M1 and ... and Mn'
In its pure form, associational semantics may be observed in the language of pictograms, but, as Gil argues, the Association Operator in fact constitutes the basis of the semantics of natural language. Though in many languages it is supplemented by more specific rules of compositional semantics referring to various morphosyntactically encoded features, Gil claims it is possible to observe it in some isolating languages. Gil presents the results of an experiment conducted with the speakers of several languages, both non-isolating (English and Hebrew) and isolating, including creoles and languages from West Africa, Southeast Asia and Western Indonesia. It turns out that the speakers of at least some isolating languages allow associational interpretations of sentences to a significantly higher degree than the speakers of the non-isolating ones. Gil concludes that ''isolating languages are actually simpler than their non-isolating counterparts with respect to their compositional semantics'' (p. 129).
In contrast to Gil, Elizabeth M. Riddle (''Complexity in isolating languages: Lexical elaboration versus grammatical economy'', pp. 133-151) supports the more traditional view that isolating languages are just as grammatically complex as synthetic languages, the difference lying in the domain where complexity resides. Drawing on evidence from Hmong, Mandarin, and Thai, Riddle shows that in these languages complexity is to be found in the more or less grammaticalized features of the lexicon, such as rich classifier systems, abundant verb serialization, productive compounding, and the existence of special conventionalized elaborate expressions, such as Hmong _cua daj cua dub_ 'storm' (lit. ''wind yellow wind black'', p. 144). Riddle argues that the aforementioned properties of isolating languages make them just as difficult to master for a second language learner as the more synthetic languages.
Östen Dahl (''Grammatical resources and linguistic complexity. Sirionó as a language without NP coordination'', pp. 153-164) presents a case study based on a corpus of texts in Sirionó, a Tupí-Guaraní language, and shows that in this language there is no grammatical device similar to noun phrase coordination, and that in order to express the relevant meanings this language uses other, less grammaticalized strategies, ultimately boiling down to adding a new separate unit to the discourse. Dahl concludes that this property of Sirionó may be considered as evidence of lesser structural complexity.
II. Contact and change
John McWhorter (''Why does a language undress? Strange cases in Indonesia'', pp. 167-190) proposes that the only possible cause for a language to substantially reduce its overall complexity is widespread non-native acquisition in a contact situation. More specifically, McWhorter claims that ''this is true not only in the extreme case of creoles, but to a lesser but robust extent in many languages of the world'' (p. 169), such as, for instance, English (McWhorter 2002). McWhorter presents several case studies from Indonesia and shows that the specific degree of complexity reduction observed in such languages as Riau Indonesian (Sumatra) and Tetun Terik (Timor) may be explained by documented or highly probable non-native acquisition during their recent past. Turning to Keo, Rongga, and Ngadha (Flores), which, though not as ''stripped'' as Riau Indonesian, show a much higher degree of analyticity in comparison to other Indonesian languages, McWhorter suggests that despite the lack of any evidence of contact situations leading to non-native acquisition in this area, such historical events may be inferred on the basis of linguistic data, since, in his view, there is no other plausible explanation for the loss of synthetic morphology in the Flores languages.
Casper de Groot (''Morphological complexity as a parameter of linguistic typology: Hungarian as a contact language'', pp. 191-215) discusses the differences in morphological makeup between the Hungarian varieties outside Hungary and standard Hungarian. De Groot shows that Hungarian outside Hungary displays a greater degree of analyticity in several grammatical domains, such as the expression of modality, reflexivity, and causativity, and tends to replace compounds and complex derivatives by phrasal expressions. Though this may be an indicator of a decrease in system complexity, de Groot argues that in fact the analytical mode of expression gives rise to a type of complexity different from that observed in the more synthetic varieties of Hungarian.
Eva Lindström (''Language complexity and interlinguistic difficulty'', pp. 217-242) explores the relation between language complexity and the difficulty a learner may experience during its acquisition, basing her study on the non-Austronesian language Kuot and its three Austronesian neighbors spoken in New Ireland (Papua New Guinea). Lindström measures complexity as the number of choices a learner has to make in order to produce a grammatical sentence in a language. She shows that though all the languages in question possess various categories obligatorily or optionally expressed in the clause, Kuot is justly considered to be extremely difficult by the Austronesian speakers because of a much greater number of morphosyntactic features and greater morphological elaboration (e.g., Kuot has gender and object marking on the verb sensitive to more or less arbitrary verb class, all of which is lacking in its Austronesian neighbors). Lindström argues that learner difficulty may increase due to such factors as (i) mismatches between the native and the second language in the organization of similar categories, (ii) various co-occurrence restrictions on the expression of categories, (iii) non-transparency of morphological expressions, and (iv) elaboration of the lexicon.
Antje Dammel and Sebastian Kürschner (''Complexity in nominal plural allomorphy: A contrastive survey of ten Germanic languages'', pp. 243-262) investigate different factors affecting complexity in the domain of expression of nominal number in Germanic languages. The following qualitative criteria are employed to measure complexity: (1) number of allomorphs; (2) degree of stem involvement (from umlaut as the sole exponence of number to purely phonetic sandhi); (3) redundant marking (stem alternation combined with an overt affix); (4) zero and subtractive marking; (5) fusion of number and case; (6) degree of phonetic or semantic motivation of allomorph choice; (7) amount of regularity of formal techniques. The authors show that these parameters mainly correlate with each other and allow one to represent the Germanic languages as a cline from the least complex system of plural marking found in English to the most complex systems (Icelandic and Faroese), with other languages being situated either closer to the simpler pole (Afrikaans, Dutch, Frisian, Danish) or to the complex pole (Swedish, German and Luxembourgish). Some data on the possible burden different features may impose on the language learner are also discussed.
III. Creoles and pidgins
Mikael Parkvall (''The simplicity of creoles in a cross-linguistic perspective'', pp. 265-285) argues that it is necessary to carefully distinguish between such properties of language as expressiveness (its ability ''to encode human experience'', p. 265) and (structural) complexity, and points out that while it is not to be doubted that all languages are equally expressive, this by no means implies that they must be equally complex. Parkvall proposes a method to measure linguistic complexity by selecting ca. 50 features from phonology, morphology and syntax and counting their values as presented in Dryer et al. (eds.) 2005. Notably, Parkvall excludes the parameter of syntheticity vs. analyticity from his complexity metric, since, in his view, synthetic and analytic expressions of the same grammatical feature do not differ in complexity per se, and also because inclusion of this parameter would considerably bias the results of the measurement. Parkvall's counts show that creoles and pidgins are indeed the least complex among the languages of the world, sharing a special typological profile.
Harald Hammarström (''Complexity in numeral systems with an investigation into pidgins and creoles'', pp. 286-304) discusses several parameters relevant to complexity in numeral systems (transparency of formation and (ir)regularity) and investigates the values of these parameters in different pidgin and creole languages. The study shows that, though ''pidgins/creoles have slightly less complex numerals relative to their lexifiers'', their numeral systems nevertheless ''are on the average more complex than the world average'', which may be ''easily explained by the fact that well-documented pidgins/creoles have a set of lexifiers which is non-representative of the (documented) languages of the world as a whole'' (p. 300).
Angela Bartens and Niclas Sandström (''Explaining Kabuverdianu nominal plural formation'', pp. 305-320) discuss the patterns of plural marking in the Portuguese-based Cape Verdean Creole (Kabuverdianu) in the framework of the 4-M theory (Myers-Scotton 1993), which distinguishes between four types of morphemes based on their semantic content and grammatical function. The authors show that in Kabuverdianu plural marking is usually restricted to the first constituent of a noun phrase and is optional in the sense that it usually does not appear in those cases when the necessary interpretation may be inferred from the context. The authors conclude that though Standard Portuguese plural marking is in itself not very complex, Kabuverdianu ''stands very close to the lower end of the complexity scale'' (p. 318).
Päivi Juvonen (''Complexity and simplicity in minimal lexica. The lexicon of Chinook Jargon'', pp. 321-340) examines the lexicon of a Pacific Northwest pidgin called Chinook Jargon, as used in a contemporary fiction text. Juvonen shows that despite the very restricted size of its lexicon, Chinook Jargon achieves high expressive efficiency by using multiword constructions involving semi-grammaticalized lexical items (e.g. the 'verbalizer' _mamook_), and by allowing a much greater degree of multifunctionality of lexical items than, e.g., English. Juvonen concludes that the lexicon of Chinook Jargon ''can indeed be said to be simple from an information-theoretic point of view'' (p. 337).
EVALUATION
This volume is undoubtedly a very interesting and valuable collection of papers on an important and widely discussed topic in current linguistics. The contributions to the volume cover a large variety of questions pertaining to language complexity and use impressive cross-linguistic data. Here I would like to pin down those of the ideas stated in the book which, in my opinion, deserve special attention.
First of all, one of the most important points made in several contributions to the book (e.g. Miestamo, Sinnemäki, Dahl) is that it is necessary to distinguish between the 'global' complexity of a language and 'local' complexity of a particular subsystem or category. Moreover, as Miestamo argues, the task of assessing 'global' complexity faces several problems which make it very hard to achieve a cross-linguistically valid, representative and unbiased metric of 'global' complexity. Thus, studying complexity of individual grammatical or lexical domains in and across languages seems much more promising and fruitful than trying to develop holistic scales of complexity or to show that all languages are equally complex. Nevertheless, among the several attempts to construct a typology of 'global' complexity of languages presented in this volume, I'd like to point to that proposed by Mikael Parkvall. This approach seems promising because of its predominant functional orientation: it is based not on the way languages encode things (cf. explicit exclusion of synthesis from the factors adding to complexity), but on the array of things they encode (such as grammatical categories, special semantic distinctions etc.).
Another important point made in the book pertains to the role of language contact in language complexity. This issue is approached from different points of view: John McWhorter makes a very strong claim concerning the role of non-native acquisition in language ''simplification''; Casper de Groot argues, using the example of Hungarian, that language contact may induce a decrease of morphological complexity with a possible concomitant increase in syntactic complexity; Eva Lindström shows that the complexity of a language may play a role in the degree to which it is used as a second language in polyethnic societies. Studies of 'local' complexity in contact situations seem very promising, as does the recognition of the fact that more abstract 'absolute' structural complexity must be distinguished from the 'relative' difficulty of particular aspects of the language for a non-native acquirer.
The third point I'd like to make is that the volume has clearly shown that at the current state of linguistics it is premature to try to seriously assess the 'relative' complexity of a language, i.e. the difficulty which may be experienced during its acquisition as a second language. A telling example of this is the contribution by Kusters, who, though explicitly rejecting the notion of 'absolute' complexity in favor of the 'relative' complexity, defines the latter with respect to a ''generalized outsider'', a character too abstract to be considered as a plausible model of a language learner. As Lindström writes, ''difficulty ... depends on the individual we take as our starting point. If I am Swedish and learning Estonian, it is very difficult as the two languages are very different; if I am Finnish it is a whole lot easier as many words and structures are closely related, quite independently on the complexity of the systems involved'' (p. 221). Similarly, for a native speaker of Adyghe, undoubtedly one of the most complex languages of the world by whatever metric, its closest relative Kabardian is obviously much easier to acquire than such a relatively less complex language as English. Thus, one might think that 'difficulty' has much less to do with complexity than with, for instance, the degree of similarity of the target language to the native tongue of the learner. And, again, just as complexity is not a holistic property of a language but is rather differentially localized in its various subsystems, so different 'parts' of a language may be more or less 'difficult' for a learner.
The overall impression from the volume is that the linguists working on the topic of language complexity have little agreement concerning both the general notions, such as the definition of complexity and ways to measure it, and more particular details, such as whether syntheticity adds to complexity or not. Some authors explicitly and sometimes even passionately (e.g. Parkvall) argue against the idea that all languages are of equal complexity and, specifically, against the idea of 'compensation' and trade-off. Others (e.g. Riddle) defend this idea, and give quite compelling arguments that lack of complexity in one domain (e.g. morphology) may be 'compensated' for by complexity in another domain (e.g. lexicon). Positions differ as to whether analytical languages are less complex than the synthetic ones (Gil) or whether they are just as complex but in a different respect (Riddle). Such discrepancies between different authors may certainly be attributed not only to differing ideological or methodological positions, but also to differences in the languages they base their analyses on. Moreover, Fenk-Oczlon and Fenk are not even sure whether rigid word order is a sign of greater or of lesser complexity, and provide arguments for both positions. All this shows only that the topic of linguistic complexity is a very promising one, and that further research in this domain should be encouraged.
Instead of a conclusion, I would like to make a critical remark, which, minor though it may seem, I consider to reveal a rather alarming condition of current typological research. My remark concerns errors in the data. In Parkvall's article, on p. 266 ex. (3a) contains a short sentence in Adyghe; though the transcription is correct, the glosses are a complete mess. In Hammarström's article, on p. 290 Table 1 contains ten Russian numerals, out of which four are incorrect. If the only examples from two languages I am familiar with in the whole volume contain errors, how could I be expected to rely on the correctness of the examples from the languages I don't know? Unfortunately, situations in which data are misprinted, misglossed, mistranslated, and misinterpreted are not at all rare in the typological literature, huge and widely used databases such as Dryer et al. (eds.) 2005 not being an exception, and this inevitably reduces the reliability of the whole field of typology, since conclusions drawn from erroneous data cannot be fully correct.
REFERENCES
Dahl, Östen (2004). _The Growth and Maintenance of Linguistic Complexity_. Amsterdam: Benjamins.
Dryer, Matthew, Bernard Comrie, Martin Haspelmath & David Gil (eds.) (2005). _The World Atlas of Language Structures_. Oxford: Oxford University Press.
Fenk, August & Gertraud Fenk-Oczlon (1993). Menzerath's Law and the constant flow of linguistic information. In _Contributions to Quantitative Linguistics_, ed. by R. Köhler & B. Rieger, 11-31. Dordrecht: Kluwer.
Hawkins, John (2004). _Efficiency and Complexity in Grammars_. Oxford: Oxford University Press.
Lempel, Abraham & Jacob Ziv (1976). On the complexity of finite sequences. _IEEE Transactions on Information Theory_ IT-22 (1), 75-81.
Li, Ming & Paul Vitányi (1997). _An Introduction to Kolmogorov Complexity and Its Applications_. 2nd ed. New York: Springer.
Massey, James L. (1969). Shift-register synthesis and BCH decoding. _IEEE Transactions on Information Theory_ IT-15 (1), 122-127.
Menzerath, Paul (1954). _Die Architektonik des deutschen Wortschatzes_. Hannover: Dümmler.
McWhorter, John (2002). What happened to English? _Diachronica_ 19, 217-272.
McWhorter, John (2005). _Defining Creole_. Oxford: Oxford University Press.
McWhorter, John (2007). _Language Interrupted: Signs of Non-native Acquisition in Standard Language Grammars_. New York: Oxford University Press.
Myers-Scotton, Carol (1993). _Duelling Languages. Grammatical Structure in Codeswitching_. Oxford: Clarendon Press.
ABOUT THE REVIEWER
Peter M. Arkadiev, PhD in linguistics (2006), is a research fellow at the Department of Typology and Comparative Linguistics of the Institute of Slavic Studies of the Russian Academy of Sciences, Moscow. His main interests are linguistic typology with a focus on event and argument structure and its formal realization, tense-aspect-modality and case marking. He works mainly on Lithuanian and Adyghe.