Review of Consonant Clusters and Structural Complexity

Reviewer: J Joseph Perry
Book Title: Consonant Clusters and Structural Complexity
Book Author: Philip Hoole, Lasse Bombien, Marianne Pouplier, Christine Mooshammer, Barbara Kühnert
Publisher: De Gruyter Mouton
Linguistic Field(s): Phonetics
Issue Number: 25.679

This edited volume derives from presentations at the “Consonant Clusters and Structural Complexity” workshop, Munich 2008. The editors’ stated aim was to encourage interdisciplinary contact among linguists working on the issues included in the title, and this is reflected in the organisation of the book, which is divided into four parts, entitled “Phonology and Typology”, “Production: analysis and models”, “Acquisition”, and “Assimilation and reduction in connected speech”. Because of the large number of papers in the volume, I have generally restricted this review to the papers which I found most striking, or which warrant discussion.

PART I -- “Phonology and Typology”
“On relations between [sonorant] and [voice]”, by Rina Kreitman, takes a typological approach to the question of the place of voicing in relation to sonority. She takes a sample of languages used in previous typological surveys and 24 additional languages. She observes a series of implicational universals: SO > SS > OO > OS (S = sonorant, O = obstruent). In general, this hierarchy corresponds to the Sonority Sequencing Principle (SSP, e.g. Selkirk 1984), which asserts that (in unmarked cases) sonority should increase from the beginning of the syllable towards the peak. The only part of this hierarchy left unexplained by the SSP is the ordering between sonorant-sonorant and obstruent-obstruent clusters, which both have level sonority. To explain this, Kreitman points to a number of asymmetries between sonorants and obstruents reported elsewhere in the literature. Next, Kreitman notes that the features [sonorant] and [voice] have often been claimed to be linked, and the following section presents a similar typological survey of (obstruent) voicing in clusters. She first discusses the relative rarity of onsets where the first segment is voiced and the second unvoiced. This has been claimed (e.g. Lombardi 1999) to be an impossible cluster type, but Kreitman points to at least three languages in her survey which have phonetic evidence for clusters of this type, namely Khasi (Mon-Khmer), Tsou (Austronesian) and Modern Hebrew. Having established that all possible voicing combinations do occur, she goes on to discuss the implicational relationships between them. From her survey she adduces the following relations: VU > UV > UU, and VV > UU (V = voiced, U = unvoiced). These implicational relations are quite different from those involving sonority, which would seem to constitute evidence against an explicit link between voicing and sonority. Kreitman’s final section proposes that no historical stage should allow violations of the implicational universals discussed.

In “Limited consonant clusters in OV languages”, Hisao Tokizaki and Yasutomo Kuwana discuss the proposed correlation between the complexity of syllable structure and word order, specifically the order of verb and object, or head and complement more generally, an observation for which they cite Lehmann (1973) and more recent scholars. They test this hypothesis with data from the World Atlas of Language Structures (WALS), finding that the average syllabic complexity of OV languages is of a similar order to that of VO languages, and they find the same result for linguistic genera. Comparing results for the macro-areas defined by Dryer (1992), they find no evidence for the hypothesis. However, they argue that if one considers more fine-grained distinctions of syllable complexity one may find evidence of a correlation. To this end they consider consonant clusters and head-complement order in some languages of China, observing that in their sample both the number of syllable-final consonants and the degree of head-initial ordering increase from north (Manchu) to south (Thai). Next, the authors assert that if one considers degrees of head-finality, there is a correlation between the consistency of head-finality and the number of segments per syllable. They also propose that the coda inventory is limited in OV languages, listing a number of OV languages with restricted inventories. They give examples of OV languages which eliminate clusters word-internally in borrowed words, and of OV languages which have allomorphic alternations avoiding cross-morphemic clusters. The fourth section proposes an explanation for why one might expect OV languages to have simpler syllable structure. This rests on the notion that left-branching structures have a weaker boundary between constituents than right-branching structures. We expect that head-final languages will have more left-branching structures, and that clusters are phonologically dispreferred over this weaker boundary, while they are acceptable over a stronger one. Unfortunately it is not clear how this ties in to the asserted word-internal (lack of) syllable complexity.

In “Manner, place and voice interactions in Greek cluster phonotactics”, Marina Tzakosta considers the utility of the traditional sonority hierarchy in capturing the phonotactic restrictions which apply in Greek consonant clusters. She proposes that cluster well-formedness should be judged on the basis of three scales: manner of articulation, place of articulation and voicing. The voicing and manner scales correspond to what would be expected from the traditional sonority hierarchy, while the place scale refers to the place hierarchy discussed by Prince and Smolensky (2004 [1993]) -- velar > labial > coronal. Using these scales, Tzakosta divides clusters into three categories, which she labels perfect (a gradient category, despite the name), acceptable and non-acceptable. Perfect clusters are those which satisfy all three scales non-vacuously, while non-acceptable clusters are those which violate the voicing scale. Acceptable clusters constitute the remainder. Unfortunately Tzakosta does not provide convincing evidence that perfect and acceptable clusters constitute meaningful categories. For instance, although her acceptable clusters are often adjusted so that they have a greater distance on one of the scales (e.g. vl > γl), she also provides examples in Greek of changes in the opposite direction, turning a perfect cluster into a merely acceptable one (e.g. pð > vð, θl > fθ). The distinction between non-acceptable clusters and other clusters does seem to be meaningful in Greek (although not universally, as Kreitman’s paper would seem to show), and tautosyllabic [+v][-v] clusters are forbidden throughout the language. Tzakosta gives various examples of adjustments to clusters to increase distance along one of the scales, in L1 and L2 acquisition (the latter by Dutch and Romanian speakers). Finally, she observes a constraint on heterosyllabic clusters -- they must be non-acceptable in her schema. This would seem to be necessary, but it is not clear that it is sufficient: the examples that she gives all involve simultaneous violations of her manner scale.

PART II -- “Production: analysis and models”
‘Articulatory coordination and the syllabification of word initial consonant clusters in Italian’, by Anne Hermes, Martine Grice, Doris Mücke and Henrik Niemann, investigates differences between the gestural organisation of s+C clusters and plosive-sonorant (TR) clusters in Italian, couched within Articulatory Phonology. It is assumed (as in Nam and Saltzman 2003) that in the latter, while the consonantal gestures are coupled in-phase to vocalic ones (i.e. they are initiated simultaneously), the two consonantal gestures are coupled anti-phase to one another (one is initiated after the other). The competition between these couplings leads to a ‘c-centre’ effect: the centre of the cluster is roughly the same distance from the vocalic target as a single onset consonant would be. This implies a shift of the rightmost consonant towards the vowel. The purpose of the study is to test whether the articulatory timing of s+C clusters reflects the same coupling. The authors recorded various tokens with single consonants, TR clusters, s+C clusters and s+CC clusters from two speakers of Italian, and measured the distance between the rightmost consonant of the cluster and the vocalic target. While the distance after TR clusters was significantly shorter than after single consonants, there was no significant difference in the distance between s+C clusters and single consonants, or between s+CC clusters and simple TR clusters. This is taken to imply that s+C(C) clusters in Italian do not form a single branching onset. This is contrasted with English, where Browman and Goldstein (2000) found evidence for a c-centre effect in those clusters.

Fang Hu’s ‘Tonogenesis in Lhasa Tibetan -- Towards a gestural account’ also investigates gestural timing, here of tonal and consonantal gestures in Lhasa Tibetan, with a view to providing an account of tonogenesis. Hu discusses the historical relation of tone to simplifications in syllable structure, mentioning the just noted ‘c-centre’ effect and Gao’s (2009) observation of a corresponding effect for lexical tone. Hu investigates eight categories of Tibetan word, dividing them first into the traditional tonal register split of high and low, derived respectively from voiceless and voiced initials. He then divides them into four categories based on syllable type: long syllables (or those closed with a sonorant), short open syllables (which he calls ‘aspirated’, based on breathy voicing in the citation form), and two kinds of checked syllables: those ending in a simple glottal stop, and those ending in a cluster of a nasal and a glottal stop. He elicits citation forms from three female speakers in a frame (meaning ‘these characters are X’). As Hu notes, he is not necessarily eliciting everyday Lhasa pronunciation, but rather a ‘learned’ pronunciation. Moreover, his frame sentences use copulas like where they would not be used in everyday utterances (he also uses a copula , the provenance of which is not clear to me). Consonantal and vocalic gestures were recorded using an EMA system and the F0 of the audio recording was taken to correspond to the tone gesture. Hu briefly outlines the acoustic F0 results, noting that low syllables show a rise, checked syllables a fall, and other high syllables a roughly level contour. Hu points out that the emergence of the high/low register distinction from a voicing contrast is quite plausible phonetically, referring to Hombert, Ohala and Ewan (1979). Finally, Hu discusses the gestural organisation of syllables in Tibetan. He argues that they show a ‘c-centre’ effect of the sort Gao observes in Mandarin, with a vowel gesture starting after the consonantal gesture and before the tonal gesture. Interestingly, one speaker shows a significant difference between the lag between C-onset and V-onset, and that between V-onset and T-onset. The same speaker shows significant effects of syllable type and initial consonant on the lags. Hu does not explain this, but notes that none of the speakers show a significant effect from the tonal specification of the syllable. Nonetheless, a c-centre effect is still clearly visible. Hu argues that this competitive coupling may have been a driving force for consonant cluster simplification in Lhasa Tibetan, though further research is necessary.

PART III -- “Acquisition”
Natalie Boll-Avetisyan’s paper ‘Probabilistic phonotactics in lexical acquisition’ is a study of the interaction between the acquisition of phonotactic constraints and of the lexicon. She discusses a number of previous studies (Storkel 2001, Storkel and Rogers 2000, Gathercole et al. 1999), which seem to show that children are more adept at recalling CVC non-words if they have high phonotactic probability. She points to criticisms of Gathercole et al. (1999) by Roodenrys and Hinton (2002), who argue that lexical neighbourhood effects should be taken into account in these experiments, and find in their own experiment (in adults) that phonotactic probability does not seem to have a significant impact on recall accuracy of non-words when such effects are accounted for. Of two other studies taking these effects into account (Thorn and Frankish 2005 and Storkel et al. 2006), the former found a positive effect on recall accuracy, while the latter observed a negative effect. Boll-Avetisyan observes that all these studies only concern CVC tokens, and therefore that the phonotactic probability they investigate is only a question of lexical frequency, and that more directly phonological factors such as markedness are not considered. She draws a division between these two kinds of ‘sub-lexical representations’ (i.e. probability due to questions of lexical frequency and probability due to grammatical markedness), and proposes that the effects of phonological probability should be “modulated by markedness constraints”. She consequently suggests that structurally complex words should show stronger phonotactic probability effects with respect to word recall. To test this she measured the reaction times of adult Dutch speakers when asked to confirm that a certain non-word appeared in a preceding sequence of four, controlling for lexical neighbourhood effects. The initial experiment found significant differences between non-words with simple and complex syllable structure with respect to the relative reaction time to words with high and low probability biphones, as predicted. However, the speech rate of low-frequency words with complex syllabic structure was slower than that of high-frequency words, and it was suspected that this might affect the results. Another experiment was carried out with the tokens adjusted to have the same duration for each degree of complexity. This experiment was divided into two groups, with one group omitting words of intermediate complexity. When the groups were combined, they still showed a significant difference in the behaviour of words with simple syllabic structure and complex syllabic structure, as above, showing a significant interaction between the frequency of biphones and syllable structure. As the paper concludes, it seems clear from the results that, as predicted, the effect of biphone frequency is modulated by syllabic complexity, but the experiment does not explicitly show that this is due to markedness -- Boll-Avetisyan points to this as a topic for future research. She argues that the results indicate that the proposed division of types of sub-lexical knowledge may well be correct, and discusses how this knowledge may be acquired. She reiterates the importance of controlling for speech-rate effects. Finally, she argues against the proposal that these effects may be due to word length rather than complexity, pointing to results of Frisch et al. (2000), who found no such effects in words which are long but simple with regard to syllable structure.

PART IV -- “Assimilation and reduction in connected speech”
In ‘Overlap-driven consequences of Nasal place assimilation’, Claire Halpert investigates a phenomenon in Zulu nasal assimilation whereby stops triggering nasal assimilation lose their laryngeal features, and a phenomenon (‘postnasal hardening’) whereby fricatives become affricates in the same environment. As in the papers of the second part of this volume, the approach is a gestural account grounded in Articulatory Phonology, but it is explicitly embedded in Optimality Theory (Prince and Smolensky 2004 [1993]). She confirms that the processes in question are genuinely due to assimilation by observing that they do not occur in clusters of /m/ (which does not assimilate) and the relevant segments, except where [m] appears as a result of assimilation. Postnasal hardening is attributed to the constraint enforcing assimilation (ASSIM) demanding a single oral gesture between adjacent segments. In this case, the single gesture in question is taken to be that of an affricate, consisting of complete closure with a ‘critical’ release. She proposes that the loss of laryngeal features, on the other hand, is due to a constraint *LONG, which forbids the lengthening of gestures, in conjunction with an alignment constraint (ALIGN) enforcing alignment between oral and laryngeal gestures, and a ban on nasal segments with marked laryngeal features. Since gestures are forbidden from lengthening, it must be assumed that assimilation is achieved by shortening the velic gestures, aligning them with the oral gesture. This predicts that clusters where assimilation occurs should be shorter than those where it does not. Halpert conducts a preliminary experiment showing that this prediction holds for a Zulu speaker: underlying mC clusters differ significantly in length from singleton C, but derived mC clusters do not. Finally, in other Bantu languages (viz. Kinyarwanda, Pokomo and Sukuma -- Kimenyi 1979, Sagey 1986, Maddieson 1991, Huffman and Hinnebusch 1998) we see nasals and stops in clusters displaying a single marked laryngeal gesture (specifically, aspiration). She takes this to be evidence for her account of overlap enforced by *LONG and ALIGN constraints, arguing that these cases are identical to Zulu except that aspirated nasals are permitted, meaning the laryngeal gesture is not deleted.

Finally, ‘The acoustics of high-vowel loss in a Northern Greek dialect’, by Nina Topintzi and Mary Baltazani, is a preliminary acoustic description of high vowel deletion in the Kozani Greek dialect. In this dialect, as in many other Northern Greek dialects, unstressed high vowels /i/ and /u/ are frequently deleted, often resulting in complex consonant clusters and word-final consonants which are not found in Standard Greek. Like vowel deletion in many languages, Kozani Greek vowel deletion appears to be gradient, including degrees of devoicing and frication. Topintzi and Baltazani report various effects on preceding consonants. For instance, when coronal [t] is followed by a deleted vowel, we appear to see increased aspiration, and, less robustly, increased duration. Unfortunately, they do not test for significance. Similarly, they report an increase in duration for certain fricatives and sonorants, but again do not test for significance, which leads one to wonder whether the observed length increases are simply the result of statistical noise. They calculate the standard deviations for these data, finding that for most segments, the standard deviations for duration are larger when preceding a deleted vowel. Again, however, they do not explicitly test for significance. Next, they examine the influence of the vowel’s environment on deletion. They observe that nearly half of all deletions occur either between voiceless consonants or following one word-finally. Deletions preceding a voiced consonant are relatively infrequent (constituting only 12% of deletions in their data). However, they do not discuss the frequencies of consonants in the relevant positions, which could perhaps have some impact on their conclusions. They observe that /i/ proportionally deletes less frequently than /u/, with unstressed /i/ being deleted 47% of the time, compared with /u/ deleting 75% of the time. They also contrast the positions in which /u/ and /i/ delete, observing that /u/ generally seems to delete pretonically, while /i/ tends to delete posttonically. However, as the authors note, instances of /u/ occurring in a position to delete are relatively few, and so it seems possible that the difference is a statistical fluctuation due to the small sample size -- tests of significance are again not carried out. The same problem occurs in the next apparent asymmetry between /i/ and /u/, where /u/-deletion tends to occur in word-initial syllables, in contrast to /i/-deletion, which tends to occur medially. A final asymmetry is that surface unstressed [u] tends to be derived by raising of underlying /o/, while most unstressed [i] are indeed derived from /i/ (and only a relatively small proportion from /e/). Again, however, this could be ascribed to the relatively low frequency of /u/.

There is much of interest in this volume. For instance, Kreitman’s paper shows convincingly that voicing and sonority pattern typologically differently from one another, suggesting that they should be treated independently -- a result which has quite wide-ranging consequences for both synchronic and diachronic phonology. This delinking of voicing and sonority suggests that lenition processes such as that found in Modern Welsh (for details, see Hannahs 2013), in which voiceless stops are voiced and voiced stops become fricatives, may be more complex than one might expect if we assumed a monolithic sonority hierarchy including voicing.

Most of the experimental studies are well thought out, with thorough statistical analyses and interesting results. Boll-Avetisyan makes the important observation that learners make use of abstract structural cues in addition to simple measures of probability when acquiring non-words, presenting an important argument for taking structural analyses into account when constructing learning experiments of this type. The papers on articulatory phonology in particular provide valuable insights into the correspondence between phonological structure and its articulatory realisation. The thorough experimental paper contributed by Hermes et al. provides convincing articulatory evidence for an independently motivated (but not undisputed) phonological generalisation regarding Italian, namely that the /s/ in s+C clusters lies outside the onset (Davis 1990). Hu’s paper is valuable not so much as a description of the tonal system of Lhasa Tibetan (which is problematic to the extent that his data represent a ‘learned’ system rather than a natural one), but for its confirmation that effects concerning the timing of tone in relation to onset consonants and vowels observed in Mandarin (Gao 2009) also hold in Tibetan.

However, a few papers feel somewhat underdeveloped. Tokizaki and Kuwana, for example, find that their hypothesised correlation (between OV word-order and syllabic complexity) does not represent any kind of robust typological trend, but nonetheless do not report this as a negative result. When they consider a continuum of languages in China that they argue demonstrates a link between the two properties, their evidence is both limited (restricted to one such continuum) and selective. (Central Tibetan, for instance, has the same inventory of final stops as head-initial Thai, but is consistently head-final -- if Tokizaki and Kuwana had selected different endpoints of their continuum, they would have arrived at a quite different result.) Although they claim that there is a correlation between the consistency of a head-final syntactic pattern and syllabic complexity, this is not clear from simple inspection of the data they provide, and they do not perform statistical tests of correlation or significance. They also fail to adequately explain why the correlation would be expected to hold. In sum, while the hypothesis of this paper is interesting, there does not seem to be any data convincingly supporting it, or any satisfying explanation for why it should come about. This article in particular would have benefited from much greater rigour.

Topintzi and Baltazani also make extensive reference to quantitative data, but do not investigate whether the trends they observe are significant. Since the results are not sufficiently strong to determine significance by inspection, the lack of explicit tests makes it difficult to be confident that the trends are real rather than the result of statistical noise. Rather different issues are found in Tzakosta’s paper, which does not demonstrate that the categories she refers to are meaningful, and does not account for a number of examples which would seem to contradict her hypothesis. The range of approaches is, as the editors intend, very wide, but there is a noticeable dearth of work in formal theoretical phonology, with only the contributions by Ferré et al. (not reviewed here) and by Halpert making significant reference to formal models. Other papers discuss such models only in passing. Despite these criticisms, this volume will be of great use to anyone interested in the typology, phonetic realisation and acquisition of consonant clusters.

Joe Perry is a doctoral student at the University of Cambridge, currently completing a thesis entitled 'Tone and the Phonological Word in Gyalsumdo', treating tone in relation to the phonology-syntax interface in Gyalsumdo, a Tibetan variety from Manang district, Nepal.