LINGUIST List 28.1878

Thu Apr 20 2017

Review: Afroasiatic; General Ling; Text/Corpus Ling; Typology: Vanhove, Mettouchi, Caubet (2015)

Editor for this issue: Clare Harshey

Date: 16-Nov-2015
From: Zoe Bartliff
Subject: Corpus-based Studies of Lesser-described Languages

EDITOR: Amina Mettouchi
EDITOR: Martine Vanhove
EDITOR: Dominique Caubet
TITLE: Corpus-based Studies of Lesser-described Languages
SUBTITLE: The CorpAfroAs corpus of spoken AfroAsiatic languages
SERIES TITLE: Studies in Corpus Linguistics 68
PUBLISHER: John Benjamins
YEAR: 2015

REVIEWER: Zoe Bartliff, University of Glasgow

Reviews Editor: Helen Aristar-Dry


The volume ‘Corpus-based Studies of Lesser-described Languages: The CorpAfroAs corpus of spoken AfroAsiatic languages,’ edited by Amina Mettouchi, Martine Vanhove, and Dominique Caubet, is a companion text to the online CorpAfroAs corpus project. The corpus contains one-hour samples of recorded speech from twelve Afro-Asiatic languages: Kabyle, Tamasheq (Berber), Hausa, Bata and Zaar (Chadic), Afar, Beja, Gawwada, Ts'amakko (Cushitic), Wolaitta (Omotic), Moroccan and Libyan Arabic, Juba Arabic, Hebrew (Semitic). These samples have been transcribed and annotated with the primary intention of allowing examination of their prosodic and morphosyntactic features. The project was a pilot corpus designed to provide a model for similar projects and to allow the creation of a more user-friendly and efficient version of the traditional annotation programs ELAN and Toolbox. The volume of collected essays reviewed here was written to accompany the project: it provides what are in essence extensive notes on the corpus's construction, together with the initial findings from analysing the languages themselves.

Divided into five parts, this volume covers a wide spectrum of content. It opens with an extensive introduction to the history of both the project and the volume itself. Written by the editors of the book (Mettouchi, Vanhove and Caubet), the Preface commences with a description of the lamentable scarcity of available sound files for Afro-Asiatic languages and, among those that do exist, the lack of systematic annotation. This was the impetus behind the creation of CorpAfroAs. There follows a brief overview of what makes the corpus unique, namely that it provides a homogenised model for the creation of such a corpus. The languages chosen are deliberately diverse and representative of all features of Afro-Asiatic languages, to enable the most comprehensive model possible. Also included are an overview of the glossing system, an explanation of the choice to focus upon prosodic and morphosyntactic features, and finally a breakdown of the later sections of the book, each of which takes a different focus.

The first and second parts of the volume concentrate on analysing samples from the corpus, though there is also some discussion of the challenges of transcription faced within the project. ‘Representation of Speech in CorpAfroAs: Transcriptional Strategies and prosodic units’ by Shlomo Izre’el and Mettouchi, for example, opens the volume with a discussion of the varied layers of transcription used within the corpus, explaining their use and relevance. There is a distinct focus upon the tx tier (symbolic association or phonetic transcription), designed to represent the speech faithfully, and the mot tier (morphosyntactic representation), as it is these units which most faithfully represent the prosodic value of the sample. An extensive section of the chapter is devoted to explaining and demonstrating prosody at differing levels, using scattered samples from the corpus; it is intended as a survey of the corpus rather than a detailed study. Little attention is paid to the glossing tiers of transcription, as these are the focus of a later chapter.

Bernard Caron, in ‘Tone and intonation’, offers a more focused investigation into intonation within tonal languages. The chapter demonstrates above all the potential uses of the corpus, as Caron analyses previously unavailable language samples, namely those from Zaar. Zaar is revealed to be a mixed language with regard to intonation: it possesses both internal and peripheral intonation.

In Part Two, Caron, Cécile Lux, Stefano Manfredi and Christophe Pereira offer a similar intonation-centred analysis of Zaar, Tamasheq, Juba Arabic and Tripoli Arabic in the chapter titled ‘The intonation of topic and focus.’ This is a more correlational study, which treats each language individually and examines the relationship between prosody and the structural and semantic content of speech. Unlike Caron’s chapter, which is purely demonstrative of the opportunities for analysis offered by CorpAfroAs, this chapter offers valuable conclusions as to the tonal tendencies of the language family. Of particular note are the evidence of shared patterns of intonation for thetic speech and the bipartite division among interrogatives, whereby the informational structure is either intonational or morphosyntactic in nature.

The fourth chapter, ‘Quotative constructions and prosody in some Afro-Asiatic languages: towards a typology’ by Il-Il Malibert and Martine Vanhove, focuses upon four genetically varied languages of the corpus (Beja, Zaar, Juba Arabic and Modern Hebrew) with regard to the prosodic values of direct and indirect reported speech. Adapting existing systems of analysis to suit the annotation system of the corpus, Malibert and Vanhove provide a tentative model for the appearance of prosody within reported speech.

In Part Three, the focus of the volume shifts away from prosody and towards the issues raised by glossing and the cross-linguistic analysis that it allows. Chapter Five, ‘Glossing in Semitic languages: a comparison of Moroccan Arabic and Modern Hebrew’, by Ángeles Vicente, Malibert and Alexandrine Barontini, proposes a universal model for glossing morphological analysis, so that all readers, not solely those familiar with a language, can understand fully the purpose and meaning of individual morphemes. The chapter first gives an overview of historical glossing practice for each of the chosen languages, then applies the proposed model to those languages and to others within the extended family tree.

The next chapter, ‘From the Leipzig Glossing rules to the GE and RX lines’ by Bernard Comrie, returns to the focus of the volume, the CorpAfroAs project. This chapter is the glossing equivalent of the opening chapter of the book and focuses upon the adaptation of the standard Leipzig method of glossing to the tiered approach used within CorpAfroAs. It commences with an overview of the tradition and importance of glossing as a means of making a text accessible to an audience. Discussion then moves on to the requirements of the project and of Afro-Asiatic languages as a whole, which demand greater flexibility and more categorical variety than the Leipzig Glossing Rules permit, particularly with regard to the retrieval of grammatical categories from the corpus. The chapter entitled ‘Cross-linguistic comparability in CorpAfroAs’ by Mettouchi, Graziano Savà and Mauro Tosco puts these glossing practices to the test with a cross-linguistic comparison of ‘ventive’ extensions, gender and case endings within the corpus. The tiered glossing proves an effective method for the retrieval of such grammatical features for analysis.

Further evidence of the effectiveness of the CorpAfroAs corpus for cross-linguistic comparability is provided by Zygmunt Frajzyngier and Mettouchi’s paper ‘Functional domains and cross-linguistic comparability.’ The paper focuses specifically on overcoming a central difficulty of cross-linguistic analysis: choosing the proper object for comparison. It presents an approach not currently utilised by the CorpAfroAs corpus, in that it transfers the data into a database and assigns functional domains and subdomains which are applicable across languages. It intends to shift the approach of such studies away from the use of universal categories and towards those which are actually encoded within the grammatical system of a specific language.

The fourth part of the volume analyses the phenomena of code-switching and borrowing along with the implications that these have for the creation and use of the corpus. Manfredi, Marie-Claude Simeone-Senelle and Tosco in ‘Language contact, borrowing and codeswitching’ discuss the difficulties of glossing these two phenomena. Innovatively, they also utilise prosodic analysis to bring forward new conclusions concerning the identification of such characteristics within Afro-Asiatic languages and, most notably, the distinction between code-switching and borrowing.

The final and most technical chapter, ‘ELAN-CorpA: lexicon-aided annotation in ELAN’, examines the creation of the ELAN-CorpA software and is written by the software developer on the team, Christian Chanard. Chanard discusses the limitations of existing software for the creation of the corpus and then explains how existing features of Toolbox were integrated into the ELAN software to create a new, tailored program specifically designed to transcribe spoken samples.


This volume, although describing a unique and valuable project, is quite frustrating to read. The primary source of this frustration is that the volume attempts to integrate both linguistic analysis and analysis of the technological tools used to create the corpus. This is an ambitious goal which unfortunately would have been better accomplished across two separate volumes. As it stands, the linguistic analysis feels not only lacking but disorganised, and the discussion of the technologies used seems fragmented to the point of incomprehension. It is possible to see that attempts have been made to structure the volume so that the linguistic analysis appears first, demonstrating the value and extent of the corpus, and the technicalities are examined in the latter half of the book. In practice, however, the authors of the early sections are required to include elements of the technological analysis in their discussions. This leads to a slightly repetitive and confused development of the volume. Each of the authors in Parts One and Two devotes a section of their chapter to describing, for example, the units utilised within prosodic analysis, or other concepts which are common to the volume as a whole. It would perhaps have been a wiser editorial decision to include an index for such terms, or even a further introductory chapter which describes them and ensures consistency throughout.

Equally frustrating is that, although this book attempts to stand alone as a critique and analysis of the CorpAfroAs project, the book and the corpus must in practice be viewed together. It is exceptionally difficult to follow the studies contained within the volume without regular reference to the corpus itself, particularly to the sound samples provided. Even this option, however, is denied in ‘The intonation of topic and focus’, one section of which relies on data that is not part of the CorpAfroAs project, or indeed accessible anywhere, as it is personal data collected by one of the researchers.

These aspects, however exasperating they make the volume to read, do not detract from the value of the text and the project. CorpAfroAs is one of the few readily accessible sources for spoken Afro-Asiatic languages and, in addition, is a well-planned and wholly comprehensive model for the glossing of speech samples. This is something that has long been lacking from the field of corpus linguistics, and as such the project as a whole and the volume reviewed here are invaluable advancements to modern linguistic studies. Throughout the volume the contributors make an effort to suggest further paths of investigation, demonstrating that the field of corpus-based Afro-Asiatic studies and the CorpAfroAs project are both in their early stages and ripe for further academic attention.


I am a PhD candidate at the University of Glasgow. My thesis aims to examine the interaction between Latin and Welsh during the Medieval period through the use of corpus and comparative linguistics.

Page Updated: 20-Apr-2017