Review of Potentials of Language Documentation

Reviewer: Richard G Littauer
Book Title: Potentials of Language Documentation
Book Author: Frank Christian Seifart, Geoffrey Haig, Nikolaus P. Himmelmann, Dagmar Jung
Publisher: University of Hawai‘i Press
Linguistic Field(s): Language Documentation
Issue Number: 24.2604

''Potentials of Language Documentation: Methods, Analyses, and Utilization'', a Special Publication of the Language Documentation & Conservation journal edited by Frank Seifart, Geoffrey Haig, Nikolaus P. Himmelmann, Dagmar Jung, Anna Margetts, and Paul Trilsbeek, is a collection of 18 chapters which were originally presented at a workshop at the Max Planck Institute for Evolutionary Anthropology in Leipzig, November 2011. The workshop was composed of language documentation practitioners, experts on computational methods, linguistic researchers, and applied documentation linguists (working on maintenance, curation, and presentation of corpora). The publication, as well as offering an overview of the state of the field for language documentation, is meant to cover advances and potential work for three different aspects of language documentation -- namely, computational methods, analyses, and utilization. The volume is split into three different sections, covering each of these aspects.

In Chapter 1 (1-6), ''The threefold potential of language documentation,'' Frank Seifart, the first editor, introduces the volume and explains the need for it. He gives a brief overview of each chapter and section. He goes on to point out additional aspects of language documentation potential not addressed explicitly in the papers. First, he states that the possibilities of computational methods are virtually endless, and that ''real challenges are often conceptual, not technical'' -- he advises that multidisciplinary action is necessary to overcome this, especially as ''there is often not one single ideal computational solution for a linguistic problem.'' Thus, modularisation of various techniques and implementation of interactive learning are useful. Second, multimodal corpora are now available and there are methods to study them, making this a promising research area. Third, documentation archives are normally focused on particular regions instead of being region-independent, and a possible solution would be mirroring archives centrally. Finally, he reiterates the often acknowledged fact that language documentation receives less academic recognition than journal articles. To help change this and encourage more language documentation work, one outcome of the workshop is that LD&C will have a special section for the review of online language documentations in the future.

As Seifart notes in the introduction, the authors of the five chapters in “Part One: Methods” address the central question of ''How do computational methods developed for large corpora of well-known languages apply to the relatively small language documentation corpora of less well-known languages?'' They do this in a variety of ways -- Chapters 2, 3 and 5 call for carefully planning future corpora and annotation schemes, while Chapters 4 and 6 lay out unsupervised methods that work well on small corpora (despite statistical techniques largely needing large amounts of input) to help cut down on time spent on annotation.

In Chapter 2 (7-16), ''Prospects for e-grammars and endangered languages corpora'', Sebastian Drude explores three aspects of current computational documentation research -- hypertext grammars, treebanks and the Grammar Matrix project (Bender et al. 2010), and interoperable grammars. He covers them briefly before exploring how they could be used together to provide more comprehensive grammatical descriptions and more comparable corpora, and ultimately a better understanding of language.

In Chapter 3 (17-24), Jost Gippert covers current difficulties with marking up corpus data for minority endangered languages that display large amounts of code switching, in ''Language-specific encoding in endangered language corpora.'' He uses examples of code switching in three Caucasian languages to point out problems and questions that arise from fine-grained language identification in corpora, as well as to show how the emerging ISO standard 639-6 could be used to help with accurate demarcation of languages and dialects.

In Chapter 4 (25-31), ''Unsupervised morphological analysis of small corpora: First experiments with Kilivila'', Amit Kirschenbaum, Peter Wittenburg, and Gerhard Heyer develop a method for unsupervised (statistical) morphological analysis and annotation of small corpora from low resource languages, using a word co-occurrence model to find statistically relevant groupings of words (which are either etymologically or morphosyntactically similar), then aligning them using multiple sequence alignment (a method from bioinformatics) to find regularities. The method performs better than random, and they briefly discuss how they plan to integrate it with other methods in the future.
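To give a rough sense of the alignment step, the following minimal sketch aligns two word forms and splits each into a shared segment and its residue. It is not the authors' method: it uses pairwise longest-common-substring matching from Python's standard `difflib` as a simplified stand-in for multiple sequence alignment, it omits the co-occurrence grouping entirely, and the function name `split_on_shared` and the English example words are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def split_on_shared(a, b):
    """Split two word forms around their longest shared substring.

    Returns (shared, (prefix_a, suffix_a), (prefix_b, suffix_b)).
    This is a pairwise simplification; a real multiple sequence
    alignment would align a whole group of forms at once.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    shared = a[m.a:m.a + m.size]
    rest_a = (a[:m.a], a[m.a + m.size:])
    rest_b = (b[:m.b], b[m.b + m.size:])
    return shared, rest_a, rest_b


# Hypothetical pair of related forms (toy English data, not from the chapter)
stem, rest_a, rest_b = split_on_shared("walked", "walking")
print(stem, rest_a, rest_b)
```

Recurring residues across many such pairs would be candidate affixes, which is the kind of regularity the alignment step is meant to surface.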

In Chapter 5 (32-38), ''A corpus linguistics perspective on language documentation, data, and the challenge of small corpora'', Anke Lüdeling discusses how a flexible corpus architecture is necessary for low resource language corpora. There are many parameters which influence variant choices speakers make, and the metadata which may be relevant for future understanding of the language is not always immediately clear. As such, she argues that it is important to design corpora to which annotation layers and metadata can be added at any point.

In Chapter 6 (39-45), ''Supporting linguistic research using generic automatic audio/video analysis'', Oliver Schreer and Daniel Schneider describe several automatic tools that could be used to expedite annotation processes for audio/visual corpora, developed in the AVATech project (Auer et al. 2010, Tschöpel et al. 2011). In particular, they cover tools developed for audio segmentation, speech detection, speaker clustering, vowel and pitch contour detection, shot/cut detection and key frames extraction, global motion detection, skin color estimation, head and hands tracking, and user interaction. They also compare the use of these tools to manual analysis in a preliminary experiment, highlighting the time that using these tools will save annotators.

In “Part Two: Analyses”, the editors include chapters addressing the question of ''What impact has language documentation had on analyses and theorizing in linguistics and related disciplines so far and how can it make greater impact?'' Reflecting this, each paper is in some sense a call for better documentation practices, and each uses examples to point out how documentation has helped illuminate an issue, or how adoption of best practices for documentation could benefit the field as a whole.

In Chapter 7 (46-53), titled ''Bilingual multimodality in language documentation data'', Marianne Gullberg raises current unanswered questions in bilingualism and second language acquisition research, as well as drawing attention to gaps in documentation of multimodality in languages. She points out that there are large gaps that must be filled in current documentation practices and research, and calls for more joint ventures and collaborative interdisciplinary work to further our knowledge of existing data, and to inform new data collection.

In Chapter 8 (54-63), ''Tours of the past through the present of eastern Indonesia'', Marian Klamer looks at new documentation of minority languages in a specific region to shed light on their origins and history, and to highlight cases where new efforts in documentation can explain historical phenomena and language phylogenies. In particular, she is able to provide a convincing argument regarding the origin of Alorese, an Austronesian language spoken on Pantar and Alor, by comparing it to Lamaholot, a language based 200km away, and by looking with fresh eyes at ethnological evidence.

In Chapter 9 (64-72), ''Data from language documentations in research on referential hierarchies'', Stefan Schnell uses a textual analysis of the Oceanic language Vera'a to examine the referential hierarchy, particularly involving number and object marking. His textual analysis is used in contrast to traditional structural descriptions and elicited data, and he examines how this method could potentially be used across corpora to expedite research.

In Chapter 10 (73-82), ''Information structure, variation and the Referential Hierarchy'', Jane Simpson also looks at the referential hierarchy, using the Australian language Arrernte, which exhibits a putative counterexample to a proposed typological universal. She calls for larger corpora of texts, linked to audio-visual recordings, in order to fully observe and record languages which may be typologically interesting -- particularly those like Arrernte which are undergoing or have recently undergone massive changes. She fully explains here how better documentation may have given a fuller understanding of the language and the typological feature in question.

In Chapter 11 (83-89), ''How to measure frequency? Different ways of counting ergatives in Chintang (Tibeto-Burman, Nepal) and their implications'', Sabine Stoll and Balthasar Bickel discuss the best way to measure frequency. By looking at different options -- raw numbers per age in months or ergatives per word, per transitive verb, or per time unit -- they come to the conclusion that using time-alignment and measuring frequency in a given time window is the most psychologically relevant way to count, instead of the standard frequency of a feature given the opportunity for it.
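The difference between these counting options can be sketched in a few lines. The example below is only an illustration under stated assumptions: the `Token` record, its field names, and the toy data are hypothetical and are not taken from the chapter or from the Chintang corpus.

```python
from dataclasses import dataclass


@dataclass
class Token:
    time_s: float             # onset time in seconds from session start (assumed field)
    is_ergative: bool         # token carries an ergative marker (assumed field)
    is_transitive_verb: bool  # token is a transitive verb (assumed field)


def ergatives_per_word(tokens):
    """Ergative-marked tokens divided by all tokens."""
    return sum(t.is_ergative for t in tokens) / len(tokens) if tokens else 0.0


def ergatives_per_transitive_verb(tokens):
    """Ergative-marked tokens divided by transitive verbs (the 'opportunity' count)."""
    verbs = sum(t.is_transitive_verb for t in tokens)
    return sum(t.is_ergative for t in tokens) / verbs if verbs else 0.0


def ergatives_per_minute(tokens, start_s, end_s):
    """Ergative-marked tokens per minute inside the window [start_s, end_s)."""
    in_window = [t for t in tokens if start_s <= t.time_s < end_s]
    minutes = (end_s - start_s) / 60.0
    return sum(t.is_ergative for t in in_window) / minutes


# Toy session: five time-aligned tokens
session = [
    Token(0.0, True, False),
    Token(1.0, False, True),
    Token(30.0, True, False),
    Token(70.0, False, True),
    Token(90.0, False, False),
]
print(ergatives_per_word(session))
print(ergatives_per_transitive_verb(session))
print(ergatives_per_minute(session, 0.0, 60.0))
```

The same session yields three different figures depending on the denominator, which is why the choice of measure matters; only the last one requires time-aligned data.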

In Chapter 12 (90-95), ''On the sociolinguistic typology of linguistic complexity loss'', Peter Trudgill points out that small, minority, and often endangered languages have been shaped by socio-structural conditions, which influence their typological complexity, that differ from those of the larger languages upon which most of modern linguistic theory is built. He argues that they are generally more mature and more complex, and are spoken in more intimate societies, than more global languages, and issues a call to arms for linguists to document minority languages -- or else the only languages left to study will be historically atypical languages.

In “Part Three: Utilization”, the central unifying question is ''How can language documentation data be stored, represented, and made accessible in order to be utilized in a broader context?'' The chapters here range from guides for linguists in the field (Chapters 14-16), to an outline of the new DoBeS portal (Chapter 17), to different tools that can be used by linguists now (Chapters 13, 18).

In Chapter 13 (96-104), ''Visualization and online presentation of linguistic data'', Hans-Jörg Bibiko uses R, open-source statistical and graphic software, to show how wordlists, structural features (such as those from WALS (Dryer & Haspelmath 2011)), and geographical information can be easily graphed. He gives a good, brief overview of the possibilities R presents to linguists.

In Chapter 14 (105-110), ''Language archives: They’re not just for linguists any more'', Gary Holton describes how the Alaska Native Language Archive (ANLA) has been useful not just to linguists but also to people querying it for non-linguistic data, in areas such as ethnoastronomy, ethnomusicology, and ethnobotany. He uses the example of Eyak, a severely endangered language undergoing revitalisation, to illustrate how archives can be useful to language communities. He calls for archives to be constructed to allow for these two types of use.

In Chapter 15 (111-117), ''Creating educational materials in language documentation projects – creating innovative resources for linguistic research'', Ulrike Mosel presents a way that linguists can work with a community, helping to produce educational material while also building a language documentation corpus. She gives an overview of work done creating a book of local stories following this method in Teop, an Oceanic Meso-Melanesian language spoken in Papua New Guinea.

In Chapter 16 (118-125), ''From language documentation to language planning: Not necessarily a direct route'', Julia Sallabank looks at common difficulties arising in language planning, policy implementation, and documentation, using Guernesiais as an example. In particular, she highlights that the views of all stakeholders -- not just the native speakers, but also semi- and heritage speakers -- are as valid as those of documentary linguists or language planners.

In Chapter 17 (126-128), ''Online presentation and accessibility of endangered languages data: The General Portal to the DoBeS Archive'', Gabriele Schwiertz gives an overview of the DoBeS online portal. The hope is that the portal will allow the archive to be used more easily and regularly.

In Chapter 18 (129-134), ''Using language documentation data in a broader context'', Nick Thieberger concludes by discussing the scale of current global documentation efforts, and ways to ensure longevity of linguistic archives. He covers how digital data should be stored and curated and what standards are available and accepted, and presents a call for more effort on all sides, from training new linguists to maintaining old archives, in order for language data (and ultimately languages) to not be lost.

The workshop from which these papers grew was held in order to ''critically discuss and make more explicit the threefold potentials of language documentation,'' as Frank Seifart states in the introduction. The three potentials -- computational methods, analyses, and utilization -- are clearly evident, in that each chapter deals with one or more of them. The collection was organised into three parts around these potentials, and each section responds to specific questions that each potential raises. On the whole, this worked moderately well. However, the collection still reads like workshop proceedings with a loose theme rather than a fully coherent volume. Some of the papers -- such as Chapter 13 on R and Chapter 17 on DoBeS -- struggle to fit with others, for example Chapter 16 on language policy and planning in Guernsey.

That said, the chapters cover many of the pressing issues facing the language documentation community today, and many are spot on in calling for renewed or focused efforts -- for instance, in carefully considering frequency measures as in Chapter 11, or in planning a corpus as in Chapter 5. Many chapters feature detailed examples from particular languages, which provide a framework for the linguist or student reading to easily interpret how the central message could be applied to their own research. At times it is clear that language communities and non-linguists may be able to use this work themselves -- for instance, Chapter 15 has ideas for starting revitalisation efforts. On the whole, this volume is approachable, timely, and useful for anyone involved in language documentation efforts.

Auer, Eric, Peter Wittenburg, Han Sloetjes, Oliver Schreer, Stefano Masneri, Daniel Schneider & Sebastian Tschöpel. 2010. Automatic annotation of media field recordings. In Caroline Sporleder & Kalliopi Zervanou (eds.), Proceedings of the ECAI 2010 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH 2010), Lisbon: University of Lisbon.

Bender, Emily M., Scott Drellishak, Antske Fokkens, Laurie Poulson & Safiyyah Saleem. 2010. Grammar customization. Research on Language and Computation 8(1). 23–72.

Dryer, Matthew S. & Martin Haspelmath (eds.). 2011. The World Atlas of Language Structures Online. Munich: Max Planck Digital Library.

Tschöpel, Sebastian, Daniel Schneider, Rolf Bardeli, Oliver Schreer, Stefano Masneri, Peter Wittenburg, Han Sloetjes, Przemyslaw Lenkiewicz & Eric Auer. 2011. AVATecH: Audio/Video technology for humanities research. In Cristina Vertan, Milena Slavcheva, Petya Osenova & Stelios Piperidis (eds.), Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage, Hissar, Bulgaria, 16 September 2011, 86–89. Shoumen, Bulgaria: Incoma Ltd.
Richard Littauer is a graduate student in Computational Linguistics, studying for a joint degree at the University of Malta and Saarland University. He completed an MA (Hons) in Linguistics at the University of Edinburgh. His main research interests include minority language documentation and conservation, particularly involving developing resources for low-resource languages, as well as understanding language change on a historical and evolutionary timescale.

Format: Electronic
ISBN-13: 9780985621100
Pages: 134
Prices: U.S. $ 0.00