Review of Documenting Endangered Languages

Reviewer: Daniel William Hieber
Book Title: Documenting Endangered Languages
Book Author: Geoffrey Haig, Nicole Nau, Stefan Schnell, Claudia Wegener
Publisher: De Gruyter Mouton
Linguistic Field(s): Language Documentation
Linguistic Theories
Issue Number: 23.2390

EDITORS: Geoffrey L. J. Haig, Nicole Nau, Stefan Schnell, Claudia Wegener
TITLE: Documenting Endangered Languages
SUBTITLE: Achievements and Perspectives
SERIES TITLE: Trends in Linguistics: Studies and Monographs [TilSM] 240
PUBLISHER: De Gruyter Mouton
YEAR: 2011

Daniel W. Hieber, Associate Researcher, Rosetta Stone


In recent years, documentary linguistics has established itself as a discipline
in its own right, with a unique set of theories, challenges, and methodologies.
“Documenting Endangered Languages: Achievements and Perspectives” seeks to
advance this discipline, and joins an important category of books which have
helped define the field and its foci, including Farfán & Ramallo (2010),
Gippert, Himmelmann, & Mosel (2006), Grenoble & Furbee (2010), Harrison, Rood, &
Dwyer (2008), and Janse & Tol (2003). According to the back cover, “This volume
showcases recent developments in methodology, technology and analysis, drawing
on experience gained in a global range of documentation projects.” It consists
largely of case studies, but includes several chapters with broader perspectives
as well. The book is dedicated to Ulrike Mosel, and includes a laudatory preface
highlighting her notable contributions to the field of documentary linguistics.


After the preface, an introductory chapter outlines some of the history of
documentary linguistics as a field, particularly in relation to the Volkswagen
Foundation’s DoBeS program (Dokumentation bedrohter Sprachen), and sketches what
the authors see as the most salient lessons we have learned from the field,
namely: 1) Focus on documenting the full range of communicative practices; 2)
Concern for long-term storage and preservation of primary data; 3) Close
cooperation with, and direct involvement of, the speech community; and 4) The
scientific potential in large and diverse amounts of digitally archived data.

From there, the book is organized into four sections: 1) Theoretical issues in
language documentation, which focuses on broader perspectives on the field; 2)
Documenting language structure, which presents lessons from a series of case
studies on specific language documentation projects, each covering a different
aspect of language structure; 3) Documenting the lexicon, focused on the
creation of rich ethnographic and encyclopedic lexica; and 4) Interaction with
speech communities, which treats the impact of fieldwork and the outputs of

CHAPTER 2, ‘Competing motivations for documenting endangered languages’ (Frank
Seifart), offers a four-way typology of the impetus for doing language
documentation: for the preservation of human cultural heritage; to enhance the
empirical basis of linguistics; for and by the speech community; and in order to
study language contact. For each type, the author clarifies their respective
requirements, in terms of their content and apparatus (in the sense of
Himmelmann [2002, 2006]). Also discussed are several cases where these
motivations compete. The chapter is very brief, and serves mainly as a quick
overview of some of the whys of language documentation.

CHAPTER 3, ‘Evolving challenges in archiving and data infrastructures’ (Daan
Broeder, Han Sloetjes, Paul Trilsbeek, Dieter van Uytvanck, Menzo Windhouwer,
and Peter Wittenburg), also presents a high-level overview, this time of the
manifold issues in data archiving. It covers issues and strategies in data handling, with
a brief look at how formats and storage capacities have evolved over time, and
highlights the important role that DoBeS has played in establishing metadata
formats and archiving standards. The authors then walk the reader through some
of the core issues in data archiving, such as meeting the needs of various
stakeholders, long-term preservation requirements and what this entails for file
formats and the organization of the metadata, access restrictions, and legal and
ethical issues. They note the inherent conflict between access restrictions for
data and the recent push in academia towards open access to research results and
data. They also discuss a range of tools for enhancing data, namely ELAN, ANNEX,
and LEXUS, and detail some of the exciting advances in software for searching
and browsing through content and metadata. They end with a section on new
challenges (and benefits) in the field, such as improvements in recording
equipment and connectivity, allowing for easier dissemination of materials, and
changes in the preservation/curation of data, such as a new lossless
video format (MJPEG2000), and updating their archive to participate in
“externally registered persistent identifiers” (51). One large concern they
point out is that “the amount of recorded media streams that is not being
touched (annotated in some form to make it ready for analysis) is increasing
continuously which means that much of the stored data will effectively not be of
much use to anyone other than the person who collected it” (52).

CHAPTER 4, ‘Comparing corpora from endangered language projects: Explorations in
language typology based on original texts’ (Geoffrey Haig, Stefan Schnell, and
Claudia Wegener), illustrates some promising ways that the massive digital
archives which have been accumulated on endangered languages can be utilized for
cross-linguistic research. While the potential for such research is enormous,
few studies have undertaken this challenge to date. This chapter fills that gap
by examining some basic properties of information structure (the
distribution of S, A, P, and pronouns) across texts in five languages, using the
GRAID (Grammatical Relations and Animacy in Discourse) annotation schema. In
doing so, the authors demonstrate that typological investigations utilizing data
from language documentation are indeed valid and feasible, with
significant payoffs.

CHAPTER 5 examines ‘“Words” in Kharia: Phonological, morpho-syntactic and
“orthographical” aspects’ (John Peterson). It presents a fascinating study of
speakers’ intuitions regarding “words” in the Kharia language. After giving a
brief overview of the phonological and morphosyntactic criteria for wordhood in
Kharia, which might be called an “agglutinating” language even though it relies
heavily on clitics rather than affixes, the author presents the judgments of six
different speakers who were asked to write down a spoken sentence presented to
them. The results show that speakers vary widely in their
conceptualization of “words”. Particularly interesting is that “the only
principle which seems to hold for all is that speakers / writers tend to give
priority to phonology over morpho-syntax when these do not coincide […] In fact,
the preference for phonological criteria can even be so strong that single
morphemes can be divided up into two different written words” (116). One
criticism of this chapter is that the author does not explicitly draw out lessons
for other documentation projects, such as suggested metrics or procedures, but it
contains valuable insights for documentation regardless.

CHAPTER 6 is titled ‘Aspect in Forest Enets and other Siberian indigenous
languages: When grammaticography and lexicography meet different metalanguages’
(Florian Siegl). It reiterates the oft-recited lesson that “grammatical
categories should be described without interference from grammatical
descriptions and traditions of majority or related languages” (145) by means of
a detailed case study of aspect in Forest Enets. The system of aspect in Forest
Enets has traditionally been described along the same lines as that of Russian,
as consisting of ‘aspectual pairs’ of verbs rather than differences in
inflection / morphology. The author convincingly demonstrates, however, that
this analysis is not appropriate to Forest Enets. The grammatical traditions of
Russian have been carried over and applied in often subtle ways, such as
borrowing dictionary conventions from Russian dictionaries, which overplays the
kind of perfective/imperfective opposition common to Russian, but which is
foreign to Forest Enets.

CHAPTER 7, ‘Documentary linguistics and prosodic evidence for the syntax of
spoken language’ (Candide Simard and Eva Schultze-Berndt) argues that it is both
feasible and necessary to study the prosodic system of a language in the
analysis of syntactic constructions, via a case study of prosody in the language
Jaminjung. The authors illustrate how “it is possible to distinguish, on the
basis of prosodic evidence alone, constructions such as reactivated topics vs.
afterthoughts; afterthoughts vs. discontinuous noun phrases, and two subtypes of
discontinuous noun phrase” (172). This is clearly a valuable set of tools for
language documentation and analysis.

In CHAPTER 8, ‘Diphthongology meets language documentation: The Finnish
experience’, Klaus Geyer presents a new method for analyzing diphthongs, using
Finnish as a case study. This is a much-needed contribution since, as the author
points out, guides to phonological analysis “remain somewhat fuzzy with respect
to procedures for working out a potential diphthong inventory” (178). Geyer’s
system makes use of both the static features of diphthongs (articulatory origin
and end points) and dynamic ones (movement in vertical tongue position, and
falling vs. rising sonority) to create a matrix of distinctive features, which
can then be used to distinguish a variety of diphthongs -- enough to handle even
the remarkably complex set of diphthongs in Finnish. The chapter closes with a
brief summary of the “diphthong analysis and description tool”, a tool which
future field workers would do well to utilize.

CHAPTER 9, ‘Retelling data: Working on transcription’ (Dagmar Jung and Nikolaus
P. Himmelmann) is a highly practical chapter detailing some ubiquitous hurdles
to working on transcription. The first half explains why transcription is such
an alien and unnatural task to native speakers and why, as a result,
speakers undertake the task only with great reluctance. The latter half of the
chapter focuses on frequently-encountered strategies used by speakers when doing
transcription, which arise because linguists and native speakers see the goals
of transcription differently. Speakers use methods such as paraphrasing,
editing-out, changing, and editing-in to adapt the record as they see
appropriate, and the authors caution that it is important to document these
changes and their motivations, to provide both a precise transcript and a record
of changes applied by the speaker.

CHAPTER 10 details ‘The making of a multimedia encyclopaedic lexicon for and in
endangered speech communities’ (Gabriele Cablitz). This chapter showcases the
recently-created online lexicon tool LEXUS, developed by the Max Planck
Institute for Psycholinguistics, in conjunction with the relational linking tool
ViCoS (Visualization of Conceptual Spaces), in the documentation of the
Marquesan languages in French Polynesia. The author conveys sound advice for
enriching a lexicon with encyclopedic information, and creating interactive
visual folk taxonomies and ‘cultural knowledge spaces’ using ViCoS, which allows
end-users of the dictionary to better understand the relations and taxonomies
that hold between words. In addition, this chapter offers advice for web-based
collaboration with speech communities on lexicon projects, including design,
pitfalls and benefits, and capacity building. The final section discusses
lexicography in documentary linguistics, where the author argues for lexical
databases as not just documentation aids, as suggested by Himmelmann (2006: 10),
but as “an essential part of a language documentation itself” (252).

CHAPTER 11, ‘What does it take to make an ethnographic dictionary? On the
treatment of fish and tree names in dictionaries of Oceanic languages’ (Andrew
Pawley) advocates rich semantic descriptions for dictionary entries, rejecting a
principled distinction between lexical and encyclopedic knowledge. Instead, the
author argues that “it makes more sense to ask ‘Of the many characteristics […]
known to English speakers, which are the most salient?’ and ‘For the various
users of the dictionary, what is likely to be the most useful information to
include?’” (277). This chapter presents important concepts in taxonomic systems
and definition types, and highlights some of the challenges involved for anyone
intending to do a first general dictionary of a language, as well as advice for
overcoming such hurdles.

CHAPTER 12, ‘Language is power: The impact of fieldwork on community politics’
(Even Hovdhaugen and Åshild Næss) presents an interesting case study from the
authors’ own global fieldwork experiences, and particularly their work on the
Vaeakau-Taumako and Äiwoo languages of the Solomon Islands, which addresses some of
the many complex political and ethical issues involved in fieldwork. It is
always instructive to see where other fieldworkers have run into difficulties,
how they resolved those problems, and what they believe they could have done
better. The authors present the story of their conflict, and provide a sound
analysis of the problem in a way that demonstrates the importance of
understanding local power structures. Just as importantly, the authors show that
existing power structures may not always be adequate to address the particular
needs of a documentation project. The very presence of a fieldworker can
sometimes force new bodies of authority, power structures, or administrative
districts to come into existence. This is particularly true when the boundaries
of the language community do not align with currently-existing political
boundaries, and so new boundaries must be created.

CHAPTER 13, ‘Sustaining Vurës: Making products of language documentation
accessible to multiple audiences’ (Catriona Hyslop Malau) showcases an
innovative documentary video project with the goal of fostering revitalization
and reaching as broad an audience as possible. To that end, the audio in the
documentaries is entirely in the Vurës language, but every aspect of the films
and their packaging is also available in either English or Bislama (the national
language of Vanuatu). Two features of these films are especially interesting:
First, they include background information on the language and issues of
language endangerment, with information about how some regional languages have
already been lost, and explaining that this is the reason for the production of
the documentary. Second, each documentary includes a number of dictionary
entries presented on screen, depicted and defined, which supplement some of the
key concepts and topics in the film. Finally, the author relates how these films
have already been useful in maintaining and spreading cultural practices
throughout the region.

Finally, CHAPTER 14 treats ‘Filming with native speaker commentary’ (Anna
Margetts), and imparts a novel methodology for collecting commentary. When
previous attempts at running commentary fell flat, producing little usable data,
the author, by happy accident, came upon the idea of recording live sports
commentary. The new data was linguistically rich and provided a new type of
communicative event in her corpus, one that was much more engaging to speakers.
Another useful section of this chapter is the author’s discussion of an event
which did not have running commentary but would have benefited from it, noting
that such commentary makes useful metadata and contains other ethnographic
information useful in compiling rich lexicons like those outlined in chapters 10
and 11.


This book will be an excellent addition to the library of any documentary
linguist. Experienced linguists will find a number of new methodologies to
utilize in their work, while younger linguists will find in-depth treatments of
a variety of specific topics not covered (or not covered with any depth) in
introductory surveys, handbooks, or field guides. The book is perhaps most
similar to “Essentials of Language Documentation” (Gippert, Himmelmann, and
Mosel 2006), and covers many related and similar topics. But whereas
“Essentials” might be seen as the seminal survey of the field and its central
topics, the present volume is more of an ‘advanced topics in documentary
linguistics’, an excellent sequel to the former. As such, it consists largely of
case studies on specific topics, and does not aim for comprehensive scope over
the field. So while the book should not be seen as an all-inclusive handbook or
survey, it does advance the field significantly in many areas.

One complaint I have with this book is that the title is somewhat misleading,
and the mismatch between my expectations and the actual content of the book
perhaps hindered me from appreciating it at first. With a subtitle like
‘Achievements and Perspectives’, one would expect more sections on the history
of documentary linguistics or its practitioners, successful revitalization
models from different communities, or broad perspectives on the field in
general. To be sure, several chapters fit this bill well, including the
laudatory preface to Ulrike Mosel, Seifart’s chapter on motivations for
documenting languages, Broeder et al.’s chapter on evolving challenges in
archiving, and Hovdhaugen & Næss’ chapter on the impact of fieldwork. The
remaining chapters, however, deal with far more specific topics, almost all of
which are case studies.

On the other hand, the specificity of these chapters, and the extent to which
they offer new techniques and insights into these topics, is one of the
strengths of this book. Each of these chapters offers valuable lessons for
documentary linguistics from experienced fieldworkers; given this, a subtitle
such as ‘Lessons from the Field’ may have been more appropriate. In fact, I
think the best description of what this book offers comes from the statement on
the back cover: “This volume showcases recent developments in methodology,
technology and analysis, drawing on experience gained in a global range of
documentation projects.” Taking this, rather than the title, as the intended
goal for the book, it is clear that the editors have met and exceeded their
objective. The lessons in this volume are indispensable contributions to the
field that make significant advances in the practice of documentary linguistics
as a whole. Any documentary linguist, whether a weathered veteran or just
entering the field, would be remiss to neglect its lessons.


Danny Hieber is a Linguist at Rosetta Stone, and has helped create language-learning software for the Chitimacha, Navajo, Iñupiaq, and Inuttitut languages. He also writes on language issues in the popular press. His primary interests are language typology, documentary and descriptive linguistics, and the economics and praxeology of language. He holds a B.S. in Linguistics and Philosophy from The College of William & Mary in Virginia. Learn more about his work at

