Review of Corpus-based Analysis and Diachronic Linguistics

Reviewer: Anna Ewa Majek
Book Title: Corpus-based Analysis and Diachronic Linguistics
Book Author: Yuji Kawaguchi, Makoto Minegishi, Wolfgang Viereck
Publisher: John Benjamins
Linguistic Field(s): Historical Linguistics
Text/Corpus Linguistics
Issue Number: 23.3724

EDITORS: Yuji Kawaguchi, Makoto Minegishi & Wolfgang Viereck
TITLE: Corpus-based Analysis and Diachronic Linguistics
SERIES TITLE: Tokyo University of Foreign Studies, Studies in Linguistics 3
PUBLISHER: John Benjamins
YEAR: 2011

Anna Ewa Majek, School of Linguistic, Speech and Communication Sciences, Trinity
College Dublin, Dublin, Ireland

The book under review consists of 14 papers written by a diverse group of
scholars who present and discuss a wide range of topics concerning different
languages. The articles are preceded by a message from the President, a
description of the Center for Corpus-based Linguistics and Language Education,
an explanation of synchronic and diachronic analyses, and short summaries of
each article.

The first paper, ‘The Atlas Linguarum Europae: A Diachronic Analysis of Its
Data’ by Wolfgang Viereck, starts by presenting a short history and description
of the ‘Atlas Linguarum Europae’ (ALE). Work on the ALE began in 1970, and the
atlas gives an accurate description of Europe’s linguistic situation. It
distinguishes six languages or language families: Altaic, Basque, Caucasian,
Indo-European, Semitic and Uralic. These comprise 22 language groups, some of
which include a large number of individual languages. Viereck treats what he
sees as the three most important aspects of the interpretation of ALE maps:
loanword research, etymological research and motivational research.

The next paper is devoted to ‘Variationism and Underuse Statistics in the
Analysis of the Development of Relative Clauses in German’. Anke Lüdeling, Hagen
Hirschmann & Amir Zeldes explore how a multi-layer corpus architecture helps in
understanding language change. The focus is methodological, based on an
investigation of the development of German relative clauses from Old High German
to New High German. The paper shows ‘how a deeply annotated diachronic corpus can
help to detect and study language change’ (p.53).

The third paper deals with ‘Variation and Change in the Montferrand
Account-books (1259-1367)’. According to Anthony Lodge, the town of Montferrand
in central France possesses a large collection of medieval and early-modern
archives, written in the local dialect, that record the town’s financial affairs
and municipal life from the twelfth to the middle of the eighteenth century.
What is more, they include, among other documents, a long series of account-books
detailing the town’s income and expenditure from 1259 to 1731, with explanations
and justifications of how and why the town’s money was spent. In Lodge’s view,
these account-books offer a rich source for historical linguists. He compiles a
Montferrand corpus of account-books written in Occitan between 1259 and 1390,
divided chronologically into three sections: Tranche I, 1259-1319 (c. 67,000
words), consists primarily of thirteenth-century material; Tranche II, 1345-1367
(c. 180,000 words), covers the middle third of the fourteenth century and is the
largest of the three sets; and Tranche III, 1372-1385 (c. 165,000 words), covers
the period of language shift from Auvergnat to French in 1390. The paper
presents examples of lexical, syntactic, morphological and phonetic changes that
can be gleaned from this corpus.

Wolfgang Raible presents ‘Cognitive Aspects of Language Evolution and Language
Change: The Example of French Historical Texts’. He analyses the two earliest
historical texts written in Old French prose, both of which deal with the Fourth
Crusade (1202-1204), and presents the following theses. Thesis I: when authors
try for the first time to write a historical text in prose, they will use
already existing generic models. Thesis II: it will still take considerable time
before the cognitive and linguistic framework for historical prose proper
develops. Both theses are supported by the two texts.

The next paper, ‘The Importance of Diasystematic Parameters in Studying the
History of French’, starts from the assumption that hypotheses in diachronic
linguistics can be confirmed or dismissed by means of corpora. To illustrate
this, Lene Schøsler examines the emergence of the ‘composed past’ from the Latin
present-tense construction ‘habeo litteras scriptas’, literally ‘I have letters
[that have been] written’. The main changes from this Latin construction to
modern Romance are well known, but in her opinion they do not answer many other
questions, such as: What is the function of the composed past in the old texts:
is it a present or a past form? What are the phases of change? How does epic
tense switching conform to analyses of the composed past? How may we explain
conflicting evidence in the old texts? The case study confirms the hypothesis,
‘provided that corpora are composed in such a way that they permit an
exploration of relevance for various parameters’ (p.105).

Martin Becker presents ‘The Reorganisation of Mood in the Epistemic Subsystem --
The Case of French Belief Predicates in Diachronic Dynamics’, aiming to
illustrate how theories of modal semantics and corpus-based empirical research
can be combined to yield new insight into the processes and mechanisms of
language change. The data concern the French mood system in the domain of belief
predicates, from Old to Classical French. Becker focuses on two basic belief
predicates, the verbs ‘cuid(i)er’ and ‘croire’, tested in two corpora: the New
Amsterdam Corpus and Frantext (along with the Middle French subcorpus). The case
study shows that a theory-based analytical framework combined with historical
corpora can provide deeper insight into the principles and mechanisms of
language change, but cannot uncover the motivations that drive speakers to
switch systematically from one verb option (‘cuidier’) to the other (‘croire’)
at a certain period of time.

The paper ‘French Liaison in the 18th Century -- Analysis of Gile Vaudelin’s
Texts’ by Yuji Kawaguchi discusses French liaison and related phenomena in two
of Gile Vaudelin’s texts, namely ‘Nouvelle manière d’écrire comme on parle en
France’, published in Paris in 1713, and ‘Instructions crétiennes, mises en
ortografe naturelle, pour faciliter au peuple la lecture de la sience du salut’,
published in Paris in 1715. Kawaguchi processes the two texts with the
concordancer AntConc 3.2.1w to obtain quantitative information on verbs,
pronouns, articles, possessive adjectives, prepositions, adjectives, adverbs and
numerals, and assesses the state of French liaison phenomena in the eighteenth
century.

Antonio Emiliano, in ‘Issues in the Typographic Representation of Medieval
Primary Sources’, argues that a poor transcription of medieval primary sources
for linguistic and philological study may ruin a corpus or archive or seriously
diminish its value for research. He proposes a set of strategies for the
typographic representation of medieval texts, for corpus encoding and for
character encoding procedures, which in his view should be adopted by any
researcher intending to carry out a study based on medieval texts.

The next paper focuses on ‘An Analysis of the Misuse of the Participle in Old
Russian Texts’. According to Yoshinori Onda, the Russian language lacks a
distinction between the functions of participles and adverbs. The author aims to
analyze this misuse of participles in Old Church Slavonic and Old Russian texts
and proposes a functional explanation of its causes. Onda presents two
hypotheses. Hypothesis I: the similarity of the syntactic structures caused the
confusion in participle use. Hypothesis II: the copyists’ attitude toward the
original texts influenced the copied texts. Both hypotheses are supported,
although for the second Onda was unable to determine the nature of the
relationship between the text type and the attitude of the copyists.

Robert Ratcliffe carries out ‘A Preliminary Analysis of Arabic Derived Verbs in
the Leeds Quran Corpus -- With Special Reference to Stem III (CaaCaC)’. The
author analyzes data from the Leeds Quran Corpus to quantify the semi-productivity
of the derived stems in Quranic Arabic.

Makoto Minegishi, Jun Takashima & Ganesh Murmu’s paper ‘On the Narrow and Open
“e” Contrast in Santali’ examines whether the contrast between narrow and open
“e” is phonemic in Santali. The analysis is carried out on the BSD corpus
(Bodding’s Santali data). The authors consider the most frequent syllable
patterns and the candidates for minimal pairs that have exactly the same
phonemic environment. The paper concludes ‘that the vowel contrast between “e1”
and “e2” is not a full-fledged phonemic one’ (p.221).

Tomoyuki Yamahata, in ‘The Classification of Apabhraṃśa -- A Corpus-based
Approach of the Study of Middle Indo-Aryan’, investigates variation across
Apabhraṃśa texts. The language varies considerably from document to document,
which has led to numerous classifications of Apabhraṃśa. Yamahata reviews these
classifications and applies criteria drawn from a corpus of Apabhraṃśa
consisting of eight texts from Eastern, Southern, Western and Kashmiri
Apabhraṃśa. Yamahata assumes that variation in Apabhraṃśa can be classified on
stylistic grounds; specifically, he identifies a tendency based on the degree of
preference for pseudo-archaic forms, but this tendency is insufficient for the
classification of Apabhraṃśa.

Ayako Shiba, in ‘Changes in the Meaning and Construction of Polysemous Words:
The Case of “mieru” and “mirareru”’, focuses on revealing how these verbs have
recently extended their evidential meaning. To achieve this, Shiba concentrates
on two forms related to ‘miru’ (‘to see’), namely ‘mieru’ and ‘mirareru’ (‘to be
able to see’), and analyses them in the Modern Japanese and Present-day Japanese
corpora. Both corpora consist mainly of critical essays on history, science and
culture, but only the Modern Japanese Corpus includes works of fiction. Shiba
shows the difference between ‘mieru’ and ‘mirareru’ in their
meaning-construction types and demonstrates the distribution of each type in the
Modern Japanese and Present-day Japanese corpora.

Kanetaka Yarimizu’s ‘Language Change from the Viewpoint of Distribution Patterns
of Standard Japanese Forms’ treats the standardization process of Japanese in
five historical stages using data from dialect research. Two different data sets
are used: the first is the ‘Grammar Atlas of Japanese Dialects’ (GAJ), and the
second is the ‘Glottogram survey’, also referred to as the TH survey. The five
stages of his standardization model are as follows: 1. until the mid-eighteenth
century; 2. from the mid-eighteenth century to the end of the nineteenth
century; 3. from the end of the nineteenth century to the mid-twentieth century;
4. from the mid-twentieth century to the present; 5. the present. The study
shows that standardization progressed gradually. During the first two stages,
traditional dialect forms were used. In the third, modern stage, standardization
progressed through education, but traditional forms were still used in the
private domain. In the fourth stage, standardization progressed further, and in
the fifth stage it approached completion. Yarimizu presumes that
standardization is strongly affected by the mass media.

The book is primarily aimed at historical linguists, but it would also be a
valuable source of information for those interested in corpus linguistics. A
positive attribute is that the book collects papers on a wide range of topics,
with analyses of different languages, some no longer spoken, such as Apabhraṃśa,
and others not widely known, such as Santali. An important quality is that it
includes articles that make recommendations for further study and offers advice for

The book has one drawback, namely the organization of the articles. There are
two types of papers: those that present analyses and those that give
recommendations. Dividing the articles into two corresponding parts would
facilitate reading and improve the coherence of the book.

All in all, the book is inspiring and absorbing. It provides significant insight
into synchronic and diachronic variation and is a great contribution to
corpus-based studies. The editors are to be congratulated for bringing together
such a diverse group of scholars and such a wide range of analyses.

Anna Ewa Majek is a PhD research student at Trinity College Dublin. Her primary research interests include corpus linguistics, language variation and change, and sociolinguistics.