Review of Lexicology and Corpus Linguistics

Reviewer: Verginica Mititelu
Book Title: Lexicology and Corpus Linguistics
Book Author: Michael A. K. Halliday, Wolfgang Teubert, Colin Yallop, Anna Cermáková
Publisher: Bloomsbury Publishing (formerly The Continuum International Publishing Group)
Linguistic Field(s): Language Documentation
Book Announcement: 15.3333

Verginica Barbu Mititelu, Romanian Academy Research Institute for
Artificial Intelligence and Institute for Linguistics

This textbook is addressed to beginners in the fields of lexicology and
corpus linguistics, providing an introduction to their most important

The first chapter (Lexicology, by M. A. K. Halliday) presents the history
of lexicology, from its origins, in different parts of the world. The
object of study of lexicology is difficult to grasp: words are sometimes
not easy to identify, and there are languages for which one cannot really
speak of words (e.g. Chinese). That is why people tend to use the term
lexical item instead.

The methods of study and, at the same time, the information sources in
lexicology are the dictionary and the thesaurus. Halliday compares and
contrasts them with respect to their organization and the information
they contain. He also presents the latest achievements in the field,
which are largely due to the existence of large electronic corpora and of
tools for data and text processing that make it possible to bring together
lexical and grammatical data. As the two cannot be separated from each
other, it is better to speak of lexicogrammar as a single discipline.

In the second chapter (Words and meaning), C. Yallop first considers the
nature of meaning. Dictionaries as inventories of word meanings are
criticized, as they decontextualise meaning and treat it as a distinct
entity. This is characteristic of both traditional (printed) and
electronic dictionaries. Semantic nets (such as WordNet) have also been
criticized for this (see Buitelaar 1998, Weinreich 1964, Apresjan 1973).

Meaning is a social phenomenon; it is shaped and negotiated in social
interaction. That is why the best way to deal with it is within the
context in which it is used. Usage can contradict ideas such as: the most
frequent meaning is the oldest one (the original meaning may even be lost
in the course of the history of the language, or it may not exist, as in
the case of names becoming words), or: the most frequent meaning is the
core one (sometimes the emotive meaning is more frequent than the core
one).

Dictionaries have been conceived as prescriptive linguistic works. Yallop
takes the position that "the social nature of language brings a
normativity of its own". Change in language cannot be prevented, as it
reflects changes in societies and cultures. In connection with this, one
can discuss the link between language and reality, more precisely between
language and the perspective taken on reality. A change in perspective may
bring about changes in language: see the distinct areas of vocabulary
pertaining to different domains and to different contexts in which
language is used, which are more or less revealing of reality.

The topic of meaning can be discussed either intralingually or
interlingually. From the latter perspective, the main idea to be
remembered is that different languages elaborate on reality differently.
One can nevertheless speak of universalism in language, but this has to
do with the way language functions in social life, not with "universal
concepts" or with Chomsky's universal "deep structure", nor with the
postulation of a universal framework or inventory out of which each
language makes its own selection. One cannot speak about meaning without
bringing the matter of translation into the discussion. Yallop's point
here is that the translator needs to paraphrase the meaning within the
relevant languages, rather than abstracting away from them.

The opening subchapter of the third chapter of this book (Language and
corpus linguistics, by Wolfgang Teubert) is meant to motivate the further
presentation of facts: languages are similar in some respects, but they
are different in others. Meaning and its relation to the word is one of
the aspects that differentiate languages. Chomskyan linguistics is
preoccupied with language generation, while corpus linguistics analyzes
discourse. Syntax is not a topic of interest here; the whole discussion is
organized around the notion of meaning. While the word is the minimal unit
in syntax, the same does not hold when dealing with meaning. Moreover, the
notion of word is controversial: it has not so far received a satisfactory
definition, one that is valid for different types of languages (cf. Stati
1967). Traditional dictionaries (which include some collocations, idioms,
etc.) and the analyses of corpora show that the best solution is to deal
not with words, but with units of meaning. These are either single words
(when they are monosemous) or the word plus all the words within its
textual context that are needed in order to disambiguate it (when it is
polysemous).

Collocations should find their place in lexicons, according to the facts
exhibited by corpora. If a combination of words no longer has a
compositional meaning and its elements co-occur with a certain frequency,
then it should be treated as a collocation.

The second part of this chapter gives the reader a short presentation of
corpus linguistics and of its history. Corpus linguistics is the study of
language by looking at discourse. Limitations are inherent: one can never
study the whole discourse of a language; that is why a selection is made
from the material, taking care that the material selected is
representative. The disadvantage of the method comes from the fact that
the results obtained are approximations; the analysis of a new corpus can
lead to (partially) different results.
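
The frequency half of the collocation criterion discussed in this chapter
can be illustrated with a minimal sketch. It is not taken from the book:
the toy corpus, the function names (bigram_counts, frequent_pairs) and the
threshold min_count are assumptions made for illustration only. Counting
adjacent word pairs yields collocation candidates; whether a candidate is
also non-compositional still has to be judged separately.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent word pairs (bigrams) in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))


def frequent_pairs(tokens, min_count=2):
    """Return bigrams occurring at least min_count times, most frequent first.

    min_count is an arbitrary illustrative threshold, not a value from the book.
    """
    counts = bigram_counts(tokens)
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


# Toy corpus, invented for illustration only.
text = (
    "he kicked the bucket last year . "
    "she filled the bucket with water . "
    "they say he kicked the bucket too ."
)
tokens = text.split()

# Candidate collocations by frequency alone.
candidates = frequent_pairs(tokens, min_count=2)
for pair, n in candidates:
    print(" ".join(pair), n)
```

Running the same functions on a different corpus will generally give
different counts and candidates, which is the sense in which corpus-based
results are approximations.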

The last chapter (Directions in corpus linguistics, by Wolfgang Teubert
and Anna Cermáková) really justifies the title of the volume. The topics
that the authors address here are the following: representativity in
language, typology of corpora (reference, special, opportunistic, monitor,
parallel), meaning in discourse, meaning as usage and paraphrase, meaning
in corpus linguistics, collocation, translation and parallel corpora.

The perspective adopted here (that of corpus linguistics) considers
meaning a social phenomenon, negotiated by the members of the community
using the language (see chapter 2). Meaning is both usage (i.e. what we
find out about how a word is used) and paraphrase (which serves to
explain, to define the meaning).

A key concept used when dealing with meaning from this perspective is that
of collocation. It refers either to a fixed expression, with a certain
grammatical structure and a fixed meaning, or to the immediate context of
the target word. Evidence is brought from translations and from the
analyses of parallel corpora that collocations should be dealt with
separately in dictionaries, thus easing the interpretation (and
generation) process(es).

The quality of this textbook rests on the clarity with which the chapters
are written and on the examples provided to illustrate the ideas
presented. Students are introduced to the current method(s) of studying
meaning in corpus linguistics, which pays most attention to context. The
short glossary of terms at the end of the book helps the reader understand
the key notions of the field.


Apresjan, J. D. (1973) Synonymy and synonyms. In: Trends in Soviet
theoretical linguistics, ed. by F. Kiefer, Dordrecht, Reidel.

Buitelaar, P. (1998) CORELEX: Systematic Polysemy and Underspecification,
PhD dissertation, Brandeis University.

Stati, S. (1967) Teorie si metoda in sintaxa. Bucharest, Editura Academiei.

Weinreich, U. (1964) Webster's Third: A Critique of its Semantics.
International Journal of American Linguistics, 30:405-409.


Verginica Barbu Mititelu is a researcher at the Romanian Academy Research
Institute for Artificial Intelligence and Institute for Linguistics. She
is interested in corpus linguistics, machine translation, natural
language processing, and theoretical linguistics.

