Date: Mon, 29 Nov 2004 17:51:38 +0200
From: Verginica Mititelu
Subject: Lexicology and Corpus Linguistics: An Introduction
AUTHOR: Halliday, M. A. K.; Teubert, Wolfgang; Yallop, Colin; Cermáková, Anna
TITLE: Lexicology and Corpus Linguistics
SUBTITLE: An Introduction
SERIES: Open Linguistics
PUBLISHER: Continuum
YEAR: 2004
Verginica Barbu Mititelu, Romanian Academy Research Institute for Artificial Intelligence and Institute for Linguistics
This textbook is addressed to beginners in lexicology and corpus linguistics and introduces the most important concepts of both fields.
The first chapter (Lexicology, by M. A. K. Halliday) presents the history of lexicology, from its origins onwards, in different parts of the world. The object of study of lexicology is difficult to pin down: words are sometimes hard to identify, and there are languages for which one cannot really speak of words at all (e.g. Chinese). That is why people tend to use the term lexical item instead.
The dictionary and the thesaurus are both the methods of study and the sources of information in lexicology. Halliday compares and contrasts them with respect to their organization and the information they contain. He also presents the latest developments in the field, which are due mainly to the availability of large electronic corpora and of tools for data and text processing, and which make it possible to bring lexical and grammatical data together. As the two cannot be separated from each other, it is better to speak of lexicogrammar as a single discipline.
In the second chapter (Words and meaning), C. Yallop first considers the nature of meaning. Dictionaries as inventories of word meanings are criticized because they decontextualise meaning and treat it as a distinct entity. This is characteristic of both traditional (printed) and electronic dictionaries. Semantic nets (such as WordNet) have also been criticized on these grounds (see Buitelaar 1998, Weinreich 1964, Apresjan 1973).
Meaning is a social phenomenon; it is shaped and negotiated in social interaction. That is why the best way to deal with it is within the context in which it is used. Usage can contradict assumptions such as "the most frequent meaning is the oldest one" (the original meaning may even be lost in the course of the history of the language, or it may not exist at all, as in the case of names that become words) or "the most frequent meaning is the core one" (sometimes the emotive meaning is more frequent than the core one).
Dictionaries have been conceived as prescriptive linguistic works. Yallop takes the position that "the social nature of language brings a normativity of its own". Change in language cannot be prevented, as it reflects changes in societies and cultures. In this connection, one can discuss the link between language and reality, or more precisely between language and the perspective taken on reality: a change in perspective may bring about changes in language. Evidence for this comes from the distinct areas of vocabulary that pertain to different domains and to different contexts of language use, and that are more or less revealing of reality.
The topic of meaning can be discussed either intralingually or interlingually. From the latter perspective, the main idea to be remembered is that different languages elaborate on reality differently. One can still speak of universalism in language, but this has to do with the way language functions in social life, not with "universal concepts" or with Chomsky's universal "deep structure", nor with the postulation of a universal framework or inventory out of which each language makes its own selection. One cannot speak about meaning without bringing the matter of translation into the discussion. Yallop's point here is that the translator needs to paraphrase the meaning within the relevant languages, rather than abstracting away from them.
The opening subchapter of the third chapter (Language and corpus linguistics, by Wolfgang Teubert) is meant to motivate the presentation that follows: languages are similar in some respects, but different in others. Meaning and its relation to the word is one of the aspects in which languages differ. Chomskyan linguistics is preoccupied with language generation, while corpus linguistics analyzes discourse. Syntax is not a topic of interest here; the whole discussion is organized around the notion of meaning. While the word is the minimal unit in syntax, this is not the case when dealing with meaning. Moreover, the notion of word is controversial: it has not so far received a satisfactory definition that is valid for different types of languages (cf. Stati 1967). Traditional dictionaries (which include some collocations, idioms, etc.) and the analyses of corpora show that the best solution is to deal not with words, but with units of meaning. A unit of meaning is either the word alone (if it is monosemous) or the word plus all the words within its textual context that are needed to disambiguate it (if it is polysemous).
Collocations should find their place in lexicons, according to the facts exhibited by corpora. If a combination of words no longer has a compositional meaning and its elements co-occur with a certain frequency, then it should be treated as a collocation.
The second part of this chapter gives the reader a short presentation of corpus linguistics and of its history. Corpus linguistics is the study of language by looking at discourse. Limitations are inherent: one can never study the whole discourse of a language, so a selection has to be made, with care taken that the selected material is representative. The disadvantage of the method is that the results obtained are approximations; the analysis of a new corpus can lead to (partially) different results.
The last chapter (Directions in corpus linguistics, by Wolfgang Teubert and Anna Cermáková) fully justifies the title of the volume. The topics that the authors address here are the following: representativity in language, typology of corpora (reference, special, opportunistic, monitor, parallel), meaning in discourse, meaning as usage and paraphrase, meaning in corpus linguistics, collocation, translation and parallel corpora.
The perspective taken here (that of corpus linguistics) considers meaning a social phenomenon, negotiated by the members of the community using the language (see chapter 2). Meaning is both usage (i.e. what we find out about how a unit is used) and paraphrase (which serves to explain and define the meaning).
A key concept used when dealing with meaning from this perspective is that of collocation. It refers either to a fixed expression, with a certain grammatical structure and a fixed meaning, or to the immediate context of the target word. Evidence from translations and from the analyses of parallel corpora is brought to show that collocations should be dealt with separately in dictionaries, which would ease both interpretation and generation.
The quality of this textbook is sustained by the clarity with which the chapters are written and by the examples provided to illustrate the ideas presented. Students are introduced to the current method(s) of studying meaning in corpus linguistics, which pay most attention to context. The short glossary of terms at the end of the book helps the reader understand the key notions of the field.
Apresjan, J. D. (1973) Synonymy and synonyms. In: Trends in Soviet theoretical linguistics, ed. by F. Kiefer, Dordrecht, Reidel.
Buitelaar, P. (1998) CORELEX: Systematic Polysemy and Underspecification, PhD dissertation, Brandeis University.
Stati, S. (1967) Teorie si metoda in sintaxa. Bucharest, Editura Academiei.
Weinreich, U. (1964) Webster's Third: A Critique of its Semantics. International Journal of American Linguistics, 30:405-409.
ABOUT THE REVIEWER:
Verginica Barbu Mititelu is a researcher at the Romanian Academy Research Institute for Artificial Intelligence and Institute for Linguistics. She is interested in corpus linguistics, machine translation, natural language processing, and theoretical linguistics.