Review of Computational and Quantitative Studies

Reviewer: Veena Dixit
Book Title: Computational and Quantitative Studies
Book Author: Michael A. K. Halliday, Jonathan J. Webster
Publisher: Bloomsbury Publishing (formerly The Continuum International Publishing Group)
Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics
Issue Number: 16.2330

Date: Thu, 4 Aug 2005 20:07:07 +0530 (IST)
From: Veena Dixit
Subject: Computational and Quantitative Studies

AUTHOR: Halliday, M. A. K.
EDITOR: Webster, Jonathan J.
TITLE: Computational and Quantitative Studies
SERIES: Collected Works of M. A. K. Halliday
PUBLISHER: Continuum International Publishing Group Ltd
YEAR: 2004

Veena Dixit, Center for Indian Language Technology, Indian Institute of
Technology, Bombay, India.

This is the sixth volume of the Collected Works of Professor M. A. K.
Halliday, a series that runs to ten volumes. Professor Halliday has had a
lifelong engagement with language, and these volumes represent the outcome.
The book portrays the developmental phases of machine translation (MT) from
the perspective of the Firthian framework of systemic functional grammar.
Computer
technologies have developed considerably since the date the first article
of the volume appeared. Nevertheless, the early articles continue to be
relevant and not only from a historical point of view.


The book contains eleven articles divided into three parts. Each part has
a brief introduction by the Editor. There is an appendix containing a
trial grammar for a text generation project. The selection of articles
represents the sequential shift in the focus of the author's interest
while stressing the continuity and development of themes articulated in
the 1950s.

The central theme of the first part is that linguistic analysis grounded
in sound scientific theory is the prerequisite of any language-oriented
mechanical task. Such analysis offers language description in
mutually, unilaterally approximating comparative terms. The author
proposes that the description of languages, source language (SL) and
target language (TL), should cover levels of grammar and lexis at one end
and context at the other end. The description can be in the form of
statistical statements displaying quantitative analysis of occurrences of
items. The rules for the systematic relating of these two descriptions
should be appended to the descriptions. The expected relationship between
items is in terms of translation equivalence.

The second part contains six chapters, which continue and develop the
central propositions of the first part of the book. The linguistic system
is inherently probabilistic in nature. Grammatics, the theory of grammar,
has to be paradigmatic. Quantitative analysis will throw light on
probability of choosing. The basis for quantitative analysis of language
is the principle that the frequency in a text instantiates probability in
a system.
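
The principle that frequency in a text instantiates probability in a
system can be illustrated with a minimal sketch. The function name and the
toy polarity data below are hypothetical and are not taken from the book;
the sketch only shows relative frequencies being read as estimates of the
probabilities of the terms in a system.

```python
from collections import Counter


def system_probabilities(instances):
    """Estimate the probability of each term from observed instances."""
    counts = Counter(instances)
    total = sum(counts.values())
    # Relative frequency of each term serves as its estimated probability.
    return {term: n / total for term, n in counts.items()}


# Hypothetical toy data: polarity selections observed in ten clauses.
observed = ["positive"] * 9 + ["negative"]
probs = system_probabilities(observed)
print(probs)
```

With more observed instances the estimate should become more stable, which
is one reason a large corpus matters for this kind of study.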

Corpus linguistics is as much about theory building as it is about data
collecting. A corpus provides the methodological means for collecting evidence
of relative frequencies in the grammar, from which the probability
profiles of grammatical systems can be established. This is the theme for
the third part.


Chapter one: 'The Linguistic Basis of a Mechanical Thesaurus' (1956): The
fact that grammar and lexis exhibit a high degree of internal determination
is exploited. Machine translation is defined as a function between two
given languages. The translation procedure involves translation
equivalence, equivalence of determining features, and the operation of
particular determining features in the TL.

Autonomous analysis and the construction of a mechanical thesaurus are
needed for MT. Grammar should be viewed as a statistics-based statement of
lexical redundancy, which can be handled autonomously by a lattice program.
A thesaurus is defined as the lexical analogue of a grammatical paradigm, in
which words are arranged in a contextually determined series to achieve
translation as well as contextual equivalence. One can abstract the
collocation and the non-collocation features of context from the language

The proposition is substantiated with examples from Chinese and English.

Chapter two: 'Linguistics and Machine Translation' (1962): This article
foreshadows the themes developed by the author over the next thirty years.
There is no analogy between code and message on the one hand and form and
content on the other. A full description of a language involves categories
and methods that are peculiar to that language. These categories need to
be used for stating the patterns of the language and for showing how it
works. The author introduces necessary technical categories such as unit,
form, rank, and level. A description is complete when the independent
grammatical and lexical descriptions are shown to be related.

The author expresses the necessity of quantitative analysis for the
description of languages. The computer has to translate on a more-likely
or less-likely basis rather than on a yes-no basis.

He concludes that the interlingua for translation between pairs or groups
of the languages concerned can be neither a natural language nor a machine
language. It will have to be a mathematical construct serving as a transit
code between natural languages.

Chapter three: 'Towards Probabilistic Interpretations' (1991): Professor
Halliday starts from a rather distant point by asking how change is to be
incorporated into the structural linguistic concept of a system. Language
may have infinite possibilities, but it has a finite number of users. A
probabilistic model of lexicogrammar enables us to explain register
variation, which is related to diachronic variation. When a probability
reaches certainty, a category change has occurred. Every single instance
alters the probabilities of the system in some measure.

The difference between physical systems or biological systems and semiotic
systems lies in the key concepts of instantiation and realization. In a
semiotic system, instances have differential qualitative values (referred
to as the Hamlet factor). As to realization, linguistic systems are
characterized by stratification. The author wants to escape from the
constructivist trap.

Chapter four: 'Corpus Studies and Probabilistic Grammar' (1991): The
chapter is about the theoretical status of corpus frequencies. The author
refutes Chomsky's theory of competence and performance, since by definition
it made it impossible for the analysis of an actual text to play any part
in explaining the grammar of a language. He points out that corpus studies
are a well-established source of information about the grammar of a
language. A statement about quantitative patterns of grammar is not an
attack on the individual's freedom of choice in using the language.

Probabilities do not predict single instances; rather they predict the
general pattern. The significance of probabilities lies in the interpretation
rather than the prediction of the single instance. It is evident that even children
construe the lexicogrammar, on the evidence of text frequency, as a
probabilistic system.

Consistent with his views on the role of linguistics, Professor Halliday
holds that lexis and grammar are complementary perspectives and not
contrastive, opposing or unrelated fields. Each explains different aspects
of a single phenomenon.

Chapter five: 'Language as System and Language as Instance: The Corpus as
a Theoretical Construct' (1992): System and instance are the two ends from
which observers view a single phenomenon, language. Every instance of text
perturbs the overall probabilities of the system. The more instances we
observe, the better we perform as observers of the system. Professor
Halliday emphasizes that the corpus needs to contain a very large sample of
real text.

We can check the relative frequencies and the frequencies broken down by
the register to test the hypothesis regarding probability typology. We
need to measure how the probability of selecting one term is affected by
previous selections made within the same system. It is possible to measure
the complexity of the language through general measures such as lexical
density or specific measures such as length of nominal chains. The degree
of association between simultaneous systems can be found. Measures of
conditional probability can give insights into historical linguistics.
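
Two of the measures mentioned above can be sketched in a few lines. This
is only an illustration under stated assumptions: lexical density is taken
here as the share of tokens that are not function words (other definitions
exist, for example lexical items per clause), the function-word list is a
deliberately tiny hypothetical one, and the conditional probabilities are
estimated from adjacent selections in a toy sequence. None of the names or
data come from the book.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny list of grammatical (function) words.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "is", "was", "and", "to", "it"}


def lexical_density(tokens):
    """Share of tokens that are lexical (content) words, between 0 and 1."""
    if not tokens:
        return 0.0
    lexical = [t for t in tokens if t.lower() not in FUNCTION_WORDS]
    return len(lexical) / len(tokens)


def transition_probabilities(selections):
    """Estimate P(current term | previous term) from a sequence of selections."""
    pair_counts = defaultdict(Counter)
    for prev, curr in zip(selections, selections[1:]):
        pair_counts[prev][curr] += 1
    return {
        prev: {curr: n / sum(counter.values()) for curr, n in counter.items()}
        for prev, counter in pair_counts.items()
    }


tokens = "the cat sat on the mat".split()
density = lexical_density(tokens)  # 3 lexical words out of 6 tokens

# Hypothetical sequence of primary tense selections in successive clauses.
tenses = ["past", "past", "present", "past", "past", "past"]
cond = transition_probabilities(tenses)
print(density, cond)
```

A real study would need a principled classification of lexical versus
grammatical items and far more data than this toy sequence.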

The chapter discusses the aspects of statistical measures of natural

Chapter six: 'A Quantitative Study of Polarity and Primary Tense in the
English Finite Clause' (1993): This chapter is co-authored with Z. L.
James. The intention was to undertake basic quantitative research in the
grammar of modern English. The authors decided to access the corpus
directly using existing programs. They hoped to test the hypothesis that
grammatical systems fall largely into two types. In the first type the
options are equally probable, and there is no unmarked term in the
quantitative sense. In the second type the options are skew, one term
being unmarked. The authors then detail the procedure adopted, the
problems faced and the important decisions taken during the course of
the study.
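
The contrast between the two hypothesized types of system can be made
concrete with a small sketch. The review gives no figures, so the
probabilities 0.5/0.5 and 0.9/0.1 below are illustrative assumptions, as is
the tolerance used to label a system. Shannon entropy is used here simply
as one familiar way of showing that a skew system carries less information
per choice than an equiprobable one; it is not presented as the authors'
own procedure.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a distribution over a system's terms."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def classify_system(probabilities, tolerance=0.1):
    """Label a binary system; the tolerance is an arbitrary assumption."""
    p = max(probabilities)
    return "equiprobable" if abs(p - 0.5) <= tolerance else "skew"


equi = [0.5, 0.5]  # illustrative equiprobable system
skew = [0.9, 0.1]  # illustrative skew system

for name, dist in (("equi", equi), ("skew", skew)):
    print(name, classify_system(dist), round(entropy_bits(dist), 3))
```

The equiprobable system yields exactly one bit per choice, while the skew
system yields less than half a bit under these assumed figures.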

Chapter seven: 'Quantitative Studies and Probabilities in Grammar' (1993):
According to Professor Halliday, corpus linguistics modifies our thinking
about theoretical linguistics. He maintains that some interesting patterns
seem to emerge from quantitative studies. Any concern with
grammatical probabilities makes sense only in the context of a
paradigmatic model of grammar.

Systemic functional corpus studies investigate systemic variation in
patterns of meaning on the plane of content rather than on the plane of
expression. The studies investigate the internal relationship between two
systems within the grammar in terms of their interdependencies and their
logical semantic relationship.

In the second half of the chapter, the author discusses the factors that
identify grammatical systems for investigation and the decisions taken
during the study. He describes the procedures adopted and the observations
made, as well as the analysis of inaccuracy and the steps taken to deal
with errors and omissions. He holds that the analysis should be valid when
applied to any natural text.

Chapter eight: 'The Spoken Language Corpus: A Foundation for Grammatical
Theory' (2002): The author holds that only in spoken language is the full
semantic potential of the system brought into play, and that new insights
for the theory of language as a whole flow from it.

The metaphor 'reducing spoken language to writing' suggests that some
features such as melody and rhythm are lost in transcribing the spoken
variety. Transcription should be faithful to the essential natural
features of the spoken variety, which are functional in carrying meaning.

With some reservations, the author accepts the distinction between
'corpus-based' and 'corpus-driven' descriptions, though both essentially
need to be theory-based. He describes structure as the theory of the
syntagm and system as the theory of the paradigm.

He concludes that grammatical probabilities, both global and local, are an
essential aspect of 'what language really is and how it works'. The
discussion is supported by a few interesting examples and the results of
spoken corpus studies.

Chapter nine: 'On Language in Relation to Fuzzy Logic and Intelligent
Computing' (1995): The author expresses the need for a systemic analysis
of language for MT, rather than dependence on commonsense knowledge about
language. After detailing the distinct features of language as a semiotic
system, he summarizes the complexity of language. The complexity arises as
the systems are not fully independent, and relate to one another. Nor do
they form any kind of strict taxonomy. There are various degrees and kinds
of partial association among the systems. Thus, there is a great deal of
indeterminacy, both in systems and in their relationship. The overall
picture is notably fuzzy. It is essential to account for fuzziness of
language, its disorder and complexity, not as accidental and aberrant, but
as systemic and necessary to convey the meaning.

Finally, he outlines the basic principles adopted in attempting to
theorize about language. He wants to formulate grammar paradigmatically,
contextually, functionally and fuzzily. Examples are used to illustrate
the principles of systemic modeling.

Chapter ten: 'Fuzzy Grammatics: A Systemic Functional Approach to
Fuzziness in Natural Language' (1995): This chapter is about the role of
grammar when natural language is to be used as a metalanguage for
intelligent computing. The basic metafunctions of natural language are
ideational, interpersonal and textual. The ideational metafunction construes
experience, which can be material, mental, verbal or relational. The
interpersonal metafunction enacts social relationships, and the textual
metafunction creates discourse. Metafunctions are comprehensive,
extravagant, telescopic, non-autonomous, variable and indeterminate.
Rhetorical toning, indistinctness, unexpectedness, logogenesis, complexity,
irrelevance, jocularity and error are some of the problem areas of natural
language as metalanguage.

The author expresses the need to model language reality in terms of
tendencies rather than in terms of categories. This makes it possible for
natural language to be its own metalanguage.

Chapter eleven: 'Computing Meanings: Some Reflections on Past Experience
and Present Prospects' (1995): MT began in the 1950s with the premise that
the approach had to be mathematical and logical. It was only in the
mid-1960s that the phenomenon of language came to be seen as autonomous. In
the 1980s, language came to occupy center stage and computers became a
tool for linguistic research. Now research is at a stage where we can
think of computers functioning through the medium of natural language. It
was recognized that a word has its meaning only in the total meaning
potential of the language.

For intelligent computing to succeed, we will have to align language and
knowledge on the one hand and instance and the system of which it is an
instance on the other. Professor Halliday then summarizes those points of
linguistic complexity that will have to be taken into account if computing
with natural language is to succeed.

When computing involves operating with natural languages, we will
finally be computing meaning.


A general theme runs through the book. Language is described as made up of
choices of alternative patterns. It is therefore inherently probabilistic.
Different aspects of the same issues are discussed in appropriate contexts
over different chapters. The author often draws on probability-based
results to support his hypotheses.

The theoretical statements regarding sentence equivalence are not
supported by adequate discussion in chapter two.

In chapter three, the author supports his propositions by discussing
fieldwork for child language acquisition as well as cognitive processes
regarding language. This makes the propositions more meaningful.

It is stated that cause and effect in the case of physical systems are
directional. However, the author has not considered whether this holds for
human perception also.

The Firthian concept of 'system' in chapter four provides the necessary
paradigmatic base for corpus-based probabilistic studies of language.

In chapter six, the conclusions are tabulated. These conclusions are not
always short and sharp answers.

I personally disagree with the following statement in chapter
nine, "Literate, educated adults no longer have access to commonsense
knowledge about language; what they bring to language are the ideas they
learnt in primary school, which have neither unconscious insights of
everyday practical experience nor the theoretical power of designed
systematic knowledge" (p. 197). It appears to me that a person can improve
an acquired language by constant access to contemporary knowledge about
language. The capability for second language learning may support this

There is no justification for excluding ungrammaticality from a formal
model of language in chapter ten. It is generally accepted that a
linguistic description, to be complete, has to account for ungrammaticality.

It should make us pause and think that school education is sufficient for
day-to-day language use but not adequate for MT. Is this inadequacy merely
the difference between the use and the explanation for the use?

Can we relate the differences between the patterns of the spoken and
written versions of a language to the gestures, facial expressions and
body language that accompany spoken language?

Perhaps corpus linguistics can be usefully supplemented by a study of
forms of non-verbal communication.


This unusual book displays Professor Halliday's varied concerns and his
endeavor to give linguistics, particularly probabilistic corpus studies,
a central role in MT. While illuminating the developments, he provides
insights and linkages with different contemporary subjects.

On reading the book, the reader cannot but feel that it is only on the
development of a comprehensive theory of meaning that computational
linguistics can finally come into its own.


Chomsky, Noam (2004): New Horizons in the Study of Language and Mind,
Cambridge University Press

Dash, Niladri (2004): Corpus Linguistics and Language Technology, Mittal
Publications, New Delhi


The reviewer holds an M.A. in Linguistics and is pursuing a Ph.D. on 'Word
Sense Disambiguation'. She is engaged in research on Marathi, a
less-studied and resource-poor language that is the state language of
Maharashtra, India. She is a significant contributor to the development of
a morphology rule-based spellchecker for Marathi. At present, she is
working on a rule-based part-of-speech tagger for Marathi and is
participating in the development of a Wordnet for Marathi. She has
undertaken to design a course for learning Marathi as a second language.
Her lectures on morphology are available on the net. She has presented her
work at national and international conferences.

