LINGUIST List 14.2879

Tue Oct 21 2003

Review: Comp Ling: Nirenburg, Somers & Wilks (2003)

Editor for this issue: Naomi Ogasawara <naomi@linguistlist.org>


What follows is a review or discussion note contributed to our Book Discussion Forum. We expect discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in. If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for review." Then contact Simin Karimi at simin@linguistlist.org.

Directory

  1. Bob Kuhns - Sun Microsystems Labs BOS, Readings in Machine Translation

Message 1: Readings in Machine Translation

Date: Tue, 21 Oct 2003 17:17:38 +0000
From: Bob Kuhns - Sun Microsystems Labs BOS <Robert.Kuhns@Sun.COM>
Subject: Readings in Machine Translation

Nirenburg, Sergei, Harold Somers and Yorick Wilks, eds. (2003) Readings
in Machine Translation, MIT Press.

Announced at http://linguistlist.org/issues/14/14-1654.html


Bob Kuhns, Sun Microsystems, Inc.

PURPOSE OF BOOK AND OVERVIEW

As the title suggests, ''Readings in Machine Translation'' is a
collection of papers on machine translation (MT), 36 papers to be
exact. The papers were selected by the editors as those they found of
''historical significance'' in the development of MT, yet were
''difficult to find.'' The papers span over 40 years, from the late
1940s to the early 1990s. Recent papers in MT from the last 10 years
have been purposely omitted because the editors admit that it would be
difficult to predict which of these papers will become historically
important.

This book will be of interest to researchers and teachers of MT. For
researchers, the book provides historical context for many of the
major MT paradigms in existence today as well as articles that support
or question the feasibility of MT. While the book assumes some
familiarity with MT, it is an excellent set of supplemental
readings for a course in MT at the undergraduate or graduate level.

The book contains 36 articles divided into three sections, namely,
Historical, Theoretical and Methodological Issues, and System Design,
each of which has an introduction. Given the large number of papers,
only the briefest description of each is possible in this review.

Section I - Historical

Sergei Nirenburg introduces the Historical section with a short
overview of how machine translation helped shape the fields of
Artificial Intelligence (AI) and Natural Language Processing
(NLP). Nirenburg notes that the enthusiasm for MT began in the late
1940s and discusses reasons for this enthusiasm. Nirenburg describes
each paper in the first section in varying levels of detail and notes
when there is a shift of perspective or complexity of MT in the papers
which are roughly ordered chronologically.

Chapter 1. Warren Weaver, ''Translation''

Written in the summer of 1949, ''Translation'' was, for many of the
200 recipients of Weaver's memorandum, the first indication that MT
was possible. In this memorandum, Weaver argues the theoretical
plausibility of MT and suggests a statistical approach for handling
semantic ambiguity as well as the possibility of a ''universal
language'' through which translation from one language to another is
achievable.

Chapter 2. A. D. Booth, ''Mechanical Translation''

Booth describes an approach to MT using available hardware with memory
limitations. He investigates lexical storage and the notion of a
''pre-editor'' to eliminate ambiguity for the source language. The
paper also includes some benchmarks comparing human translation and
MT.

Chapter 3. Erwin Reifler, ''The Mechanical Determination of Meaning''

In this paper, Reifler focuses on meaning as being central to MT and
discusses the similarities and differences of how meaning is treated
by a traditional linguist and a researcher in MT.

Chapter 4. Gilbert W. King, ''Stochastic Methods of Mechanical
Translation''

King introduces the notion of stochastic processing for selecting a
translation of a source term that has competing translations.

Chapter 5. Victor H. Yngve, ''A Framework for Syntactic Analysis''

Yngve, describing work at MIT for overcoming word-for-word translation
problems, provides a high-level system architecture for a
German-to-English MT system that invokes modules for syntactic
analysis for German, a structural transfer component, and an English
generation component.

Chapter 6. Yehoshua Bar-Hillel, ''The Present Status of Automatic
Translation of Languages''

This paper questions the feasibility of fully automatic, high quality
translation (FAHQT), including statistical approaches to it.
Bar-Hillel provides a critical survey of MT research in the US, the
United Kingdom, and the USSR. There are several appendices including
one (Appendix III) with the self-descriptive title of ''A
Demonstration of the Nonfeasibility of Fully Automatic High Quality
Translation.''

Chapter 7. Ida Rhodes, ''A New Approach to the Mechanical Syntactic
Analysis of Russian''

Rhodes describes a syntactic approach to MT in order to overcome the
problems with word-to-word translation. Procedures for lexical,
phrasal and clausal analyses, program control, and generation are
presented for a Russian-to-English system.

Chapter 8. Susumu Kuno, ''A Preliminary Approach to Japanese-English
Automatic Translation''

Kuno reports on procedures for translating Japanese into English. The
procedures of automatic input editing (converting kana and kanji texts
into Roman letters), automatic segmentation, syntactic analysis, and
output editing are described.

Chapter 9. Sydney M. Lamb, ''On the Mechanization of Syntactic Analysis''

Lamb's paper is not one of MT proper but instead describes methods for
syntactic processing based on an analysis of corpora.

Chapter 10. David G. Hays, ''Research Procedures in Machine Translation''

Hays overviews the areas of research that are key for MT, including
morphological analyses, grammar (dependency relations), and semantics
(with a discussion of semantics of source and target word
equivalents).

Chapter 11. John Hutchins, ''ALPAC: The (In)Famous Report''

Hutchins discusses the background, the participants, the implications,
and the misunderstandings of the ALPAC report which, in effect,
stopped major MT funding in the US for twenty years.

Chapter 12. Silvio Ceccato, ''Correlational Analysis and Mechanical
Translation''

Ceccato describes an approach to MT based on semantic relationships
between terms and offers a taxonomic approach to language processing.

Chapter 13. O.S. Kulagina and I.A. Melcuk, ''Automatic Translation:
Some Theoretical Aspects and the Design of a Translation System''

Kulagina and Melcuk examine automatic translation from the broad view
of determining the text-meaning and meaning-reality relations. The
complexities of relations between meaning and situations are
discussed. The paper also includes a discussion of the major
components of MT systems.

Chapter 14. Margaret Masterman, ''Mechanical Pidgin Translation''

Masterman reports on work at the Cambridge Language Research Unit on
using a pidgin dictionary to overcome polysemy in languages.

Chapter 15. S. Takahashi, H. Wada, R. Tadenuma, and S. Watanabe,
''English-Japanese Machine Translation''

The paper discusses dictionary structures and the translation
processes and flow of a system running on Yamato, a special-purpose
computer. A hardware description is included.

Section II - Theoretical and Methodological Issues

Yorick Wilks introduces the second section with an observation that
despite the differences in how MT is approached and the improvements
in NLP components such as part-of-speech taggers and parsers, the
systems on the market do not exploit the theory and improved NLP
technologies. He goes on to enumerate and discuss the crucial issues
for MT in the last 20 years and ties these issues to various papers
in the book.

Chapter 16. John Lehrberger, ''Automatic Translation and the Concept of
Sublanguage''

Lehrberger begins with an analysis (such as categorial distribution
and syntactic and semantic restrictions) of a particular sublanguage
based on aviation maintenance manuals. The paper examines the
feasibility of MT for sublanguages with a description of the
TAUM-METEO system that translates weather reports in Canada from
English to French.

Chapter 17. Martin Kay, ''The Proper Place of Men and Machines in
Language Translation''

Kay summarizes MT issues in terms of linguistic problems and computer
science or engineering obstacles. Kay also critically examines ad hoc
approaches to MT. Kay suggests that an application of machines to
translation should be done in short, reliable steps starting with good
text editors, followed by translation aids, and progressing to MT that
allows for human intervention.

Chapter 18. Roderick L. Johnson and Peter Whitelock, ''Machine
Translation as an Expert Task''

Johnson and Whitelock identify five types of knowledge (source
language knowledge, target language knowledge, text type knowledge,
domain knowledge, and contrastive knowledge) that human translators
bring to bear while translating. They describe how different knowledge
types have been an organizing factor in the UMIST English-Japanese MT
system.

Chapter 19. Jan Landsbergen, ''Montague Grammar and Machine Translation''

After a brief introduction to Montague Grammar, Landsbergen describes
a version of Montague Grammar (M-grammar), and then proposes how
M-grammars may be used in Rosetta, an interlingua MT system.

Chapter 20. Jun-ichi Tsujii and Makoto Nagao, ''Dialogue Translation
vs. Text Translation - Interpretation Based Approach''

In contrast to the earlier papers in this book, Tsujii and Nagao's
paper focuses on translation issues involving dialogues and explores
the differences between dialogue and text translation systems. They argue
that MT for dialogues may be more feasible than MT for written texts.

Chapter 21. Ronald M. Kaplan, Klaus Netter, Jurgen Wedekind, and Annie
Zaenen, ''Translation by Structural Correspondences''

Utilizing a Lexical-Functional Grammar framework, the authors describe
a translation method that uses different levels of representation of
structures simultaneously in both the source and target languages.

Chapter 22. Christian Boitet, ''Pros and Cons of the Pivot and
Transfer Approaches in Multilingual Machine Translation''

Boitet's paper looks at the advantages and disadvantages of a pure
pivot (interlingua) approach to MT and even suggests the use of
Esperanto as the pivot language. The paper also explores the transfer
approach and ends with a discussion of the potential future for pivot
and transfer approaches to MT.

Chapter 23. Sergei Nirenburg and Kenneth Goodman, ''Treatment of
Meaning in MT Systems''

Nirenburg and Goodman reflect on how MT methodological issues (say,
interlingua vs. transfer-based and system evaluation) are discussed
and debated. Their view is that debates in the MT community have been
cast so as to mask the true underlying issue, namely, meaning.

Chapter 24. Yorick Wilks, ''Where Am I Coming From: The Reversibility
of Analysis and Generation in Natural Language Processing''

Wilks's paper analyzes and rejects the notion that language analysis
and language generation are symmetrical processes.

Chapter 25. Paul L. Garvin, ''The Place of Heuristics in the Fulcrum
Approach to Machine Translation''

This paper discusses the Fulcrum approach to language and the notion
of heuristics, and provides criteria for how heuristics are applied to
handle certain types of syntactic resolution.

Chapter 26. John S. G. Elliston, ''Computer Aided Translation: A
Business Viewpoint''

Elliston looks at the problems of running a global business and
describes how a company addressed those problems. The paper describes
the company's investigation into controlled languages and its
development of a computer aided translation process.

Section III - System Design

The last section is introduced by Harold Somers who summarizes a
number of design approaches to MT, including AI methods, example-based
machine translation (EBMT), and statistics. Somers ends the
introduction by speculating that speech translation will be a new
subfield of MT which will result in an abundance of valuable research
in the years to come.

Chapter 27. Michael Zarechnak, ''Three Levels of Linguistic Analysis
in Machine Translation''

Zarechnak discusses work at Georgetown University on
Russian-to-English MT based on three levels of analysis, namely,
morphological, syntagmatic, and syntactic.

Chapter 28. B. Vauquois, ''Automatic Translation - A Survey of
Different Approaches''

Vauquois's paper describes first and second generation MT systems, and
is the paper that has the well-known pyramid for picturing the
stratificational view of MT.

Chapter 29. Alan K. Melby, ''Multi-level Translation Aids''

Melby addresses three problems (human factors, the all or nothing
syndrome, and traditional centralized processing) associated with the
Interactive Translation System (ITS).

Chapter 30. Rod Johnson, Maghi King, and Louis des Tombe, ''EUROTRA:
Computational Techniques''

This paper discusses the rationale (such as capturing the appropriate
level of detail of linguistic representations) for selecting a
programming language and for making design decisions for MT.

Chapter 31. Makoto Nagao, ''A Framework of a Mechanical Translation
between Japanese and English by Analogy Principle''

This paper presents the core ideas of EBMT and is credited with
introducing the approach.

Chapter 32. Peter F. Brown, et al., ''A Statistical Approach to
Machine Translation''

Brown et al. describe the pioneering work conducted at IBM on
statistical MT. Experimental results are included.

Chapter 33. Tsuyoshi Morimoto and Akira Kurematsu, ''Automatic Speech
Translation at ATR''

Morimoto and Kurematsu report on the ATR Interpreting Research project
and present the components and architecture for developing a speech
translation system.

Chapter 34. Yorick Wilks, ''The Stanford Machine Translation Project''

Wilks describes an English-to-French interlingua-based MT system. The
paper includes a detailed discussion of the system and numerous
helpful examples.

Chapter 35. Victor Sadler, ''The Textual Knowledge Bank: Design,
Construction, Applications''

Sadler describes the Textual Knowledge Bank (TKB), a database of full
texts of referentially-coded tree structures, and describes its
potential uses. The notion of TKB is strikingly similar to Nagao's
analogy principle in Chapter 31.

Chapter 36. Harold L. Somers, Jun-ichi Tsujii, and Danny Jones,
''Machine Translation Without a Source Text''

This paper explores a novel application for MT, a type of foreign
language expert that assists a monolingual speaker in understanding a
text in a foreign language.

A CRITICAL EVALUATION

The book is cited as offering ''the most historically significant
English-language articles on MT.'' The editors set out to select
papers that are not only historically significant and often cited in
the field of MT, but that are also difficult to locate. The editors
have done an outstanding job on both counts. The book is
comprehensive and is representative of most if not all major research
areas and issues related to MT such as interlingua vs. transfer
approaches, controlled input, sublanguages, linguistic issues as they
relate to MT, fully automatic vs. machine-assisted MT, and some
preliminary work on speech translation.

The book is well-organized with the first section containing
historical papers starting with Weaver's 1949 memorandum and ending
with papers in the late 1960s. The papers in this section illustrate
the wide range of thinking and approaches to MT during the early
period from Weaver's enthusiasm for MT (Chapter 1) to Bar-Hillel's
skepticism (Chapter 6). The detailed analyses and complexity of work
on MT in the earliest years are evident in the papers by Reifler
(Chapter 3) and Rhodes (Chapter 7). Overall, the historical section
provides a good sense of the historical foundations of MT.

The second section represents a diversity of theoretical and
methodological issues regarding MT such as sublanguages (Chapter 16),
Montague Grammar (Chapter 19), dialogue translation (Chapter 20), and
a business case study (Chapter 26). This diversity of research areas
and theoretical frameworks is impressive and today's MT researchers
should be able to find valuable and relevant references in this section.

The last section on system design shows how the key paradigms of
today's MT have earlier roots. These include data-driven MT such as
statistical approaches (Chapter 32) and EBMT (Chapters 31 and
35). Hints of the potential of AI for translation are seen in Vauquois's
1976 paper (Chapter 28), which predates the excitement and boom of AI
in the 1980s. As with the previous section, those currently working in
MT, no matter what the paradigm, will find material on earlier systems
which could result in better design decisions.

Each of the three sections is introduced by one of the editors. These
introductions are most helpful in drawing connections between papers,
while noting ideas that were novel at the time, such as ontology-based
NLP (Nirenburg's section). Wilks's insightful introduction to the
methodological issues section helps delineate the issues that MT
researchers are trying to address. Somers's summary of system design
and MT architectures is excellent, and his discussion of the
independent work of Nagao and Sadler on example-based methods is
noteworthy from a historical perspective.

The only aspect of the book that seems lacking is a summary or
conclusion by the editors. Each of the editors wrote instructive
introductions to the sections of the book, but a summary by the three
prominent MT researchers could have tied together the different
research themes of the book. Maybe more importantly, the conclusion
could have been used to draw historical connections between what has
been accomplished in the first 40 years or so of MT research and what
the editors see as the current trends and research agenda today.

A warning to the reader is that the writing and the style of the
papers vary greatly. Some papers are far more accessible than
others. Nevertheless, the papers are of historical significance and
were quite rightly included.

This book with its scope and breadth of papers and topics is certainly
valuable for anyone in the field of MT research and development and is
highly recommended for anyone who wants an understanding of the
historical foundations of current work in MT.

ABOUT THE REVIEWER

Bob Kuhns is a consultant at Sun Microsystems Laboratories and Sun's
Globalization Portal Group, where he is the NLP architect and manages
projects in translatability assessment, machine translation, and
multilingual terminology management.