LINGUIST List 14.2879

Tue Oct 21 2003

Review: Comp Ling: Nirenburg, Somers & Wilks (2003)

Editor for this issue: Naomi Ogasawara <naomilinguistlist.org>


What follows is a review or discussion note contributed to our Book Discussion Forum. We expect discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in.

If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for review." Then contact Simin Karimi at siminlinguistlist.org.


Directory

  • Bob Kuhns - Sun Microsystems Labs BOS, Readings in Machine Translation

    Message 1: Readings in Machine Translation

    Date: Tue, 21 Oct 2003 17:17:38 +0000
    From: Bob Kuhns - Sun Microsystems Labs BOS <Robert.KuhnsSun.COM>
    Subject: Readings in Machine Translation


    Nirenburg, Sergei, Harold Somers and Yorick Wilks, eds. (2003) Readings in Machine Translation, MIT Press.

    Announced at http://linguistlist.org/issues/14/14-1654.html

    Bob Kuhns, Sun Microsystems, Inc.

    PURPOSE OF BOOK AND OVERVIEW

    As the title suggests, ''Readings in Machine Translation'' is a collection of papers on machine translation (MT), 36 papers to be exact. The papers were selected by the editors as those they found of ''historical significance'' in the development of MT, yet were ''difficult to find.'' The papers span over 40 years, from the late 1940s to the early 1990s. Papers from the last 10 years have been purposely omitted because, as the editors admit, it is difficult to predict which of them will become historically important.

    This book will be of interest to researchers and teachers of MT. For researchers, the book provides historical context for many of the major MT paradigms in existence today as well as articles that support or question the feasibility of MT. Although it assumes some familiarity with MT, the book is an excellent set of supplemental readings for a course in MT at the undergraduate or graduate level.

    The book contains 36 articles divided into three sections, namely, Historical, Theoretical and Methodological Issues, and System Design, each of which has an introduction. Given the large number of papers, only the briefest description of each is possible in this review.

    Section I - Historical

    Sergei Nirenburg introduces the Historical section with a short overview of how machine translation helped shape the fields of Artificial Intelligence (AI) and Natural Language Processing (NLP). Nirenburg notes that the enthusiasm for MT began in the late 1940s and discusses reasons for this enthusiasm. Nirenburg describes each paper in the first section in varying levels of detail and notes when there is a shift of perspective or complexity of MT in the papers which are roughly ordered chronologically.

    Chapter 1. Warren Weaver, ''Translation''

    Written in the summer of 1949, ''Translation'' was, for many of the 200 recipients of Weaver's memorandum, their first introduction to the idea that MT was possible. In this memorandum, Weaver argues for the theoretical plausibility of MT and suggests a statistical approach for handling semantic ambiguity as well as the possibility of a ''universal language'' through which translation from one language to another is achievable.

    Chapter 2. A. D. Booth, ''Mechanical Translation''

    Booth describes an approach to MT using available hardware with memory limitations. He investigates lexical storage and the notion of a ''pre-editor'' to eliminate ambiguity for the source language. The paper also includes some benchmarks comparing human translation and MT.

    Chapter 3. Erwin Reifler, ''The Mechanical Determination of Meaning''

    In this paper, Reifler focuses on meaning as being central to MT and discusses the similarities and differences of how meaning is treated by a traditional linguist and a researcher in MT.

    Chapter 4. Gilbert W. King, ''Stochastic Methods of Mechanical Translation''

    King introduces the notion of stochastic processing for selecting a translation of a source term that has competing translations.

    Chapter 5. Victor H. Yngve, ''A Framework for Syntactic Analysis''

    Yngve, describing work at MIT for overcoming word-for-word translation problems, provides a high-level system architecture for a German-to-English MT system that invokes modules for syntactic analysis for German, a structural transfer component, and an English generation component.

    Chapter 6. Yehoshua Bar-Hillel, ''The Present Status of Automatic Translation of Languages''

    This paper questions the feasibility of fully automatic, high quality translation (FAHQT), including statistical approaches. Bar-Hillel provides a critical survey of MT research in the US, the United Kingdom, and the USSR. There are several appendices including one (Appendix III) with the self-descriptive title of ''A Demonstration of the Nonfeasibility of Fully Automatic High Quality Translation.''

    Chapter 7. Ida Rhodes, ''A New Approach to the Mechanical Syntactic Analysis of Russian''

    Rhodes describes a syntactic approach to MT in order to overcome the problems with word-to-word translation. Procedures for lexical, phrasal and clausal analyses, program control, and generation are presented for a Russian-to-English system.

    Chapter 8. Susumu Kuno, ''A Preliminary Approach to Japanese-English Automatic Translation''

    Kuno reports on procedures for translating Japanese into English. The procedures of automatic input editing (converting kana and kanji texts into Roman letters), automatic segmentation, syntactic analysis, and output editing are described.

    Chapter 9. Sydney M. Lamb, ''On the Mechanization of Syntactic Analysis''

    Lamb's paper is not on MT proper but instead describes methods for syntactic processing based on an analysis of corpora.

    Chapter 10. David G. Hays, ''Research Procedures in Machine Translation''

    Hays overviews the areas of research that are key for MT, including morphological analyses, grammar (dependency relations), and semantics (with a discussion of semantics of source and target word equivalents).

    Chapter 11. John Hutchins, ''ALPAC: The (In)Famous Report''

    Hutchins discusses the background, the participants, the implications, and the misunderstandings of the ALPAC report which, in effect, stopped major MT funding in the US for twenty years.

    Chapter 12. Silvio Ceccato, ''Correlational Analysis and Mechanical Translation''

    Ceccato describes an approach to MT based on semantic relationships between terms and offers a taxonomic approach to language processing.

    Chapter 13. O.S. Kulagina and I.A. Melcuk, ''Automatic Translation: Some Theoretical Aspects and the Design of a Translation System''

    Kulagina and Melcuk examine automatic translation from the broad view of determining the text-meaning and meaning-reality relations. The complexities of relations between meaning and situations are discussed. The paper also includes a discussion of the major components of MT systems.

    Chapter 14. Margaret Masterman, ''Mechanical Pidgin Translation''

    Masterman reports on work at the Cambridge Language Research Unit of using a pidgin dictionary for overcoming polysemy in languages.

    Chapter 15. S. Takahashi, H. Wada, R. Tadenuma, and S. Watanabe, ''English-Japanese Machine Translation''

    The paper discusses dictionary structures and translation processes and flow of a system running on Yamato, a special purpose computer. A hardware description is included.

    Section II - Theoretical and Methodological Issues

    Yorick Wilks introduces the second section with an observation that despite the differences in how MT is approached and the improvements in NLP components such as part-of-speech taggers and parsers, the systems on the market do not exploit the theory and improved NLP technologies. He goes on to enumerate and discuss the crucial issues for MT in the last 20 years and ties these issues to various papers in the book.

    Chapter 16. John Lehrberger, ''Automatic Translation and the Concept of Sublanguage''

    Lehrberger begins with an analysis (such as categorial distribution and syntactic and semantic restrictions) of a particular sublanguage based on aviation maintenance manuals. The paper examines the feasibility of MT for sublanguages with a description of the TAUM-METEO system that translates weather reports in Canada from English to French.

    Chapter 17. Martin Kay, ''The Proper Place of Men and Machines in Language Translation''

    Kay summarizes MT issues in terms of linguistic problems and computer science or engineering obstacles. Kay also critically examines ad hoc approaches to MT. Kay suggests that an application of machines to translation should be done in short, reliable steps starting with good text editors, followed by translation aids, and progressing to MT that allows for human intervention.

    Chapter 18. Roderick L. Johnson and Peter Whitelock, ''Machine Translation as an Expert Task''

    Johnson and Whitelock identify five types of knowledge (source language knowledge, target language knowledge, text type knowledge, domain knowledge, and contrastive knowledge) that human translators bring to bear while translating. They describe how different knowledge types have been an organizing factor in the UMIST English-Japanese MT system.

    Chapter 19. Jan Landsbergen, ''Montague Grammar and Machine Translation''

    After a brief introduction to Montague Grammar, Landsbergen describes a version of Montague Grammar (M-grammar), and then proposes how M-grammars may be used in Rosetta, an interlingua MT system.

    Chapter 20. Jun-ichi Tsujii and Makoto Nagao, ''Dialogue Translation vs. Text Translation-Interpretation Based Approach''

    In contrast to the earlier papers in this book, Tsujii and Nagao's paper focuses on translation issues involving dialogues and explores the differences between dialogue and text translation systems. They argue that MT for dialogues may be more feasible than MT for written texts.

    Chapter 21. Ronald M. Kaplan, Klaus Netter, Jurgen Wedekind, and Annie Zaenen, ''Translation by Structural Correspondences''

    Utilizing a Lexical-Functional Grammar framework, the authors describe a translation method that uses different levels of representation of structures simultaneously in both the source and target languages.

    Chapter 22. Christian Boitet, ''Pros and Cons of the Pivot and Transfer Approaches in Multilingual Machine Translation''

    Boitet's paper looks at the advantages and disadvantages of a pure pivot (interlingua) approach to MT and even suggests the use of Esperanto as the pivot language. The paper also explores the transfer approach and ends with a discussion of the potential future for pivot and transfer approaches to MT.

    Chapter 23. Sergei Nirenburg and Kenneth Goodman, ''Treatment of Meaning in MT Systems''

    Nirenburg and Goodman reflect on how MT methodological issues (say, interlingua vs. transfer-based and system evaluation) are discussed and debated. Their view is that debates in the MT community have been cast so as to mask the true underlying issue, namely, meaning.

    Chapter 24. Yorick Wilks, ''Where Am I Coming From: The Reversibility of Analysis and Generation in Natural Language Processing''

    Wilks's paper analyzes and rejects the notion that language analysis and language generation are symmetrical processes.

    Chapter 25. Paul L. Garvin, ''The Place of Heuristics in the Fulcrum Approach to Machine Translation''

    This paper discusses the Fulcrum approach to language and the notion of heuristics, and provides criteria for how heuristics are applied to handle certain types of syntactic resolution.

    Chapter 26. John S. G. Elliston, ''Computer Aided Translation: A Business Viewpoint''

    Elliston looks at the problems of running a global business and describes how a company addressed those problems. The paper describes the company's investigation into controlled languages and its development of a computer aided translation process.

    Section III - System Design

    The last section is introduced by Harold Somers who summarizes a number of design approaches to MT, including AI methods, example-based machine translation (EBMT), and statistics. Somers ends the introduction by speculating that speech translation will be a new subfield of MT which will result in an abundance of valuable research in the years to come.

    Chapter 27. Michael Zarechnak, ''Three Levels of Linguistic Analysis in Machine Translation''

    Zarechnak discusses work at Georgetown University on Russian-to-English MT based on three levels of analysis, namely, morphological, syntagmatic, and syntactic.

    Chapter 28. B. Vauquois, ''Automatic Translation - A Survey of Different Approaches''

    Vauquois's paper describes first and second generation MT systems, and is the source of the well-known pyramid picturing the stratificational view of MT.

    Chapter 29. Alan K. Melby, ''Multi-level Translation Aids''

    Melby addresses three problems (human factors, the all or nothing syndrome, and traditional centralized processing) associated with the Interactive Translation System (ITS).

    Chapter 30. Rod Johnson, Maghi King, and Louis des Tombe, ''EUROTRA: Computational Techniques''

    This paper discusses the rationale (such as capturing the appropriate level of detail of linguistic representations) for selecting a programming language and for making design decisions for MT.

    Chapter 31. Makoto Nagao, ''A Framework of a Mechanical Translation between Japanese and English by Analogy Principle''

    This paper presents the core ideas of EBMT and is credited with introducing the notion of EBMT.

    Chapter 32. Peter F. Brown, et al., ''A Statistical Approach to Machine Translation''

    Brown et al. describe the pioneering work conducted at IBM on statistical MT. Experimental results are included.

    Chapter 33. Tsuyoshi Morimoto and Akira Kurematsu, ''Automatic Speech Translation at ATR''

    Morimoto and Kurematsu report on the ATR Interpreting Research project and present the components and architecture for developing a speech translation system.

    Chapter 34. Yorick Wilks, ''The Stanford Machine Translation Project''

    Wilks describes an English-to-French interlingua-based MT system. The paper includes a detailed discussion of the system and numerous helpful examples.

    Chapter 35. Victor Sadler, ''The Textual Knowledge Bank: Design, Construction, Applications''

    Sadler describes the Textual Knowledge Bank (TKB), a database of full texts of referentially-coded tree structures, and describes its potential uses. The notion of TKB is strikingly similar to Nagao's analogy principle in Chapter 31.

    Chapter 36. Harold L. Somers, Jun-ichi Tsujii, and Danny Jones, ''Machine Translation Without a Source Text''

    This paper explores a novel application for MT, a type of foreign language expert that assists a monolingual speaker in understanding a text in a foreign language.

    A CRITICAL EVALUATION

    The book is cited as offering ''the most historically significant English-language articles on MT.'' The editors set out to select papers that are not only historically significant and often cited in the field of MT, but that are also difficult to locate. The editors have done an outstanding job on both counts. The book is comprehensive and is representative of most if not all major research areas and issues related to MT such as interlingua vs. transfer approaches, controlled input, sublanguages, linguistic issues as they relate to MT, fully automatic vs. machine-assisted MT, and some preliminary work on speech translation.

    The book is well-organized, with the first section containing historical papers starting with Weaver's 1949 memorandum and ending with papers from the late 1960s. The papers in this section illustrate the wide range of thinking and approaches to MT during the early period, from Weaver's enthusiasm for MT (Chapter 1) to Bar-Hillel's skepticism (Chapter 6). The detail and complexity of the earliest work on MT are evident in papers such as those by Reifler (Chapter 3) and Rhodes (Chapter 7). Overall, the historical section provides a good sense of the historical foundations of MT.

    The second section represents a diversity of theoretical and methodological issues regarding MT such as sublanguages (Chapter 16), Montague Grammar (Chapter 19), dialogue translation (Chapter 20), and a business case study (Chapter 26). This diversity of research areas and theoretical frameworks is impressive, and today's MT researchers should be able to find valuable and relevant references in this section.

    The last section on system design shows how the key paradigms of today's MT have earlier roots. These include data-driven MT such as statistical approaches (Chapter 32) and EBMT (Chapters 31 and 35). Hints of the potential of AI for translation are seen in Vauquois's 1976 paper (Chapter 28), which predates the excitement and boom of AI in the 1980s. As with the previous section, those currently working in MT, no matter what the paradigm, will find material on earlier systems which could result in better design decisions.

    Each of the three sections is introduced by one of the editors. These introductions are most helpful in drawing connections between papers, while noting some ideas that were novel at the time, as with ontology-based NLP (Nirenburg's section). Wilks's insightful introduction to the methodological issues section helps delineate the issues that MT researchers are trying to address. Somers's summary of system design and MT architectures is excellent, and his discussion of the independent work of Nagao and Sadler on example-based methods is noteworthy from a historical perspective.

    The only aspect of the book that seems lacking is a summary or conclusion by the editors. Each of the editors wrote instructive introductions to the sections of the book, but a summary by the three prominent MT researchers could have tied together the different research themes of the book. Maybe more importantly, the conclusion could have been used to draw historical connections between what has been accomplished in the first 40 years or so of MT research and what the editors see as the current trends and research agenda today.

    A warning to the reader is that the writing and the style of the papers vary greatly. Some papers are far more accessible than others. Nevertheless, the papers are of historical significance and were quite rightly included.

    This book with its scope and breadth of papers and topics is certainly valuable for anyone in the field of MT research and development and is highly recommended for anyone who wants an understanding of the historical foundations of current work in MT.

    ABOUT THE REVIEWER

    Bob Kuhns is a consultant at Sun Microsystems Laboratories and Sun's Globalization Portal Group, where he is the NLP architect and manages projects in translatability assessment, machine translation, and multilingual terminology management.