LINGUIST List 5.116

Wed 02 Feb 1994

Sum: Machine Translation

Editor for this issue: <>


Directory

  1. a mcelligott, Summary of MTS

Message 1: Summary of MTS

Date: Fri, 21 Jan 1994 23:28:25
From: a mcelligott <mcelligottaul.ie>
Subject: Summary of MTS


A sincere thanks to all who replied. The following will definitely
get me off the ground.

Cheers,
AMcE.
 =======================================================================

 From: "C.M.Thomson" <tomfiveg.icl.co.uk>

One fairly modern MT system that has been something of a commercial
success is the METAL system from SNI. Some information is given in
 Hammer, C. Parallel Lisp and the Text Translation System METAL
 on the European Declarative System
 in ICL Technical Journal vol 8 iss 4 Nov 1993 pp 641-654
 (Oxford University Press)
 Gajek, O. The METAL System.
 in CACM vol 34 iss 9 Sept 1991
 Thurmair, G. METAL: Computer Integrated Translation
 in Proc of the SALT Workshop 1991, Manchester

I'm sure Carsten has a pile of extra references he could point you
to if you contact him (email: Carsten.Hammerzfe.siemens.de).

There were several attempts at English/Japanese and Japanese/English
MT from the mid 60s onwards, by various Japanese companies and academic
institutions, but these were never successful (even in a restricted field
like comms protocol definition the output was beyond post-processing by
anyone not fluent in the source language, and post-processing took longer
than that person would have needed to translate without the m/c
output) - at least they were really bad until the late 70s; they
may have moved on since then. Also there has been work on English/Russian
in the UK, the US, and the USSR, and on English/French and Russian/French
in France and in the USSR, that I know about. So there has been plenty
going on for at least three decades - if I hadn't scrapped all my MT info
when I came to Manchester 8 yrs ago I could have given you references to
much of the early work.

PS: METAL covers German -> English, German -> Spanish,
English -> German, French -> Dutch, Dutch -> French in commercially
available form (ie what you can buy from SNI today) and has other
language pairs in development; it is a real commercial translator
requiring very little human post-processing, not a research toy or
one of those awful products that delivers so many options that the
post-processing reduces length by 50%. So I think it is maybe more
interesting than a lot of other MT systems.

=====================================================================
 From: iatcl.cam.ac.uk
Survey of systems in the market: BYTE 18(1), January 1993

Journals: Machine Translation (Kluwer), Computational Linguistics (MIT).
Books: An Introduction to Machine Translation, Hutchins & Somers, 1992.
=====================================================================

 From: "Caoimhin P. ODonnaile" <caoimhinsmo.ac.uk>

You probably know this already, but Machine Translation is a huge subject,
much more difficult than first imagined, with large corporations currently
spending millions of pounds per year on it, large conferences devoted to
the subject and many books constantly appearing.

There is a database (accessible via the Internet) somewhere in Germany
(the University of Stuttgart, I think) - of computational linguistics
software: parsers, generators, etc. I can look out details if you want.

I am sure that most MT systems cost big money and are not openly
accessible. The main cheaply available software seems to be "Globalink"
for the PC - available for a few hundred pounds at a guess. I haven't seen
it, but I get the impression that the quality of the translation is sometimes
dire, sometimes useful. The only open-access system I have heard of is a
mail-server somewhere in Finland which will return to you, parsed, a
copy of a short English text which you send to it in a mail message.

I think that as far as Gaelic goes, full machine translation is not an
attainable goal in the short to medium term. Better to go for lesser goals,
spell-checkers and the like, which are in any case prerequisites for an MT
system.

One of the main prerequisites for any system is a good lexical database,
and so the work which you and Gearóid Ó Néill are doing on the
Foclóir Póca at the University of Limerick is admirable. I have had
a copy of the "Learners' Irish-English Dictionary" online for many
years for my own use. I typed it in and verified it myself. It is
copyright and I have no permission, so I can't pass copies to other
people, but if there is any way in which it can be used to assist your
own work without infringing copyright - e.g. for checking a lexical
database constructed from the Foclóir Póca, I will be very glad to help
in any way I can. Perhaps the Educational Company of Ireland would be
willing to give permission for academic use if approached.

Once you have a lexical database it is very simple to construct wordlists
for spell checking. WordPerfect allows you to construct spell check lexicons
for your own language starting with a wordlist. Version 2 of Word for
Windows does not allow this, but Version 6 which is just out (the version
numbers jumped!) may allow it.
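
As a rough illustration of the wordlist step, here is a minimal Python
sketch. The entry format (a headword plus a list of inflected forms), the
sample entries and the function name build_wordlist are assumptions made up
for this example; they are not the layout of any real lexical database or
the input format of any particular spell checker.

```python
def build_wordlist(entries):
    """Collect every headword and inflected form into a sorted, duplicate-free list."""
    words = set()
    for headword, forms in entries:
        words.add(headword)
        words.update(forms)
    # Plain code-point sort; a real tool would probably want locale-aware collation.
    return sorted(words)


# Hypothetical sample entries, for illustration only.
SAMPLE_ENTRIES = [
    ("focal", ["focail", "fhocal"]),
    ("bean", ["mná", "bhean"]),
    ("focal", ["focail"]),  # repeated entry, collapsed by the set
]

if __name__ == "__main__":
    for word in build_wordlist(SAMPLE_ENTRIES):
        print(word)
```

Writing the result one word per line gives the kind of flat list that a
word processor's lexicon builder could take as a starting point.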

After this there are all sorts of further intermediate goals - online
dictionaries and terminology databases accessible from word processors,
lemmatisers, parsers, thesauruses.

From what I have read, the large corporations who have attempted MT have
generally found that machine *assisted* human translation, aided by tools
such as those I have mentioned above, is more successful than MT.
Pure MT is not currently possible except in extremely limited semantic
fields (the translation of weather forecasts from English to French in
Canada is the classic example); a large amount of human pre-editing and
post-editing is required to achieve a presentable result.

On a more positive note, even if the quality of MT is still pretty miserable,
even after massive investment, I think that there may be an increasing
market for very poor quality translations. Since most new writing is on
computer anyway, and computers are powerful enough to produce a translation
in a few seconds, many people may feel that even a lousy translation is
better than no translation at all. Imagine if at the press of a key you
could obtain an interlinear translation, shown in a different font or colour,
of GAELIC-L messages. I think that many of the American subscribers to
GAELIC-L who know very little Gaelic would be delighted with this. I would
be delighted if such a facility was available for WELSH-L.

Irish Gaelic to Scottish Gaelic translation (or vice-versa) would be
very much easier than Gaelic to English translation, since Irish Gaelic
and Scottish Gaelic are so similar syntactically. (They are "cognate
languages", in the jargon.) It would also be a worthwhile goal since
Irish Gaelic is not intelligible to most speakers of Scottish Gaelic,
and vice-versa. GAELIC-L would be an ideal testing ground for such a
system.
======================================================================

 From: Vasu Renganathan <vasuu.washington.edu>

I am sending you a summary of responses I recently got on MT, from an NLP
newsgroup for South East Asian languages. I hope it helps.

The general consensus seems to be that software translation is probably still
not very "smart" and by itself will not do the job of an experienced
translator fluent in both languages. It could be an aid to the translator,
however, and make that person's job a lot easier and faster.

I would like to thank everyone who responded to me. If you know of any other
vendors who make/market translation software, please let me know via email.
I'd be glad to update this and send it back out. As a disclaimer, I am not
associated with any of these vendors. The following information is given as a
public service; use it at your own risk.

1. Translation system by MRJ, Inc., 10455 White Granite Drive - Oakton, VA
22124, 703-385-0830 (voice) - 703-385-4637 (FAX). This is a commercial
bilingual English <-> Japanese translation system, including OCR and MT
(Machine Translation) components.

2. Language Engineering Corporation, product name: LOGOVISTA develops
English-to-Japanese translation software. Tokyo-based software developer
LOGOVISTA has developed a software package which supports the translation of
English-language business letters and technical essays into Japanese.
"LogoVista E to J" translates more rapidly than other packages and the finished
text requires less rewriting, according to the developers. Versions which run
on SUN, HEWLETT PACKARD, and SONY workstations, as well as APPLE "Macintosh"
computers, will be released in October. NEC "PC-9801" and IBM DOS/V PC
versions will be released next spring. The software, which will be sold
through PC dealer KATENA, is expected to be priced under 200,000 yen ($1,600).
This is an English-to-Japanese translation system called LogoVista E to J.
Macintosh and Japanese Windows versions are available; both can print to a
PostScript printer. LogoVista E to J includes a main dictionary with over
100,000 entries; this dictionary can be supplemented both by any of our
nineteen technical dictionaries and by user dictionaries that you create.

The following technical dictionaries are available: aerospace engineering,
agricultural science, applied chemistry, applied physics, architecture,
biology, chemistry, civil engineering, earth science, electrical
engineering and electronic communications, general business, general
science and technology, information science, materials science, mechanical
engineering, naval architecture, physics, urban engineering, and zoology.
The technical dictionaries contain a total of over 415,000 terms.

The Macintosh version of LogoVista E to J requires either KanjiTalk 7.1 or
US System 7.1 and the Japanese Language Kit. The Windows version requires
DOS/V 5.0 or later and Japanese Windows 3.1. Both versions require at
least 6MB of RAM and 30MB of hard disk space. The price of the basic
system (with the 100,000-entry dictionary) is $1,995. The four largest
technical dictionaries (general business, general science and technology,
electrical engineering and electronic communications, and mechanical
engineering) cost $995 each. The other fifteen technical dictionaries
cost $495 each. Call John Richards (johnrlec.com), (617) 489-4000, ext. 727
for more information.

3. IBM JAPAN has developed and released for sale a translation
support package which simultaneously displays the source text and
the in-process translation on the same screen, showing synonyms and
dictionary definitions in separate windows. The new "Translation
Manager/2," the first translation support tool of its type, makes it
possible to share the same data on two different PCs and boasts other
features which double productivity compared to manual translation,
according to IBM JAPAN. The price is 787,500 yen ($7,429).

4. Someone mentioned Duet from JustSystem (the company that made Ichitaro):
"But as far as I know, it works with only PC9801 series of NEC, a DOS machine
but not quite IBM-PC/AT and it's really dumb.
 And there are "The Translator" and "Logovista" from Katena. These
guys are for Macintoshes (Logovista is available also for Windows, I
think) and significantly smarter, especially Logovista which can
handle nested clauses such as "I don't think you think your boss thinks
computers can think".
 Remember, though, that machine translation is still at a primitive
level. It's just as smart as cpp (perhaps a little smarter). And you
need to make a lot of investment besides money for software and
hardware to cultivate your own set of dictionaries for your own needs
(the reason Duet is still strong is this: many companies have spent
significant man-hours growing their dictionaries). And even with that, it
will not eradicate the need for human translators. It helps
professionals a lot by preparing a draft but it's no good for people
who don't know English and Japanese at all...."
====================================================================

 From: mbmmtl.mit.edu

For information on MIT efforts ask Robert Berwick (berwickai.mit.edu).
====================================================================
 From: Francis Bond <bondnttkb.ntt.jp>

I am working on a Japanese-English MT system. I would be happy to
send you a copy of our demonstration pamphlet, which gives a brief
description of the system and lists further references. If you can
print Japanese characters I can send you the .ps file, or a LaTeX or
DVI file. If not I can send you a hard copy by snail mail. Which would
you prefer?
=====================================================================

 From: "J.HUTCHINS" <L101CPCMB.EAST-ANGLIA.AC.UK>

There is in fact a vast literature
on the subject, there are numerous commercially available MT
systems, and many MT projects, involving a very wide range of
languages and different approaches.

As introductions to the subject I would suggest my own books:
 Hutchins, W.J. (1986) Machine translation: past, present,
future. [A history of MT research and systems up to 1984.]
 Hutchins, W.J. and Somers, H.L. (1992) An introduction
to machine translation (Academic Press) [An introductory textbook
for masters and Ph.D students, covering the basic approaches and
details of 'typical' systems.]

Books by others include:
 Arnold, D. et al. (1994) Machine translation: an introductory
guide (Blackwell). [Just published basic introduction for non-linguists
and translators]
 Newton, John (ed.) (1992) Computers and translation (Routledge)
[A collection of introductory papers covering a wide range of MT topics.]
 Slocum, John (ed.) (1988) Machine translation systems (CUP) [A
collection of papers on the major MT systems.]

Then there are the proceedings of conferences:
 MT Summit Conferences, held in 1987, 1989, 1991, and 1993
 Theoretical and Methodological Issues in MT, held in 1985, 1988,
1990, 1992, and 1993.
 Coling conferences in recent years have contained many MT papers.

For keeping up to date there is the newsletter of the International
Association for Machine Translation, entitled MT News International.
This is available free to all members of IAMT and its regional associations,
e.g. the European Association for Machine Translation.

=====================================================================
 From: Patrick Jost <jostitd.nrl.navy.mil>

Two books I'd recommend are by Nirenburg (Machine Translation, Camb. U.
Press) and Carbonell et al. (Machine Translation, a Knowledge Based
Approach, Morgan Kaufmann). John Hutchins's book from Academic Press
is supposedly quite good as well, but I have been unable to get a
complete copy; there were printing production problems.

There's a very interesting MT project called Pangloss going on at ISI...
contact Ed Hovy (hovyisi.edu) for details.

There are really two approaches: going directly from language A to
language B (this is "transfer" MT), and using an "interlingua", so you
go from language A to the IL and then to language B.
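
One consequence of the two designs can be sketched with a little arithmetic.
With n languages, translating directly between every ordered pair needs
n*(n-1) separate A-to-B components, while an interlingua design needs one
analyser into the IL and one generator out of it per language, i.e. 2*n.
The function names below are made up for this illustration, and the count
says nothing about how hard each component is to build.

```python
def pairwise_components(n_languages):
    """Number of directed language pairs A -> B among n languages."""
    return n_languages * (n_languages - 1)


def interlingua_components(n_languages):
    """One analyser into the IL plus one generator out of it, per language."""
    return 2 * n_languages


if __name__ == "__main__":
    # Illustrative language counts only.
    for n in (2, 3, 5, 9):
        print(n, pairwise_components(n), interlingua_components(n))
```

For two languages the interlingua route is the more expensive one (4
components against 2); the balance tips once more than three languages are
involved.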

Commercial systems...the leader is Systran, in La Jolla, CA. You can
contact them on 619-459-6700. Siemens is just getting ready to
release their "METAL" system, I am waiting for sample translations.
=====================================================================

 From: Walter van den Heever <WVANDENHdos-lan.cs.up.ac.za>

We (Unit for Software Engineering in collaboration with the University of
Pretoria) are developing an MT system. The project is currently in its 5th
year and a commercial system (Lexica) is presently being sold to select
clients.

Lexica is a syntactic transfer system, presently being extended to
incorporate semantic information (basically still 80's technology).

The languages attempted include both European and African languages (such
as English, Afrikaans, French, Swahili, Tswana, Zulu).

Based on my experience so far, my impressions are as follows:
* I don't think that FAHQT of unrestricted text is possible,
* MT can offer useful results in restricted domains (such as technical
texts)
* Users don't understand the complexity involved and often try to use
the system outside its limitations,
* The translation between European languages is much simpler than
translation between European and African languages. Similar observations
have been made concerning the translation between European and Asian
languages. This is due to differences in culture and the way these
languages work.
* A problem we have (similar problems may or may not exist elsewhere) is
to get the right people for the job. Linguists have to undergo
considerable training before they are able to write a grammar suitable for
computation. Computer Scientists can do that, but don't really have the
necessary language skills.
* The quality of MT depends greatly on the input. The old Garbage-In-
Garbage-Out saying contains an element of truth in the case of MT. We have
analysed text that didn't translate well and found that even we were not
able to understand exactly what the author meant. After rewriting the
text more plainly the translation improved considerably and we understood
the original better.
* The building of dictionaries is i) time-consuming, ii) costly and iii)
error-prone.
* In order to do translation in anything more than a toy-domain, one
requires dictionaries in the order of 50 000 words.
These are some very general (and by no means original) observations.
======================================================================
 From: Gaelle.Recourcelinguist.jussieu.fr

 Your question in Linguist involves a huge area: here is a short and
partial answer. Many European projects were devoted to MT in official
community languages. I took part in the EUROTRA research project, which was
the biggest one. Its main quality was to provide at the end (December 1992)
a good summary of the linguistic specifications needed to build an MT
system. You can get them by asking the EC for a copy of the so-called
EUROTRA Reference Manual. If you are really interested, don't hesitate to
contact me for more information. Note that the software itself is
obsolete and of no interest, but that all the specifications were actually
implemented in the nine languages. Finally, you should know that several
smaller projects are now under way with which you could get in contact
(EUROLANG, ET-10 projects).
=====================================================================
 From: Meyer S <meyesessex.ac.uk>

Firstly, here is a brief description of some MT systems that you might
be interested in:

* METAL, one of the most advanced operational systems
(transfer based, making use of deep linguistic analysis) which has
been developed by Siemens, Germany. You may find it easier to contact
Siemens here in Britain: Siemens Group Services Limited, 83 Guildford
Street, Chertsey, Surrey KT16 9AS (Tel: 0932 566791).

* The Globalink Translation System (GTS) could be classified
as a `direct' system. The quality may not be as high as some of the
other systems mentioned, but it is cheap and fast. It has several
British distributors, but unfortunately we only have their American
address: Globalink Inc., 9302 Lee Highway, Fairfax, Virginia 22031,
USA (Tel: 703 273-5600).

* The Tovna Machine Translation System (Tovna MTS) is a
transfer based system that `learns' from previous input. The UK
address is: Tovna Translation Machines Ltd., EUROSOFT (UK) Ltd.,
Cottons Centre, Cottons Lane, Tooley St., LONDON SE1 2QL (Tel: 234
6635).

* Systran is an amended version of what they call
a `direct' translation system, which only performs a shallow analysis
of the input. The main distributor of Systran is the Gachot company in
Soisy-sous-Montmorency (near Paris), France. A new English company is
negotiating the right to distribute Systran in Britain. The main user
of Systran in Britain is: Rank Xerox Ltd., Parkway, Marlow, Bucks SL7
1YL (Tel: 0628 890000).

* The Logos system is (as far as we know) a transfer based system
that makes use of a deeper linguistic analysis of input.
The address of Logos is:
Logos Corporation, 45 Park Place So, Suite 214, Morristown, NJ 07960,
USA. We do not know of an English distributor, nor of any main users.

* Weidner's MicroCAT is an interactive system.
The European subsidiary of Weidner is: WTE (Weidner Translation
(Europe) Limited), Fryern House, 125 Winchester Road, Chandler's Ford,
Eastleigh, SO5 2DR. One of the main users of Weidner's MicroCAT is
Perkins Engines, Peterborough.

* DLT is an interlingual system which
uses an interlingua based on Esperanto as a `bridge' between
languages. This package is
developed by the Utrecht software company: Buro voor
Systeemontwikkeling (BSO), The Netherlands.

Secondly, the following books may be of interest:
"Machine Translation: An Introductory Guide",
by Siety Meijer, Lorna Balkan, Doug Arnold, Louisa Sadler and R Lee
Humphreys. NCC Blackwell.

Machine Translation,
John Hutchins and Harold Somers.
(also discusses non-commercial systems)
======================================================================
 From: Niek van der Donk <N.J.M.vdrDonkkub.nl>

Machine translation : a view from the lexicon / Bonnie Jean Dorr. - Cambridge,
Mass [etc.] : MIT Press, cop. 1993. - XX, 432 p. : ill. ; 24 cm. - (Artificial
intelligence)

Linguistic issues in machine translation / edited by Frank Van Eynde. - London
[etc.] : Pinter, 1993. - viii, 239 p. : ill. ; 24 cm. - (Communication in
artificial intelligence series)

Progress in machine translation / ed. by Sergei Nirenburg. - Amsterdam
[etc.] : IOS Press ; Tokyo [etc.] : Ohmsha, 1993. - X, 320 p. : ill. ; 24 cm
Bibliogr.: p. [297]-318. - Index.

Machine translation : a knowledge-based approach / Sergei Nirenburg ... [et
al.]. - San Mateo, Cal.: Morgan Kaufmann, cop. 1992. - XIV, 258 p. : ill. ;
24 cm

An introduction to machine translation / W.John Hutchins and Harold L. Somers.
- London [etc.] : Academic Press, 1992. - XXI, 362 p. : fig. ; 25 cm
Bibliogr.: p. 335-350. - Index.

Towards high-precision machine translation : based on contrastive textology /
John Laffling. - Berlin [etc.] : Foris, 1991. - VII, 178 p. : ill. ; 25 cm. -
(Distributed language translation ; 7)

Machine translation summit / editor-in-Chief M. Nagao ; editors H. Tanaka ...
[et al.]. - Tokyo : Ohmsha, cop. 1989. - XIV, 224 p. : ill. ; 27 cm
Proceedings of the three-day Machine Translation Summit held at Japan's Hakone
Prince Hotel from September 16, 1987

Machine translation : how far can it go? / Makoto Nagao ; transl. by Norman D.
Cook. - Oxford [etc.] : Oxford University Press, 1989. - xii, 150 p. : ill. ;
23 cm

New directions in machine translation : conference proceedings, Budapest 18-19
August, 1988 / Dan Maxwell, Klaus Schubert, Toon Witkam (eds.). - Dordrecht
[etc.] : Foris, 1988. - IV, 259 p. ; 24 cm. - (Distributed language
translation ; 4)

=====================================================================
 From: caffreyMIT.EDU

Do a literature search for JONATHAN SLOCUM who has done reviews of MT
systems. Also write to the Center for Machine Translation at Carnegie
Mellon U. in Pittsburgh.
======================================================================
 From: Eduard Hovy <hovyISI.EDU>

Oi, this is a big question, more than I have time or patience to answer.
I suggest you read the following, in order:

- BYTE magazine, January 1993, special issue on MT, 3 main articles.
- Machine Translation, John Hutchins, approx. 1985.
- Computational Linguistics special issue on MT, 11(1 and 2-3), 1986.

Then please ask again about the types of systems you're interested in.
=======================================================================
 From: R Chandrasekar <mickeysaathi.ncst.ernet.in>

I work in Machine Translation (MT). In my PhD thesis,
I am arguing that one should try to use all sorts
of methods (including heuristic simplification)
to attack the formidable problems of MT. I work
at an R&D Centre in Bombay, where we are looking
at translation from English to Hindi. BTW, I spent
some time as a visiting researcher at the
Center for Machine Translation at Carnegie-Mellon Univ,
Pittsburgh, USA. Do you know about this Center?

If you are interested, I could send you a list of
books on Machine Translation. If you want to know
some place in the UK where there is considerable
MT activity, try contacting:

 Dr Harold L Somers
 Centre for Computational Linguistics
 UMIST
 PO Box 88,
 Manchester UK

 Email: haroldccl.umist.ac.uk
======================================================================
 From: Dan Maxwell <100101.2276CompuServe.COM>

In response to your request for information, there are several books which
survey several projects. One of these is by Hutchins, W.J. 1986, "Machine
Translation: Past, Present, Future", Chichester: Ellis Horwood. Another
is a more recent one (about 1989) by Jonathan Slocum, I believe, but I
don't know the title. There is a series of six books about the DLT
(Distributed Language Translation) project, of which I was a part,
published by Foris publications, Dordrecht, NL. One of these, "New
Directions in Machine Translation" is actually the articles from a
conference on MT organized by the company sponsoring DLT. It covers
various topics and projects within MT, including an update of Hutchins'
book. Hutchins' work in particular shows that there are/have been quite a
lot of projects, but I have the impression that most of them have rather
little published work written about them. And a lot of the articles that
I have seen are oriented more toward the computational side of MT rather
than the linguistic side. I recommend Hutchins' book as a starter and
then particularly #5 of the DLT series, "Working with analogical
semantics", by Victor Sadler. It was one of the first treatments of
corpus-based approaches, which now seem to be widely used, judging from
recent issues of "Computational Linguistics".
=====================================================================
 From: Merrill=Kashiwabara%HQ%RationalVines1.ratsys.com

I read your request for information on MT systems, but have very little to
offer you except a few companies which we looked into as part of our software
localization efforts. The company with the longest record seems to be
SYSTRAN, which is a descendant of the old DARPA machine translation efforts.
They have a remote facility which allows you to send text and certain types of
formatted information over the wire to their facility for translation and
retransmission back to the client. Their translation engines seem to be
hand-crafted and pragmatically oriented rather than based on a particular
theory or philosophy. Their heuristics are empirically derived. I don't have
a contact at Systran, but since they've been around since the '60s, I think
that that information is probably readily available.

Another machine translation system is the PC-based Globalink software
product suite which has a limited vocabulary and subject base and covers 5
major European Languages. The engine seems to be an exception-based lookup
table(s). We had a lot of fun translating to and from several languages,
with sometimes bizarre results.

We examined several products, and I have the literature in hardcopy somewhere,
but I'd have to dig it out of the high entropy field which surrounds my desk,
so it might take a couple of days. Are you interested in finding an MT
system, or in a general survey of the players and the existing techniques
being used?


__________________________________________________________________

 Annette McElligott, CSIS Dept., University of Limerick, Ireland.
 Tel: +353 61 333644 ext. 5024; Fax: +353 61 330876
 Email: mcelligoitdsrv1.ul.ie or mcelligottaul.ie