LINGUIST List 9.1128

Mon Aug 10 1998

Calls: Bilingualism, MT Systems

Editor for this issue: Martin Jacobsen <marty@linguistlist.org>


Please do not use abbreviations or acronyms for your conference unless you explain them in your text. Many people outside your area of specialization will not recognize them. Also, if you are posting a second call for the same event, please keep the message short. Thank you for your cooperation.

Directory

  1. Li Wei, 2nd Intl Symposium on Bilingualism
  2. Lisa D. Harper, Workshop on Embedded MT Systems (Call for Papers)

Message 1: 2nd Intl Symposium on Bilingualism

Date: Mon, 10 Aug 1998 09:44:40 GMT0BST
From: Li Wei <Li.Wei@newcastle.ac.uk>
Subject: 2nd Intl Symposium on Bilingualism

REMINDER REMINDER REMINDER REMINDER

2nd International Symposium on Bilingualism
(April, 1999, Newcastle, UK)

Details of the symposium and registration form now available at
http://www.newcastle.ac.uk/~nspeech

Deadline for submission of abstract: 31 August, 1998.

Message 2: Workshop on Embedded MT Systems (Call for Papers)

Date: Mon, 10 Aug 1998 07:28:11 -0400
From: Lisa D. Harper <lisah@mitre.org>
Subject: Workshop on Embedded MT Systems (Call for Papers)

********* DEADLINE EXTENSION ANNOUNCEMENT *********
********* NEW SUBMISSION DEADLINE: August 24, 1998

WORKSHOP ANNOUNCEMENT
---------------------


 WORKSHOP ON EMBEDDED MT SYSTEMS
 CALL FOR PAPERS

Design, Construction, and Evaluation of Systems with an MT Component

Wednesday, October 28, 1998 (preceding the AMTA 98 conference)
Sheraton Bucks County Hotel, Langhorne, Pennsylvania


Introduction

As the strengths and weaknesses of machine translation (MT) engines
have become better understood and accepted, there has been a marked
increase in the development of computer systems with an embedded MT
component. One consequence of this shift to "embedded MT" is that
researchers, developers, and users have begun pushing the
limits on the input that such systems will accept for translation. In
so doing, a new class of problems has surfaced: any input---whether it
appears in physical form on paper, in electronic form on-line, or
mixed in with another modality such as graphics or video---will bring
with it some unknown mix of noisy natural language data as well as
non-linguistic data. How are systems with an MT component to be
designed and evaluated given the challenge this input brings?

The objective of this workshop is to examine and evaluate techniques
for adjusting this "linguistic impedance mismatch" between the
real-world input and the natural language input expected by various MT
engines. Thus the workshop will focus on computational approaches to
preprocessing system input for MT engines and on statistical methods
for evaluating systems with an embedded MT component.

Linguistic Preprocessing In Image Data

For researchers working with image data, an effort is currently underway
to augment OCR (optical character recognition) engines with
linguistic data as they recognize and convert bitmap data into
characters---similar to what has already been done in speech
recognition with linguistic data in HMMs (hidden Markov models).
Other OCR researchers have also experimented with image-level early
topic detection using word-shape recognition. In principle, this could
provide a first-step filtering of documents into a more homogeneous MT
input set, a desirable goal for MT evaluation. Thus we expect that
individuals working with or intending to incorporate OCR into their
computer systems will be interested in this new area.

Linguistic Preprocessing in Online Data

For those working with online input, even though the characters are
already present, there often still remains the task of preprocessing
meaningful, symbolic character strings that are not a part of the text
to be translated. For some systems, the rules for identifying and
encapsulating or removing such strings may need to be hand-crafted
over time as MT engine limitations surface. For others, a combination
of hand-crafted rules and statistically trained NL models has worked.
Many have observed that the HTML annotations, alphanumeric items,
spreadsheet and word processing codes are harder to weed out than
originally expected.
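
The hand-crafted approach described above can be sketched roughly as
follows. This is only an illustration, not the method of any particular
system: the patterns, the placeholder format, and the function names
encapsulate and restore are all assumptions made for the example, and it
assumes the MT engine passes the placeholders through unchanged.

```python
import re

# Hypothetical hand-crafted patterns for strings that should bypass the MT engine.
PROTECTED = re.compile(
    r"<[^>]+>"                             # HTML-style annotations
    r"|\b(?=\w*\d)(?=\w*[A-Za-z])\w+\b"    # alphanumeric items (letters mixed with digits)
)

def encapsulate(text):
    """Replace protected strings with numbered placeholders.

    Returns the masked text and the list of saved strings.
    """
    saved = []

    def _swap(match):
        saved.append(match.group(0))
        return f"__P{len(saved) - 1}__"

    return PROTECTED.sub(_swap, text), saved

def restore(text, saved):
    """Put the saved strings back after translation."""
    return re.sub(r"__P(\d+)__", lambda m: saved[int(m.group(1))], text)

masked, saved = encapsulate("Order <b>AX200</b> today")
# masked would be sent to the MT engine; restore() is applied to its output.
print(restore(masked, saved))
```

In practice the pattern list would grow over time as MT engine
limitations surface, which is the maintenance cost noted above.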

Research efforts with the low-density and less commonly taught
languages, as well as more common ones, encounter a substantial
problem with variation in spelling conventions and transcription
preferences. This is frequently the case, for example, for natural
languages that are primarily spoken and not written. Researchers
working on this class of problem have built variants on spell checkers
(SC), components that standardize words to one orthography (spelling
convention) before submitting them to an MT engine. An idea that has
arisen for this component is to build in an option to adjust the level
of SC correction---as would be relevant when input after OCR
nonetheless varies from very noisy to relatively clean.
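
A minimal sketch of such an adjustable normalization step, assuming a
small lexicon of standard spellings is available. The function name
normalize, the cutoff parameter, and the sample lexicon are hypothetical;
string similarity from Python's difflib stands in for whatever matching
a real spell-checker variant would use.

```python
from difflib import get_close_matches

def normalize(tokens, lexicon, cutoff=0.8):
    """Map each token to its closest standard spelling in the lexicon.

    cutoff is the minimum similarity (0 to 1) required to rewrite a token.
    A lower cutoff corrects more aggressively (for very noisy input);
    a higher cutoff leaves more tokens untouched (for cleaner input).
    """
    out = []
    for tok in tokens:
        if tok in lexicon:
            out.append(tok)
            continue
        match = get_close_matches(tok, lexicon, n=1, cutoff=cutoff)
        out.append(match[0] if match else tok)
    return out

lexicon = ["colour", "centre"]            # hypothetical standard orthography
noisy = ["color", "center", "xyz"]
print(normalize(noisy, lexicon, cutoff=0.8))
print(normalize(noisy, lexicon, cutoff=0.95))
```

Exposing the cutoff as a parameter is one way to let the same component
serve both very noisy and relatively clean input.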

Evaluation of Embedded MT Systems

Among those working on statistical methods for evaluating systems with
an embedded MT component, we have seen two distinct trends. One group
of statisticians has begun looking for appropriate models from outside
the world of MT evaluation, examining the efforts by others to take
distinct metrics for components and combine them for an overall
system-level metric using fuzzy mathematics. Another group of
researchers is looking instead at developing a one-dimensional scale
for ranking MT engines along a continuum defined by system-level
function. That approach, for example, might rank one engine as good
enough for filtering documents, while another engine deemed more
linguistically robust would be ranked higher because it could generate
a good enough initial translation for subsequent post-editing. We
welcome other functional evaluations of MT components and computer
systems with embedded MT components as well.
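
The two trends can be illustrated with a small sketch. Everything here
is an assumption made for illustration: the scale contains only the two
functions named above, the minimum is used as one standard fuzzy "AND"
for combining component scores, and the names FUNCTION_SCALE,
functional_rank, and fuzzy_system_score are hypothetical.

```python
# Ordered from least to most demanding system-level function.
FUNCTION_SCALE = ["document filtering", "draft for post-editing"]

def functional_rank(adequate_for):
    """Return the position of the most demanding function the engine was
    judged good enough for, or -1 if it was adequate for none."""
    ranks = [FUNCTION_SCALE.index(f) for f in adequate_for if f in FUNCTION_SCALE]
    return max(ranks, default=-1)

def fuzzy_system_score(component_scores):
    """Combine component scores in [0, 1] into one system-level score
    using the minimum, so the weakest component bounds the system."""
    return min(component_scores.values())

print(functional_rank({"document filtering"}))
print(functional_rank({"document filtering", "draft for post-editing"}))
print(fuzzy_system_score({"ocr": 0.9, "mt": 0.6}))
```

The first function places engines on a single functional continuum; the
second folds per-component metrics into one system-level figure.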

SUBMISSIONS

Submitters are invited to send in a short paper, not more than 5
pages, addressing one or more of the three areas discussed
above. Papers should define the problem in an embedded MT system that
is the focus of the work, describe the embedded MT system design (a
simple sketch) with sample input data where relevant, and present
their approach to the problem. Work at various stages of completion
is acceptable; we expect the current status of the work to be made
clear. Submission of end-to-end output of an embedded MT system is
especially encouraged. The papers will be collected and distributed
to participants of the workshop.

Ideally, the result of the workshop will be a clearer delineation of:

(1) the range of linguistic preprocessing problems
(2) the range of designs in embedded MT systems
(3) how these problems are treated in different embedded MT systems
(4) the metrics that are being used to evaluate these systems and
their components.

DATES

Notice of interest in participation: July 10, 1998 (to voss@arl.mil)
Please identify which of the three areas you intend to address:
preprocessing in image data, preprocessing in online data,
evaluation of embedded MT systems.

Position paper submission: August 10, 1998 NOTE: Now, August 24, 1998


Notifications: September 10, 1998 NOTE: Now, September 17, 1998

Final copies of papers: October 10, 1998
Workshop: October 28, 1998

Submissions may be in printed or electronic form.

Submissions should be sent to:

Clare Voss
Army Research Laboratory
AMSRL-IS-CI
2800 Powder Mill Road
Adelphi, MD 20783
phone: (301) 394-5615
fax: (301) 394-3903
e-mail: voss@arl.mil

The registration fee for the conference is $50. Non-presenters will
be accepted on a first-come, first-served basis. We strongly encourage
the participation of embedded MT system users, as well as members of
the research and development communities.

A copy of the call, the registration form, and further update
information is available via a link at:
<http://rpstl.arl.mil/isb-south/>. Look for the Conferences and
Workshop link.