LINGUIST List 9.969

Sun Jun 28 1998

Calls: Language and Gender, AMTA

Editor for this issue: Martin Jacobsen <>

Please do not use abbreviations or acronyms for your conference unless you explain them in your text. Many people outside your area of specialization will not recognize them. Also, if you are posting a second call for the same event, please keep the message short. Thank you for your cooperation.


  1. Marlis Hellinger, Language and Gender
  2. Flo Reeder, AMTA Workshop on Embedded MT Systems - CFP

Message 1: Language and Gender

Date: Thu, 25 Jun 1998 12:56:20 +0200
From: Marlis Hellinger <>
Subject: Language and Gender

Project: The de/construction of gender roles through language
variation and change: International perspectives

Marlis Hellinger and Hadumod Bussmann (eds.)

Dear linguists,

Actually, we had already closed the list of languages/authors (with
some 30 languages now in the project), but three languages have
recently "dropped out", and we are looking for someone who could
contribute on Chinese, Romanian, or Yiddish. Hungarian could perhaps
still be added as well.

The original call for contributions ran as follows: Since the
establishment of feminist linguistics more than two decades ago, a
wealth of theoretical and empirical information has become available,
and we believe it is time for a collection that looks across
individual language boundaries. We are therefore compiling a volume on
the structural and functional aspects of gender-related variation and
change in different languages. We are primarily concerned with the
structural properties of a language (categories of gender,
word-formation, pronominalization) and with speakers' linguistic
choices when talking about women and men or speaking as women and men.
We are also interested in learning about tendencies of variation and
change (including, where applicable, language politics) as these
reflect changes in the relationship between the sexes.

Would anyone be interested in participating? Or could someone suggest
potential authors to us? Details on the project would then be made
available.
Reply to:

Message 2: AMTA Workshop on Embedded MT Systems - CFP

Date: Thu, 25 Jun 1998 17:08:49 -0400
From: Flo Reeder <>
Subject: AMTA Workshop on Embedded MT Systems - CFP



Design, Construction, and Evaluation of Systems with an MT Component

Wednesday, October 28, 1998 (preceding AMTA 98, the conference of the
Association for Machine Translation in the Americas)
Sheraton Bucks County Hotel, Langhorne, Pennsylvania


As the strengths and weaknesses of machine translation (MT) engines
have become better understood and accepted, there has been a marked
increase in the development of computer systems with an embedded MT
component. One consequence of this shift to "embedded MT" is that
researchers, developers, and users have begun pushing the limits of
the input that such systems will accept for translation. In doing so,
they have surfaced a new class of problems: any input, whether it
appears in physical form on paper, in electronic form online, or
mixed in with another modality such as graphics or video, will bring
with it some unknown mix of noisy natural language data and
non-linguistic data. How should systems with an MT component be
designed and evaluated, given the challenges this input brings?

The objective of this workshop is to examine and evaluate techniques
for bridging this "linguistic impedance mismatch" between real-world
input and the natural language input expected by various MT engines.
The workshop will therefore focus on computational approaches to
preprocessing system input for MT engines and on statistical methods
for evaluating systems with an embedded MT component.

Linguistic Preprocessing in Image Data

For researchers working with image data, an effort is currently under
way to augment OCR (optical character recognition) engines with
linguistic data as they recognize and convert bitmap data into
characters, similar to what has already been done in speech
recognition by incorporating linguistic data in HMMs (hidden Markov
models). Other OCR researchers have also experimented with image-level
early topic detection using word-shape recognition. In principle, this
could provide a first-step filtering of documents into a more
homogeneous MT input set, a desirable goal for MT evaluation. We
therefore expect that individuals working with OCR, or intending to
incorporate it into their computer systems, will be interested in this
new area.
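To make word-shape topic detection a little more concrete, here is a minimal Python sketch. It assumes a coarse three-class shape code (ascender, descender, everything else) and, since no image is available here, derives the codes from text rather than from bitmaps as a real OCR front end would. The class inventory and the names `shape_code` and `topic_score` are illustrative assumptions, not the method of any particular research group.

```python
# Minimal sketch of word-shape coding for early topic detection.
# ASSUMPTION: a real system would measure these shape classes on the bitmap;
# here they are derived from text purely for illustration.

ASCENDERS = set("bdfhklt") | set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
DESCENDERS = set("gjpqy")


def shape_code(word: str) -> str:
    """Map each character to a coarse shape class: A (ascender), D (descender), x (other)."""
    codes = []
    for ch in word:
        if ch in ASCENDERS:
            codes.append("A")
        elif ch in DESCENDERS:
            codes.append("D")
        else:
            codes.append("x")
    return "".join(codes)


def topic_score(page_words: list[str], topic_keywords: list[str]) -> float:
    """Fraction of page tokens whose shape code matches that of some topic keyword."""
    if not page_words:
        return 0.0
    keyword_shapes = {shape_code(k) for k in topic_keywords}
    hits = sum(1 for w in page_words if shape_code(w) in keyword_shapes)
    return hits / len(page_words)


page = "the engine will translate the text".split()
print(topic_score(page, ["engine", "translate"]))  # 2 of 6 tokens match
```

Pages scoring above some chosen threshold could then be routed to the MT engine as a more homogeneous input set. Because distinct words can share a shape code, a score like this is only a coarse, first-step filter.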

Linguistic Preprocessing in Online Data 

For those working with online input, even though the characters are
already present, there often remains the task of preprocessing
meaningful symbolic character strings that are not part of the text
to be translated. For some systems, the rules for identifying and
encapsulating or removing such strings may need to be hand-crafted
over time as MT engine limitations surface. For others, a combination
of hand-crafted rules and statistically trained natural language (NL)
models has worked. Many have observed that HTML annotations,
alphanumeric items, and spreadsheet and word-processing codes are
harder to weed out than originally expected.
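A hand-crafted rule set of the kind described might look like the following Python sketch. It removes HTML tags and encapsulates alphanumeric items (here assumed to be tokens mixing letters and digits, such as part numbers) as placeholders, so that they pass through the MT engine untouched and can be restored afterwards. The regular expressions, the `__ITEMn__` placeholder format, and the function names are hypothetical; a real rule set would grow as engine limitations surface.

```python
import re

# Hypothetical hand-crafted rules; a real rule set would be extended over time.
HTML_TAG = re.compile(r"<[^>]+>")
# Alphanumeric items: tokens (possibly hyphenated) containing both a letter and a digit.
ALNUM_ITEM = re.compile(r"\b(?=[A-Za-z0-9-]*\d)(?=[A-Za-z0-9-]*[A-Za-z])[A-Za-z0-9-]+\b")


def preprocess(text: str) -> tuple[str, dict[str, str]]:
    """Strip markup and replace alphanumeric items with placeholders.

    Returns the cleaned text and a placeholder -> original-item mapping.
    """
    text = HTML_TAG.sub(" ", text)  # remove markup
    protected: dict[str, str] = {}

    def encapsulate(match: re.Match) -> str:
        key = f"__ITEM{len(protected)}__"
        protected[key] = match.group(0)
        return key

    text = ALNUM_ITEM.sub(encapsulate, text)  # encapsulate items
    text = re.sub(r"\s+", " ", text).strip()  # tidy whitespace left by removed tags
    return text, protected


def restore(translated: str, protected: dict[str, str]) -> str:
    """Put the encapsulated items back into the MT output."""
    for key, value in protected.items():
        translated = translated.replace(key, value)
    return translated


cleaned, items = preprocess("<p>Order part <b>XJ-220</b> today.</p>")
print(cleaned)  # Order part __ITEM0__ today.
```

The MT engine's output for `cleaned` would then be passed through `restore` with the same mapping. This only works if the engine leaves placeholder tokens intact, which is itself an assumption worth testing per engine.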

Research efforts with low-density and less commonly taught languages,
as well as with more common ones, encounter a substantial problem with
variation in spelling conventions and transcription preferences. This
is frequently the case, for example, for natural languages that are
primarily spoken rather than written. Researchers working on this
class of problem have built variants of spell checkers (SC):
components that standardize words to one orthography (spelling
convention) before submitting them to an MT engine. One idea that has
arisen for this component is to build in an option to adjust the level
of SC correction, as would be relevant when input after OCR
nonetheless varies from very noisy to relatively clean.
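An adjustable correction level could be sketched in Python with the standard library's `difflib.get_close_matches`, whose `cutoff` argument sets how similar a token must be to a lexicon entry before it is replaced. The mapping `cutoff = 1 - level`, the function name `standardize`, and the toy English lexicon (standing in for a standard orthography of the target language) are illustrative assumptions; actual SC components may use quite different similarity measures.

```python
import difflib


def standardize(tokens: list[str], lexicon: list[str], level: float) -> list[str]:
    """Map each token to its closest form in a standard-orthography lexicon.

    `level` in [0, 1] is the correction strength: higher values lower the
    similarity cutoff, so more distant (noisier) tokens get corrected.
    Tokens with no sufficiently close lexicon entry are left unchanged.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be between 0 and 1")
    cutoff = 1.0 - level  # assumed mapping from correction level to similarity cutoff
    result = []
    for token in tokens:
        best = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
        result.append(best[0] if best else token)
    return result


# Toy lexicon standing in for one agreed spelling convention (hypothetical).
LEXICON = ["colour", "centre"]
noisy = ["color", "cntre", "xyz"]
print(standardize(noisy, LEXICON, level=0.05))  # ['color', 'cntre', 'xyz']
print(standardize(noisy, LEXICON, level=0.2))   # ['colour', 'centre', 'xyz']
```

A low level suits relatively clean input, where aggressive correction would mostly introduce errors; a higher level suits very noisy post-OCR input, at the cost of occasionally "correcting" legitimate words toward the wrong lexicon entry.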

Evaluation of Embedded MT Systems 

Among those working on statistical methods for evaluating systems with
an embedded MT component, we have seen two distinct trends. One group
of statisticians has begun looking for appropriate models outside the
world of MT evaluation, examining efforts by others to take separate
metrics for individual components and combine them into an overall
system-level metric using fuzzy mathematics. Another group of
researchers is instead developing a one-dimensional scale for ranking
MT engines along a continuum defined by system-level function. That
approach might, for example, rank one engine as good enough for
filtering documents, while ranking higher another engine, deemed more
linguistically robust, because it could generate a good-enough initial
translation for subsequent post-editing. We also welcome other
functional evaluations of MT components and of computer systems with
embedded MT components.
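As a rough illustration of the two trends, the Python sketch below treats each component score as a degree of adequacy in [0, 1] and aggregates the scores with two standard t-norms (fuzzy conjunctions), the minimum and the product. A separate helper places a single system-level score on a one-dimensional functional scale. The choice of t-norms, the tier thresholds, and all names are assumptions; this call does not specify which fuzzy models or scales the groups mentioned actually use.

```python
from math import prod


def _validated(scores: list[float]) -> list[float]:
    """Ensure scores form a non-empty list of degrees in [0, 1]."""
    if not scores or any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must be a non-empty list of values in [0, 1]")
    return scores


def combine_min(scores: list[float]) -> float:
    """Minimum t-norm: the system is only as adequate as its weakest component."""
    return min(_validated(scores))


def combine_product(scores: list[float]) -> float:
    """Product t-norm: shortfalls in several components compound."""
    return prod(_validated(scores))


# Hypothetical functional scale, ordered from least to most demanding use.
FUNCTIONAL_TIERS = [(0.3, "document filtering"), (0.7, "post-editing")]


def functional_rank(system_score: float) -> str:
    """Return the most demanding use whose (hypothetical) threshold the score reaches."""
    rank = "not yet usable"
    for threshold, use in FUNCTIONAL_TIERS:
        if system_score >= threshold:
            rank = use
    return rank


# Hypothetical component scores: OCR, preprocessing, MT engine.
components = [0.9, 0.8, 0.6]
print(combine_min(components))                # 0.6
print(round(combine_product(components), 3))  # 0.432
print(functional_rank(0.45))                  # document filtering
print(functional_rank(0.75))                  # post-editing
```

The minimum is the more forgiving of the two when several components are merely adequate, while the product penalizes a pipeline in which every stage loses a little; which behavior better predicts end-to-end usefulness is exactly the kind of question an evaluation study would need to settle.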


Authors are invited to submit a short paper of no more than 5 pages
addressing one or more of the three areas discussed above. Papers
should define the problem in an embedded MT system that is the focus
of the work, describe the embedded MT system design (a simple sketch)
with sample input data where relevant, and present the authors'
approach to the problem. Work at various stages of completion is
acceptable, but the current status of the work should be made clear.
Submission of end-to-end output from an embedded MT system is
especially encouraged. The papers will be collected and distributed
to workshop participants.

Ideally, the result of the workshop will be a clearer delineation of:
(1) the range of linguistic preprocessing problems,
(2) the range of designs in embedded MT systems,
(3) how these problems are treated in different embedded MT systems, and
(4) the metrics that are being used to evaluate these systems and their


Notice of interest in participation: July 10, 1998 
Please identify which of the three areas you intend to address:
preprocessing in image data, preprocessing in online data, 
evaluation of embedded MT systems.

Position paper submission: August 10, 1998
Notifications: September 10, 1998
Final copies of papers: October 10, 1998
Workshop: October 28, 1998

Submissions may be in printed or electronic form. 

Submissions should be sent to:

Clare Voss
Army Research Laboratory
2800 Powder Mill Road
Adelphi, MD 20783 
phone: (301) 394-5615
fax: (301) 394-3903

The registration fee for the conference is $50. Non-presenters will
be accepted on a first-come, first-served basis. We strongly encourage
the participation of embedded MT system users, as well as members of
the research and development communities.

After July 11, 1998, a copy of the call, the registration form, and
further updates will be available via a link at:
Florence Reeder		| Phone: (703) 883-7156
The MITRE Corporation	|	 (703) 883-6750 (secretary)
MS W640			| Fax: (703) 883-1279
1820 Dolley Madison Blvd.	| email:
McLean, VA 22102		|