Editor for this issue: Martin Jacobsen <marty@linguistlist.org>
Project: The de/construction of gender roles through language variation and change: International perspectives
Marlis Hellinger and Hadumod Bussmann (eds.)

Dear linguists,

We had already closed the list of languages and authors (some 30 languages are now in the project), but three languages recently "dropped out", and we are looking for someone who could contribute on Chinese, Romanian, or Yiddish. Hungarian could possibly still be added.

The original call for contributions ran as follows:

Since the establishment of feminist linguistics more than two decades ago, a wealth of theoretical and empirical information has become available, and we believe it is time for a collection that looks across individual language boundaries. We are therefore compiling a volume on the structural and functional aspects of gender-related variation and change in different languages. We are primarily concerned with the structural properties of a language (categories of gender, word-formation, pronominalization) and with speakers' linguistic choices in talking about or as women and men. We are also interested in learning about tendencies of variation and change (including, where applicable, language politics) as these reflect changes in the relationship between the sexes.

Would anyone be interested in participating? Or could someone suggest potential authors to us? Details on the project would then be made available.

Reply to: Hellinger@em.uni-frankfurt.de
WORKSHOP ANNOUNCEMENT
---------------------

WORKSHOP ON EMBEDDED MT SYSTEMS
CALL FOR PAPERS

Design, Construction, and Evaluation of Systems with an MT Component

Wednesday, October 28, 1998 (preceding the AMTA 98 conference)
Sheraton Bucks County Hotel, Langhorne, Pennsylvania

Introduction

As the strengths and weaknesses of machine translation (MT) engines have become better understood and accepted, there has been a marked increase in the development of computer systems with an embedded MT component. One consequence of this shift to "embedded MT" is that researchers, developers, and users have begun pushing the limits on the input that such systems will accept for translation. In so doing, a new class of problems has surfaced: any input, whether it appears in physical form on paper, in electronic form on-line, or mixed in with another modality such as graphics or video, will bring with it some unknown mix of noisy natural language data as well as non-linguistic data. How are systems with an MT component to be designed and evaluated given the challenge this input brings?

The objective of this workshop is to examine and evaluate techniques for adjusting this "linguistic impedance mismatch" between the real-world input and the natural language input expected by various MT engines. Thus the workshop will focus on computational approaches to preprocessing system input for MT engines and on statistical methods for evaluating systems with an embedded MT component.

Linguistic Preprocessing in Image Data

For researchers working with image data, an effort is currently underway to augment OCR (optical character recognition) engines with linguistic data as they recognize and convert bitmap data into characters, similar to what has already been done in speech recognition with linguistic data in HMMs (hidden Markov models). Other OCR researchers have also experimented with image-level early topic detection using word-shape recognition.
In principle, this could provide a first-step filtering of documents into a more homogeneous MT input set, a desirable goal for MT evaluation. Thus we expect that individuals working with OCR, or intending to incorporate it into their computer systems, will be interested in this new area.

Linguistic Preprocessing in Online Data

For those working with online input, even though the characters are already present, there often still remains the task of preprocessing meaningful, symbolic character strings that are not part of the text to be translated. For some systems, the rules for identifying and encapsulating or removing such strings may need to be hand-crafted over time as MT engine limitations surface. For others, a combination of hand-crafted rules and statistically trained NL models has worked. Many have observed that HTML annotations, alphanumeric items, and spreadsheet and word processing codes are harder to weed out than originally expected.

Research efforts with low-density and less commonly taught languages, as well as more common ones, encounter a substantial problem with variation in spelling conventions and transcription preferences. This is frequently the case, for example, for natural languages that are primarily spoken rather than written. Researchers working on this class of problem have built variants on spell checkers (SC), components that standardize words to one orthography (spelling convention) before submitting them to an MT engine. An idea that has arisen for this component is to build in an option to adjust the level of SC correction, as would be relevant when input after OCR varies from very noisy to relatively clean.

Evaluation of Embedded MT Systems

Among those working on statistical methods for evaluating systems with an embedded MT component, we have seen two distinct trends.
One group of statisticians has begun looking for appropriate models from outside the world of MT evaluation, examining efforts by others to take distinct metrics for components and combine them into an overall system-level metric using fuzzy mathematics. Another group of researchers is looking instead at developing a one-dimensional scale for ranking MT engines along a continuum defined by system-level function. That approach, for example, might rank one engine as good enough for filtering documents, while another engine deemed more linguistically robust would be ranked higher because it could generate a good enough initial translation for subsequent post-editing. We welcome other functional evaluations of MT components and of computer systems with embedded MT components as well.

SUBMISSIONS

Submitters are invited to send in a short paper, not more than 5 pages, addressing one or more of the three areas discussed above. Papers should define the problem in an embedded MT system that is the focus of the work, describe the embedded MT system design (a simple sketch) with sample input data where relevant, and present their approach to the problem. Work at various stages of completion is acceptable; we expect the current status of the work to be made clear. Submission of end-to-end output of an embedded MT system is especially encouraged. The papers will be collected and distributed to participants of the workshop.

Ideally, the result of the workshop will be a clearer delineation of:

(1) the range of linguistic preprocessing problems,
(2) the range of designs in embedded MT systems,
(3) how these problems are treated in different embedded MT systems, and
(4) the metrics that are being used to evaluate these systems and their components.
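
As a rough illustration of the online-data preprocessing described above, the sketch below encapsulates HTML-like annotations behind placeholders and standardizes spellings against a reference lexicon, with an adjustable correction level. It is not taken from any system mentioned in this call: the lexicon, the `__TAGn__` placeholder format, the function names, and the similarity threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lexicon of standardized spellings (illustrative only).
LEXICON = ["colour", "translate", "machine"]

TAG = re.compile(r"<[^>]+>")
PLACEHOLDER = re.compile(r"(__TAG\d+__)")


def mask_markup(text):
    """Swap each HTML-like tag for a numbered placeholder; return (masked, tags)."""
    tags = []

    def repl(match):
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"

    return TAG.sub(repl, text), tags


def restore_markup(text, tags):
    """Put the original tags back in place of their placeholders."""
    for i, tag in enumerate(tags):
        text = text.replace(f"__TAG{i}__", tag)
    return text


def normalize_word(word, threshold):
    """Map a word to its closest lexicon entry if the similarity ratio
    reaches the threshold; otherwise leave the word unchanged.
    A lower threshold means more aggressive correction."""
    best = max(LEXICON, key=lambda w: SequenceMatcher(None, word, w).ratio())
    if SequenceMatcher(None, word, best).ratio() >= threshold:
        return best
    return word


def preprocess(text, threshold):
    """Mask markup, then normalize spellings in the remaining text only."""
    masked, tags = mask_markup(text)
    parts = PLACEHOLDER.split(masked)
    # Even-indexed parts are ordinary text; odd-indexed parts are placeholders.
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(
            r"[A-Za-z]+",
            lambda m: normalize_word(m.group(0), threshold),
            parts[i],
        )
    return "".join(parts), tags


if __name__ == "__main__":
    cleaned, tags = preprocess("<b>color</b> machin", threshold=0.8)
    print(cleaned)                        # text that would go to the MT engine
    print(restore_markup(cleaned, tags))  # markup restored afterwards
```

Under these assumptions, a lower threshold would suit very noisy post-OCR input, while a higher one leaves relatively clean input largely untouched.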
DATES

Notice of interest in participation: July 10, 1998 (to voss@arl.mil)
    Please identify which of the three areas you intend to address: preprocessing in image data, preprocessing in online data, evaluation of embedded MT systems.
Position paper submission: August 10, 1998
Notifications: September 10, 1998
Final copies of papers: October 10, 1998
Workshop: October 28, 1998

Submissions may be in printed or electronic form. Submissions should be sent to:

Clare Voss
Army Research Laboratory
AMSRL-IS-CI
2800 Powder Mill Road
Adelphi, MD 20783
phone: (301) 394-5615
fax: (301) 394-3903
e-mail: voss@arl.mil

The registration fee for the conference is $50. Non-presenters will be accepted on a first-come, first-served basis. We strongly encourage the participation of embedded MT system users, as well as members of the research and development communities.

After July 11, 1998, a copy of the call, the registration form, and further update information will be available via a link at: <http://rpstl.arl.mil/ISB/>

Florence Reeder            | Phone: (703) 883-7156
The MITRE Corporation      |        (703) 883-6750 (secretary)
MS W640                    | Fax:   (703) 883-1279
1820 Dolley Madison Blvd.  | email: reeder@azrael.mitre.org
McLean, VA 22102           |