LINGUIST List 16.656

Sat Mar 05 2005

Calls: Computational Ling/USA; General Ling/Spain

Editor for this issue: Amy Wronkowicz <>

As a matter of policy, LINGUIST discourages the use of abbreviations or acronyms in conference announcements unless they are explained in the text. To post to LINGUIST, use our convenient web form at


        1.    Christof Monz, ACL 2005 Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond
        2.    Karen Lahousse, Sentence-Initial and Sentence-Final Positions: On the Interplay between Syntax, Semantics and Information Structure

Message 1: ACL 2005 Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond

Date: 03-Mar-2005
From: Christof Monz <>
Subject: ACL 2005 Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond

Full Title: ACL 2005 Workshop on Building and Using Parallel Texts: Data-Driven
Machine Translation and Beyond

Date: 29-Jun-2005 - 30-Jun-2005
Location: Ann Arbor, Michigan, United States of America
Contact Person: Christof Monz
Web Site:

Linguistic Field(s): Computational Linguistics

Call Deadline: 10-Apr-2005

Meeting Description:

Call for papers


Workshop at the Annual Meeting of
the Association for Computational Linguistics (ACL 2005)

Ann Arbor, Michigan
June 29-30, 2005

Submission deadline: April 10, 2005 (short papers: April 17, 2005)

The goal of this workshop is to provide a forum for researchers working on
problems related to the creation and use of parallel text. Recent events have
demonstrated once again the importance of inter-language communication across a
broad range of languages. This reinforces the need for advances in machine
translation (MT) and multi-lingual processing tools, especially for languages
with scarce resources.

This is a two-day workshop featuring two tracks:

1. Building and Using Parallel Texts for Languages
with Scarce Resources (day 1)

2. Exploiting Parallel Texts for Statistical Machine
Translation (day 2)

Each track features a shared task that allows participants to compare their
results on a common task. Although participation is not required, we encourage
authors to take part in the shared tasks for benchmarking purposes.



The aim of the first track (Building and Using Parallel Texts for Languages with
Scarce Resources) is to bring together researchers who study the creation and
use of parallel corpora for minority languages. The track will therefore center
on the manual and automatic collection of parallel corpora, studies of the
"import" of knowledge from a well-studied language via parallel alignments, and
evaluations of the quality of collected corpora or of the tools derived from them.

We invite submissions of papers addressing any of the following issues:

* Construction of parallel corpora, including the automatic identification
and harvesting of parallel corpora from the Web
* Tools for processing parallel corpora, including automatic sentence
alignment, word alignment, phrase alignment, detection of omissions and gaps in
translations, and others
* Methods to evaluate the quality of parallel corpora and word alignments
* Using parallel corpora for the derivation of language processing tools in
new languages
* Using parallel corpora for automatic corpus annotation (e.g. word sense
* Using parallel corpora for cross-language information retrieval and extraction
* The quality of language resources and systems that can be constructed with
small amounts of parallel text, and how these scale up with the amount of text
* The role of external knowledge sources (e.g. bilingual dictionaries) in
building resources and systems relying on parallel texts
* Machine learning techniques for building and exploiting parallel texts
(e.g. using small amounts of human-aligned parallel text to bootstrap large
aligned corpora; active selection of data based on usefulness for different tasks)

While we invite submissions addressing any of the above topics, or related
issues, we particularly welcome work involving parallel corpora addressing
languages with scarce resources.

Shared task

In addition to regular paper presentations, the track will also include a
shared task for the evaluation of various word alignment techniques. Word
alignment represents an important step in exploiting parallel corpora, and yet
there is no common evaluation framework for such systems. This shared task
follows on the success of the word alignment task that took place as part of the
NAACL 2003 workshop on parallel text. This year's edition is distinct in that it
focuses on Inuktitut-English and Romanian-English alignment. This fits into the
theme of our track, since neither Inuktitut nor Romanian is a widely studied
language, and there are relatively few online resources and tools available.

Teams that participate in the alignment exercise will be provided with training
data for each language pair and with development data taken from the gold
standard data in order to build their systems. Thereafter they will be given the
unaligned gold standard data and asked to submit their proposed alignments in a
short time frame. There will be two tracks for each language pair, one for teams
that augment the training data with additional resources, and another for those
that only use the training data. The resulting alignments will be evaluated
relative to the previously mentioned gold standard data prior to the workshop.
Short papers describing systems participating in this shared task and all
evaluation methodologies employed will constitute a separate section in the
workshop proceedings.
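
As an illustration only: this call does not state which measures will be used to
compare proposed alignments with the gold standard. Precision, recall, and
alignment error rate (AER) are commonly reported for word alignment, so the
sketch below assumes them. It also assumes that gold links are split into
"sure" and "possible" sets. The function name and the data layout are
hypothetical and are not part of the shared task materials.

```python
from typing import Set, Tuple

# A link pairs a source word index with a target word index (assumed layout).
Link = Tuple[int, int]


def alignment_scores(
    proposed: Set[Link], sure: Set[Link], possible: Set[Link]
) -> Tuple[float, float, float]:
    """Return (precision, recall, AER) for one proposed alignment.

    `sure` holds unambiguous gold links; `possible` holds gold links that are
    acceptable but optional. Sure links are always counted as possible too.
    """
    if not proposed or not sure:
        raise ValueError("proposed and sure link sets must be non-empty")

    possible = possible | sure
    hits_possible = len(proposed & possible)
    hits_sure = len(proposed & sure)

    precision = hits_possible / len(proposed)
    recall = hits_sure / len(sure)
    aer = 1.0 - (hits_sure + hits_possible) / (len(proposed) + len(sure))
    return precision, recall, aer


if __name__ == "__main__":
    # Toy data, invented for this example.
    gold_sure = {(0, 0), (1, 1)}
    gold_possible = {(2, 1)}
    system_links = {(0, 0), (2, 1), (2, 2)}
    p, r, aer = alignment_scores(system_links, gold_sure, gold_possible)
    print(f"precision={p:.3f} recall={r:.3f} AER={aer:.3f}")
```

Under these assumptions a lower AER indicates closer agreement with the gold
standard, and a system is not penalized in precision for proposing links that
annotators marked as merely possible.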

A more detailed description, training, development, and test data, and a number
of other related resources will be made available from


The focus of the second track (Exploiting Parallel Texts for Statistical Machine
Translation) is the use of parallel corpora for machine translation.

Translating documents from foreign languages into English (or between any two
languages) by computer is one of the oldest goals in computational linguistics.
Now, armed with vast amounts of digitally available translated text and
powerful computers, we are witnessing significant progress toward achieving
that goal. Statistical methods allow the analysis of parallel text corpora and
the automatic construction of machine translation systems. Already, for some
language pairs such as Chinese-English or Arabic-English, statistical machine
translation (SMT) systems built at research labs outperform commercial systems.

Recent experimentation has shown that the performance of SMT systems varies
greatly with the source language. In this workshop we would like to encourage
researchers to investigate ways to improve the performance of SMT systems for
diverse languages, including morphologically complex languages (e.g., Finnish)
and languages with partially free word order (e.g., German). These issues lie on
the border of linguistic analysis and statistical modeling, and the ACL
conference is the most appropriate forum to investigate them, as ACL has a long
tradition of hosting high-quality research in both areas.

Topics of interest include, but are not limited to:

* word-based, chunk-based, phrase-based, syntax-based SMT
* using comparable corpora for SMT
* using morphological and POS information for SMT
* integration of rule-based MT and statistical MT
* decoding
* error analysis

In addition to submissions on the topics listed above, this track of the
workshop features a shared task and we encourage participants to evaluate their
approaches on that task. The shared task is to evaluate your approach to
machine translation (see the list of topics of interest above) on the
Europarl corpus.

A more detailed description of the shared task, the test and training corpora, a
freely available MT system, and a number of other resources are available from


Submissions will consist of regular full papers of max. 8 pages, formatted
following the ACL 2005 guidelines. Authors of regular full papers will be
required to indicate a track for their submission. In addition, teams
participating in the shared tasks will be invited to submit short papers (max. 4
pages) describing their systems. Both submission and review processes will be
handled electronically.


Regular paper submissions: April 10
(shared task) Results submissions: April 10
(shared task) Short paper submissions: April 17
Notification (short and regular papers): May 4
Camera-ready papers: May 15


Philipp Koehn (University of Edinburgh)
Joel Martin (National Research Council of Canada)
Rada Mihalcea (University of North Texas)
Christof Monz (University of Maryland)
Ted Pedersen (University of Minnesota, Duluth)


For questions, comments, etc. please send email to


Lars Ahrenberg (Linkoping University)
Bill Byrne (University of Cambridge)
Chris Callison-Burch (University of Edinburgh)
Nicoletta Calzolari (University of Pisa)
Francisco Casacuberta (University of Valencia)
David Chiang (University of Maryland)
Mona Diab (Columbia University)
George Foster (Canada National Research Council)
Alexander Fraser (ISI/University of Southern California)
Pascale Fung (Hong Kong University of Science and Technology)
Rob Gaizauskas (University of Sheffield)
Ulrich Germann (University of Toronto)
Dan Gildea (University of Rochester)
Jan Hajic (Charles University)
Andrew Hardie (University of Lancaster)
Rebecca Hwa (University of Pittsburgh)
Nancy Ide (Vassar College)
Kevin Knight (ISI/University of Southern California)
Greg Kondrak (University of Alberta)
Shankar Kumar (Johns Hopkins University)
Philippe Langlais (University of Montreal)
Alon Lavie (Carnegie Mellon University)
Lori Levin (Carnegie Mellon University)
Daniel Marcu (ISI/University of Southern California)
Tony McEnery (University of Lancaster)
Bridget McInnes (University of Minnesota)
Magnus Merkel (Linkoping University)
Bob Moore (Microsoft Research)
Maria das Gracas Volpe Nunes (University of Sao Paulo)
Franz-Josef Och (Google)
Kemal Oflazer (Sabanci University)
Miles Osborne (University of Edinburgh)
Andrei Popescu-Belis (University of Geneva)
Katharina Probst (CMU)
Amruta Purandare (University of Pittsburgh)
Florence Reeder (MITRE)
Philip Resnik (University of Maryland)
Antonio Ribeiro (European Commission Joint Research Centre)
Michel Simard (Xerox)
Kevin Scannell (St. Louis University)
Libin Shen (University of Pennsylvania)
Eiichiro Sumita (ATR Spoken Language Translation Research Lab)
Joerg Tiedemann (University of Groningen)
Christoph Tillmann (IBM)
Dan Tufis (Research Institute for AI of the Romanian Academy)
Jean Veronis (Universite de Provence)
Michelle Vanni (Army Research Lab)
Stephan Vogel (Carnegie Mellon University)
Clare Voss (Army Research Lab)
Taro Watanabe (ATR Spoken Language Translation Research Laboratories)
Dekai Wu (Hong Kong University of Science and Technology)

Message 2: Sentence-Initial and Sentence-Final Positions: On the Interplay between Syntax, Semantics and Information Structure

Date: 04-Mar-2005
From: Karen Lahousse <>
Subject: Sentence-Initial and Sentence-Final Positions: On the Interplay between Syntax, Semantics and Information Structure

Full Title: Sentence-Initial and Sentence-Final Positions: On the Interplay
between Syntax, Semantics and Information Structure

Date: 09-Sep-2005 - 09-Sep-2005
Location: Valencia, Spain
Contact Person: Karen Lahousse
Meeting Email:
Web Site:

Linguistic Field(s): General Linguistics

Call Deadline: 01-Apr-2005

Meeting Description:

This workshop aims at bringing together researchers from different
theoretical perspectives working on different languages, to discuss the way
in which syntax, semantics and information structure interact with respect
to the peripheral positions of the clause. More particularly, we invite
papers based on data from all types of languages, focusing on the
sentence-initial and sentence-final positions in the clause. These include
but are not restricted to left- and right-dislocation, clefts,
pseudo-clefts, sentence-initial and sentence-final positions of adverbs and
the postverbal position of the subject.

Sentence-Initial and Sentence-Final Positions
On the Interplay between Syntax, Semantics and Information Structure

Workshop at the 38th meeting of the Societas Linguistica Europaea


Karen Lahousse
(Fund for Scientific Research - Flanders & Katholieke Universiteit Leuven,

Andrée Borillo
(ERSS UMR 56-10, Maison de la Recherche, Université Toulouse - le Mirail)


Liliane Haegeman
(SILEX, Université de Lille III, Villeneuve d'Ascq)

Sophie Prévost

Jean-Marie Marandin
(UMR 7110, Université Paris 7)


- Abstracts are invited for 20-minute presentations plus 10 minutes for
discussion.
- The abstract should be anonymous and contain no more than 500 words,
exclusive of examples and references. When printed out, the title and body
should fit on a single page of 12-point Times New Roman, with 2 cm margins.
- First fill out the registration form at the SLE-website
- Then send the anonymous abstract TO BOTH AND
- Abstracts which, for reasons of time, cannot be included in the workshop
will automatically be considered for the general session of the conference.
- Deadline for submission = 1 April 2005.


- 1 April 2005: deadline for submission
- 30 April 2005: notification of acceptance (by e-mail)
- 30 May 2005: payment of the registration fee (see
- 7-10 September 2005: conference, 9 September: workshop


Karen Lahousse

Fund for Scientific Research - Flanders
Katholieke Universiteit Leuven, Department of Linguistics
Blijde-Inkomststraat 21
B-3000 Leuven (Belgium)
