LINGUIST List 15.131

Fri Jan 16 2004

Calls: Text/Corpus Ling/Poland; Computational Ling

Editor for this issue: Marie Klopfenstein

As a matter of policy, LINGUIST discourages the use of abbreviations or acronyms in conference announcements unless they are explained in the text. To post to LINGUIST, use our convenient web form at


  1. kprzemek, Workshop ''Assessing the potential of corpora''
  2. Anna Korhonen, Computer Speech and Language Special Issue on Multiword Expressions

Message 1: Workshop ''Assessing the potential of corpora''

Date: Tue, 6 Jan 2004 17:53:44 -0500 (EST)
From: kprzemek
Subject: Workshop ''Assessing the potential of corpora''

Workshop ''Assessing the potential of corpora'' 

Date: 20-May-2004 - 20-May-2004
Location: Tarnowo Podgorne, Poland
Contact: Przemek Kaszubski
Contact Email: 
Meeting URL: 

Linguistic Sub-field: Text/Corpus Linguistics 
Call Deadline: 15-Mar-2004 

Meeting Description:

WORKSHOP ''Assessing the potential of corpora''
at 35th Poznan Linguistic Meeting 2004
May 20, from 9.00 am

Call for Papers
The goal of the workshop is to convene a forum of users of language
corpora interested in the exchange of ideas related to the feasibility
of corpus use in linguistic study and language teaching. We welcome
papers and/or demonstrations describing corpus-inspired research and
pedagogical applications, and, ideally, attempting to evaluate the
resources, tools and procedures examined for the purpose. Some of the
likely areas include:

 * insights from large and small corpora for descriptive and
pedagogical linguistics
 * data-driven-learning and other uses of corpora in/for the classroom 
 * corpus-based/-driven contrastive analysis 
 * corpus-based/-driven genre analysis 
 * learner corpora: Error Analysis and more 
 * corpora and translation 
 * the ''web as corpus'' for language research and pedagogy 
 * corpus-based/-driven tools and methods for language task learning
 * ... and many many more :)

Topics proposed so far:
 * Automatic phonetic annotation of corpora for EFL purposes -
Prof. Wlodzimierz Sobkowiak
 * Studying metaphor with the BNC - Dr. Malgorzata Fabiszak 
 * Corpora for the teaching of translation - Maciej Machniewski (PhD
 * Corpus-based teaching of English syntax - Dr. Pawel Scheffler 
 * Web concordancing and EFL writing - Dr. Przemek Kaszubski

Presentations will last 30 minutes and be followed by a 10-minute
discussion. The conference language will be English; however, we also
welcome papers based on corpora of Polish and other languages.

Abstracts of 250-300 words should be e-mailed by March 15 to:
Przemek Kaszubski

To register and receive more information on PLM2004, go to:


Message 2: Computer Speech and Language Special Issue on Multiword Expressions

Date: Tue, 30 Dec 2003 23:10:43 +0000
From: Anna Korhonen
Subject: Computer Speech and Language Special Issue on Multiword Expressions


Journal of Computer Speech and Language

Special Issue on Multiword Expressions

Guest editors:

Aline Villavicencio (University of Cambridge, UK)
Francis Bond (NTT Communication Science Laboratories, Japan)
Anna Korhonen (University of Cambridge, UK)
Diana McCarthy (University of Sussex, UK)

Multiword expressions (MWEs) include a large range of linguistic
phenomena, such as phrasal verbs (e.g. "add up"), nominal compounds
(e.g. "telephone box"), and institutionalized phrases (e.g. "salt and
pepper"), and they can be syntactically and/or semantically
idiosyncratic in nature. MWEs are used frequently in everyday
language, usually to precisely express ideas and concepts that cannot
be compressed into a single word. A considerable amount of research
has been devoted to this subject, both in terms of theory and
practice, but despite increasing interest in idiomaticity within
linguistic research, there is still a gap between the needs of natural
language processing (NLP) and the descriptive tradition of
linguistics. Most real-world applications tend to ignore MWEs or
address them simply by listing. However, it is clear that successful
applications will need to be able to identify and treat them more

In recent years there has been a growing awareness in the NLP
community of the problems that MWEs pose and the need for their robust
handling. This special issue of Computer Speech and Language, due for
publication in 2005, will be devoted to the acquisition,
identification and treatment of MWEs. We invite papers adopting a
quantitative approach to the following aspects of MWE research:

* Extraction of MWEs:
There has been considerable research into the extraction of lists of
some multiword expressions and collocations of certain types, such as
noun-noun compounds, institutionalised expressions and verb-particle
constructions. Papers that explore the benefits and weaknesses of
methods across different MWE types and across different languages are
particularly welcome. Also, we encourage papers where the extraction
is not limited to an enumeration of MWEs of a given type, but permits
some sort of subcategorization or analysis of the syntactic or
semantic properties of the expression.

* Evaluation of extracted MWEs:
To date, researchers have tended to evaluate MWE extraction by
exploiting available man-made lexical resources or using manual
annotation of either the input data or the automatically extracted
lists. There is considerable scope for proposals of standard
evaluation metrics, test and training data and for task-based

* Identification of MWEs:
Whilst there has been considerable research on extraction, less
attention has been paid to determining if a candidate multiword token
is in fact a genuine multiword, or simply a regular compositional
occurrence of the words that can comprise a multiword, e.g. "She looked
up the road" vs "She looked up his telephone number".

* The benefits of MWE identification and treatment for applications:
Papers are encouraged which expose the problems that MWEs pose for
specific applications and solutions to these problems.
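
As an illustration only of the kind of quantitative extraction the
topics above refer to, the sketch below ranks adjacent word pairs by
pointwise mutual information (PMI), one commonly used association
measure. The call itself does not name PMI or any other method; the
function name, the min_count threshold and the toy corpus are all
assumptions made for this example, not part of the announcement.

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; not a method prescribed by
    the call for papers.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))

    # Highest PMI first: pairs that co-occur more often than chance predicts
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy corpus, invented for this example only
toy_tokens = (
    "she looked up the number in the telephone box and "
    "he looked up the word near the telephone box"
).split()

candidates = rank_bigrams_by_pmi(toy_tokens)
for pair, score in candidates:
    print(" ".join(pair), round(score, 2))
```

A ranking like this only produces candidate lists. It does not decide
whether a given occurrence of "looked up" is a phrasal verb or a
compositional use, which is the identification problem described
above.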

Submission Information:

Deadline for paper submissions: May 5, 2004

All submissions will be subject to the normal peer review process for
this journal.

Submissions in electronic form (PDF) are strongly preferred and must
conform to the Computer Speech and Language specifications, which are
available at:

Any initial queries should be addressed to