LINGUIST List 15.72

Wed Jan 14 2004

Calls: General Ling; Computational Ling/Portugal

Editor for this issue: Andrea Berez <andrea@linguistlist.org>

As a matter of policy, LINGUIST discourages the use of abbreviations or acronyms in conference announcements unless they are explained in the text.

To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.html.

Directory

Allyson Jule, Gender, Language and Religion

ddg, Workshop on Methodologies and Evaluation of Multiword Units in Real-world Applications

Message 1: Gender, Language and Religion

Date: Mon, 05 Jan 2004 11:20:49 -0800
From: Allyson Jule <ajule@glam.ac.uk>
Subject: Gender, Language and Religion

Topic: Gender, Language, and Religion

If you are interested in submitting an article for a new edited collection for Palgrave, please contact me at

ajule@glam.ac.uk

Sincerely,
Dr. Allyson Jule
Senior Lecturer
University of Glamorgan
Wales

Message 2: Workshop on Methodologies and Evaluation of Multiword Units in Real-world Applications

Date: Tue, 6 Jan 2004 05:43:22 -0500 (EST)
From: ddg <ddg@di.ubi.pt>
Subject: Workshop on Methodologies and Evaluation of Multiword Units in Real-world Applications

Workshop on Methodologies and Evaluation of Multiword Units in Real-world Applications
Short Title: MEMURA 2004

Date: 25-May-2004 - 25-May-2004
Location: Lisbon, Portugal
Contact: Gaël Dias
Contact Email: ddg@di.ubi.pt
Meeting URL: http://memura2004.di.ubi.pt

Linguistic Sub-field: Computational Linguistics
Call Deadline: 23-Feb-2004

Meeting Description:

Multiword units (MWUs) include a large range of linguistic phenomena, such as phrasal verbs (e.g. ''look forward''), nominal compounds (e.g. ''interior designer''), named entities (e.g. ''United Nations''), set phrases (e.g. ''con carne'') or compound adverbs (e.g. ''by the way''), and they can be syntactically and/or semantically idiosyncratic in nature. MWUs are used frequently in everyday language, usually to express ideas and concepts precisely when they cannot be compressed into a single word. A considerable amount of research has been devoted to this subject, both in terms of theory and practice, but despite increasing interest in idiomaticity within linguistic research, many questions remain unanswered. The objective of this workshop is to deal with three important questions that are of great interest for real-world applications:

(1) Comparison of MWU extraction methodologies
(2) Evaluation of the benefits of the integration of MWUs in real-world applications
(3) Comparison of scalable architectures for the extraction and identification of MWUs

MEMURA-2004 Workshop on Methodologies and Evaluation of Multiword Units in Real-world Applications (MEMURA Workshop)

INVITED SPEAKER: KENNETH W. CHURCH

In association with the 4th International Conference On Language Resources and Evaluation - LREC 2004

Centro Cultural de Belém, Lisbon, Portugal
May 25, 2004

http://memura2004.di.ubi.pt

********************* CALL FOR PAPERS *********************

This announcement contains:
[1] Workshop Description
[2] Target Audience
[3] Areas of Interest
[4] Invited Speaker
[5] Important dates
[6] Abstract Submission
[7] Workshop Chairs
[8] Program Committee
[9] Contact

[1] Workshop Description:

Multiword units (MWUs) include a large range of linguistic phenomena, such as phrasal verbs (e.g. ''look forward''), nominal compounds (e.g. ''interior designer''), named entities (e.g. ''United Nations''), set phrases (e.g. ''con carne'') or compound adverbs (e.g. ''by the way''), and they can be syntactically and/or semantically idiosyncratic in nature. MWUs are used frequently in everyday language, usually to express ideas and concepts precisely when they cannot be compressed into a single word. A considerable amount of research has been devoted to this subject, both in terms of theory and practice, but despite increasing interest in idiomaticity within linguistic research, many questions remain unanswered. The objective of this workshop is to deal with three important questions that are of great interest for real-world applications.

1) Comparison of MWU extraction methodologies

Many methodologies have been proposed to automatically extract or identify MWUs. However, little effort has been devoted to comparing their results. The core differences between the methodologies are certainly the main reason why such work is so rare. For instance, it is not easy to compare language-dependent methodologies, as the results depend on the efficiency of parameter tuning in the broadest sense of the term (i.e. semantic tagging, local specific grammars, lemmatization, part-of-speech tagging etc.). Another important problem is that researchers have not really agreed on a definition of MWUs, which would provide the basis for an objective evaluation. The objective of the workshop is to gather people who have recently been working in this area so that new trends in comparing MWU extraction methodologies and their evaluation can be identified.

2) Evaluation of the benefits of the integration of MWUs in real-world applications

It is not yet clear whether MWUs really improve NLP applications. It is commonly accepted that Machine Translation is one application that takes great advantage of MWU databanks. However, does the same apply to applications in Automatic Summarization, Information Retrieval (IR), Cross-language IR, Information Extraction, Text Clustering/Classification, Parallel Corpus Alignment? Indeed, could the identification of MWUs introduce new constraints that are not present in original texts? Should MWUs be analysable in terms of the meanings of their components, or should they be treated as unanalysable units? Should NLP methods work both on isolated words and on aggregated MWUs? The answers are anything but clear. Here, the objective of the workshop is to highlight successes and failures of the integration of MWUs in real-world applications.

3) Comparison of scalable architectures for the extraction and identification of MWUs

Real-world applications are constrained by variables like processing time and memory space. However, identifying and extracting MWUs is usually a computationally heavy process. In recent years, new algorithms and new technologies have been proposed to introduce MWU treatment in large-scale applications, thus avoiding previous intractable implementations. Previous workshops on MWUs have mainly focused on the unconstrained extraction process. In this workshop, we would like to focus on the comparison of different factors that can influence the scalability of the treatment of MWUs in real-world applications, namely data structures, algorithms, parallel and distributed computing, grid computing etc. Indeed, some extraction strategies may not scale to deal with huge volumes of data.

[2] Target Audience:

This workshop is intended to bring together NLP researchers working on all areas of MWUs. The objective is to summarise what has been achieved in the area of MWUs in real-world applications, to establish common themes between different approaches, and to discuss future trends.

[3] Areas of Interest:

Abstracts are invited on, but not limited to, the following topics:

* Automatic, semi-automatic and manual evaluations of MWU extractors
* Resources for evaluating MWU extractors
* Evaluation standards
* Cross-language and cross-domain evaluations of MWU extractors
* Comparative evaluation of MWU extractors
* Evaluation of the integration of MWUs in NLP applications: Summarization, (Cross-language) Information Retrieval, Information Extraction, Machine Translation, Text Classification etc.
* Scalable algorithms, new data structures, parallel and distributed processing and grid computing for MWU extraction and/or identification
* Comparative evaluation of extraction software architectures
* Role of isolated words and MWUs for a sense-based definition of MWUs

Abstracts can cover one or more of these areas.

[4] Invited Speaker:

Kenneth W. Church (AT&T Labs Research, USA)

[5] Important dates:

Abstract submission deadline: February 23, 2004
Notification: March 15, 2004
Camera ready papers: April 12, 2004
Workshop: May 25, 2004

[6] Abstract Submission:

Abstracts should be about 1000 words long and should be submitted electronically, in PDF format only, to Gaël Harry Dias [ddg@di.ubi.pt]. PostScript files can be converted to PDF at http://www.ps2pdf.com/. The subject line should be ''LREC 2004 MEMURA WORKSHOP PAPER SUBMISSION''.

Because reviewing is blind, no author information should be included as part of the abstract (i.e. neither the names of the authors nor references that could identify them). An identification page must be sent in a separate email with the subject line ''LREC 2004 MEMURA WORKSHOP ID PAGE'' and must include the title, author(s), keywords, word count, and the name and email of the contact author.

Late submissions will not be accepted. Notification of receipt will be emailed to the contact author shortly after receipt.

[7] Workshop Chairs:

Gaël Harry Dias (Beira Interior University, Portugal)
José Gabriel Pereira Lopes (New University of Lisbon, Portugal)
Spela Vintar (University of Ljubljana, Slovenia)

[8] Program Committee:

Timothy Baldwin (Stanford University, United States of America)
Sophia Ananiadou (University of Salford, England)
Didier Bourigault (University of Toulouse, France)
Pascale Fung (University of Science and Technology, Hong Kong)
Mikio Yamamoto (University of Tsukuba, Japan)
Dekang Lin (University of Alberta, Canada)
Aline Villavicencio (University of Cambridge, England)
Heiki Kaalep (University of Tartu, Estonia)
Joaquim da Silva (New University of Lisbon)
Eric Gaussier (Xerox Research Centre Europe, France)
Adeline Nazarenko (University Paris XIII, France)
António Branco (Lisbon University, Portugal)

[9] Contact:

Gaël Harry Dias
Human Language Technology Interest Group
Departamento de Informática
Universidade da Beira Interior
Rua Marquês d'Ávila e Bolama
6201-001 Covilhã
Portugal
email: ddg@di.ubi.pt
Tel: +351 275 319 700
Fax: +351 275 319 732