From: Mikel Forcada <mlfdlsi.ua.es>
Subject: Information Retrieval and Information Extraction for Less Resourced Languages
Full Title: Information Retrieval and Information Extraction for Less Resourced Languages
Short Title: IR-IE-LRL
Date: 07-Sep-2009 - 07-Sep-2009
Location: Donostia / San Sebastián, Spain
Contact Person: Kepa Sarasola
Meeting Email: kepa.sarasolaehu.es
Web Site: http://ixa2.si.ehu.es/saltmil/
Linguistic Field(s): Applied Linguistics; Computational Linguistics; Text/Corpus Linguistics
Call Deadline: 08-Jun-2009
Information Retrieval and Information Extraction for Less Resourced Languages
SEPLN 2009 pre-conference workshop
University of the Basque Country
Donostia-San Sebastián. Monday 7th September 2009
Organised by the SALTMIL Special Interest Group of ISCA
Call for Papers
SEPLN 2009: http://ixa2.si.ehu.es/sepln2009
Paper submission: http://sepln.org/myreview-saltmil2009
Deadline for submission: 8 June 2009
Papers are invited for the above half-day workshop, in the format outlined
below. Most submitted papers will be presented in poster form, though some
authors may be invited to present in lecture format.
Context and Focus
The phenomenal growth of the Internet has led to a situation where, by some
estimates, more than one billion words of text are currently available. This is
far more text than any given person can possibly process. Hence there is a need
for automatic tools to access and process this mass of textual information.
Emerging techniques of this kind include Information Retrieval (IR), Information
Extraction (IE), and Question Answering (QA).
However, there is a growing concern among researchers about the situation of
languages other than English. Although not all Internet text is in English, it
is clear that non-English languages do not have the same degree of
representation on the Internet. By a simple count of Wikipedia articles,
English is the only language with more than 20 percent of the
available articles. There then follows a group of 17 languages with between one
and ten percent of the articles. The remaining 245 languages each have less than
one percent of the articles. Even these low-profile languages are relatively
privileged, as the total number of languages in the world is estimated to be 6800.
Clearly there is a danger that the gap between high-profile and low-profile
languages on the Internet will continue to increase, unless tools are developed
for the low-profile languages to access textual information. Hence there is a
pressing need to develop basic language technology software for less-resourced
languages as well. In particular, the priority is to adapt the scope of
recently developed IE, IR and QA systems so that they can also be used for these
languages. In doing so, several questions will naturally arise, such as:
- What problems emerge when faced with languages having different linguistic
features from the major languages?
- Which techniques should be promoted in order to get the maximum yield from
sparse training data?
- What standards will enable researchers to share tools and techniques across
several different languages?
- Which tools are easily re-useable across several unrelated languages?
It is hoped that presentations will focus on real-world examples, rather than
purely theoretical discussions of the questions. Researchers are encouraged to
share examples of best practice -- and also examples where tools have not worked
as well as expected. Also of interest will be cases where the particular
features of a less-resourced language raise a challenge to currently accepted
linguistic models that were based on features of major languages.
Given the context of IR, IE and QA, topics for discussion may include, but are
not limited to:
- Information retrieval;
- Text and web mining;
- Information extraction;
- Text summarization;
- Term recognition;
- Text categorization and clustering;
- Question answering;
- Re-use of existing IR, IE and QA data;
- Interoperability between tools and data;
- General speech and language resources for minority languages, with
particular emphasis on resources for IR, IE and QA.
Important Dates
- 8 June 2009: Deadline for submission
- 1 July 2009: Notification
- 15 July 2009: Final version
- 7 September 2009: Workshop
Organisers
- Kepa Sarasola, University of the Basque Country
- Mikel Forcada, Universitat d'Alacant, Spain
- Iñaki Alegria, University of the Basque Country
- Xabier Arregi, University of the Basque Country
- Arantza Casillas, University of the Basque Country
- Briony Williams, Language Technologies Unit, Bangor University, Wales, UK
Programme Committee
- Iñaki Alegria, University of the Basque Country
- Atelach Alemu Argaw, Stockholm University, Sweden
- Xabier Arregi, University of the Basque Country
- Jordi Atserias, Barcelona Media (Yahoo! Research Barcelona)
- Shannon Bischoff, Universidad de Puerto Rico, Puerto Rico
- Arantza Casillas, University of the Basque Country
- Mikel Forcada, Universitat d'Alacant, Spain
- Xavier Gomez Guinovart, University of Vigo
- Lori Levin, Carnegie Mellon University, USA
- Climent Nadeu, Universitat Politècnica de Catalunya
- Jon Patrick, University of Sydney, Australia
- Juan Antonio Pérez-Ortiz, Universitat d'Alacant, Spain
- Bojan Petek, University of Ljubljana, Slovenia
- Kepa Sarasola, University of the Basque Country
- Oliver Streiter, National University of Kaohsiung, Taiwan
- Vasudeva Varma, IIIT, Hyderabad, India
- Briony Williams, Bangor University, Wales, UK
We expect short papers of at most 3500 words (about 4-6 pages) describing research
addressing one of the above topics, to be submitted as PDF documents by
uploading to the following URL: http://sepln.org/myreview-saltmil2009
The final papers should not exceed 6 pages and should adhere to the stylesheet
that will be adopted for the SEPLN Proceedings (to be announced later on the
Conference web site).