LINGUIST List 21.3186

Thu Aug 05 2010

Confs: Computational Ling, Lang Documentation, Text/Corpus Ling/USA

Editor for this issue: Amy Brunett <>

        1.    Jeff Allen, Seminar on Language Technologies for Low-data Languages

Message 1: Seminar on Language Technologies for Low-data Languages
Date: 05-Aug-2010
From: Jeff Allen <>
Subject: Seminar on Language Technologies for Low-data Languages

Seminar on Language Technologies for Low-data Languages

Date: 17-Aug-2010 - 17-Aug-2010
Location: Redmond, Washington, USA
Contact: Jeff Allen
Contact Email:
Meeting URL:

Linguistic Field(s): Computational Linguistics; Language Documentation; Text/Corpus Linguistics

Meeting Description:

This is a public talk, open to all.

Date: Tues, August 17, 2010
Location: Microsoft, Redmond, Washington, USA
Room: Building 99, Room 1919
Time: 2:00PM-3:30PM
Speaker: Jeff Allen



Most development and deployment of machine translation (MT) technologies over the past several decades has been for international languages. Only a few projects for low-data/low-density/low-resource/sparse-data/less-prevalent/less commonly taught/minority languages have led to successful prototypes and products. A number of technical, logistical, social, educational and other factors influence the potential success of implementing systems for such languages. This talk will cover many of the lessons learned from previous projects, and some of the pitfalls to avoid. It will also demonstrate how the recent efforts to make Haitian Creole available for Haiti Disaster Relief achieved a certain level of success in record time because of the ability to build upon previous work. Yet there were also obstacles which have been problematic and remain a concern for this language and for other less-prevalent languages. Lastly, the discussion will mention some ways to enable proactive, forward-thinking projects, using some bootstrapping methods, to reduce the risk of situations which can result from working in a primarily reactive mode.

Jeff Allen is a member of the LINGUIST-List advisory group.

Page Updated: 05-Aug-2010