LINGUIST List 15.2190

Mon Aug 2 2004

Qs: Phonostatistical Characteristics; Named Entities

Editor for this issue: Naomi Fox

We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate. In addition to posting a summary, we'd like to remind people that it is usually a good idea to personally thank those individuals who have taken the trouble to respond to the query. To post to LINGUIST, use our convenient web form at


  1. Yuri Tambovtsev, World languages and documentation
  2. Phan Xuan Hieu, Named entity recognition tools?

Message 1: World languages and documentation

Date: Fri, 30 Jul 2004 21:03:38 +0600
From: Yuri Tambovtsev
Subject: World languages and documentation

Dear LinguistList colleagues,

I have computed the sound pictures of 163 world languages. I would
like to gather more data on the common features of the sound chains in
language as an entity, in particular on the phonostatistical
characteristics of American Indian or Australian languages.

Could someone please advise me whether it is necessary for me to go to
the USA or Australia to collect linguistic material for my research?
If so, where can it be found? Also, do the respective governments
provide grants for the documentation of endangered languages?

Looking forward to hearing from you.

Yours sincerely, 

Yuri Tambovtsev 


Message 2: Named entity recognition tools?

Date: Sat, 31 Jul 2004 02:57:06 +0900
From: Phan Xuan Hieu
Subject: Named entity recognition tools?

Dear all,

I am looking for a named entity recognizer that can identify several
kinds of named entity, such as NUMBER, PERCENT, and MONEY. I have tried
the GATE (ANNIE) and LingPipe systems. The accuracy of GATE is good;
however, its XML-based output format is too complex for subsequent
processing. The LingPipe system is also good, but it only recognizes
three kinds of entity.

Could you please suggest other NER systems (for English only) that
can recognize seven kinds of named entity (as in the MUC7 NER definition)?

Best regards,

Xuan Hieu Phan
