LINGUIST List 12.3145

Wed Dec 19 2001

Qs: Japanese [b], OCR for IPA/Rosetta Project

Editor for this issue: Karen Milligan <>

We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate. In addition to posting a summary, we'd like to remind people that it is usually a good idea to personally thank those individuals who have taken the trouble to respond to the query.


  1. Willian Xu, the Japanese [b]
  2. Jim Mason, OCR for IPA: Rosetta Project

Message 1: the Japanese [b]

Date: Wed, 19 Dec 2001 17:59:47 +0800 (CST)
From: Willian Xu <>
Subject: the Japanese [b]

Dear all,

While I was doing research on Chinese experimental phonetics, a
Japanese friend of mine posed a question: is the Chinese [p] the same
as the Japanese [b]? It seems rather clear that they are not
equivalent, but I am also confused when comparing them with the French
[b]. I feel that the Japanese one lies between the other two, and my
friend affirms that they are different. But a sound between surd
(voiceless) and sonant (voiced) seems a little unreasonable. Can any
of the linguists working on Japanese phonetics help me? Thank you very
much.

William Xu

Message 2: OCR for IPA: Rosetta Project

Date: Wed, 19 Dec 2001 16:42:44 -0800
From: Jim Mason <>
Subject: OCR for IPA: Rosetta Project

Does anyone have experience or suggestions for optical character
recognition software that will recognize IPA characters?

We are creating a new section of the Rosetta Project site
( that offers Swadesh lists in 1,000 languages,
with full search, sort and comparative capabilities. We intend to
provide each list both in its native orthography (or orthographies)
and in an IPA version. We are filling the database by scanning and
OCRing preassembled lists whenever possible, and thus need an OCR
system that can recognize IPA characters.

Suggestions for appropriate software, or expressions of interest in
contributing complete Swadesh lists, should be directed to Jim Mason at



Rosetta Project Description:

The Rosetta Project is creating a broad corpus of language
descriptions, vernacular texts, analytic materials and audio files for
1,000+ languages in a publicly accessible, online archive. Our
intention is to create a meaningful survey and near-permanent archive
of 1,000 languages as well as a unique platform for contemporary
comparative linguistic research and education. The text types we are
collecting for each language are explained in detail on the
site. (

We are creating this broad language archive through an open
contribution, open review process, similar to the strategy that
created the Oxford English Dictionary, though in this case we hope
the Internet speeds the process a little bit. . . ;-) And to help the
process along, we are running collection efforts at Stanford,
Berkeley, Yale, SIL, and various linguistic organizations.

Most of the material in our database is excerpted from already
published materials, but we are also bringing some new material to
publication for the first time. In general, our interest is in
collecting, preserving, and making available the many riches of
descriptive linguistic work: work that is often difficult to access,
unorganized, or rotting away in shoe boxes without a proper home.