Workshop on The Digitization of Language Data: The Need for Standards
Working Group on
More and more digitized language data is becoming available on the Web, and this confronts linguistics with a new challenge. To identify and retrieve this data, we need a system for referring to the relevant languages in ways that a machine can understand. We need, in fact, a system of codes: one that makes all the distinctions linguists need but also has the stability and lack of ambiguity that computational systems require. This system should include both:
Unfortunately, to our knowledge, no such system exists. But to implement the E-MELD project, LINGUIST will have to adopt some language coding system, so we would like to ask for your advice and for your feedback on our provisional plans. Briefly, these are to adopt the Ethnologue codes for individual languages and, since the Ethnologue has no coding scheme for a genetic classification of languages, to use a system of our own for language classification within the LINGUIST database. But we would very much like to know:
Thus we would like to ask you, the members of the Language Classification Working Group, to:
*Or, alternatively, you could tackle some of the even thornier questions surrounding the choice of language codes, including what kind of administrative system we need to set up so that the coding can be changed as new classifications are adopted. Whether or not you choose to address the "thorny questions," please take the time to follow the link above so that the group has a shared context in which to begin its discussion. The link offers brief explanations of the difficulties with the ISO-639 standard and of the goals of the LINGUIST subgroup-coding system, and it raises other questions that the working group should discuss.