Workshop on
The Digitization of Language Data:
The Need for Standards

Working Group on
Language Classification

New: Working Group Responses

The growing amount of digitized language data on the Web confronts linguistics with a new challenge. To identify and retrieve this data, we need a way of referring to the relevant languages that a machine can understand. We need, in fact, a system of codes: one that makes all the distinctions linguists need, but also has the stability and lack of ambiguity that computational systems require. This system should include both:

  • A system of unique codes for individual languages.
  • A system of unique codes for subgroups and language families, ideally one that reflects a consistent, generally accepted system of language classification.
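
To make the two requirements above concrete, here is a minimal sketch of how such a pair of code systems might be held in a database. It is an illustration only, not the LINGUIST or Ethnologue scheme: the class name `CodeRegistry`, its methods, and the codes `FAM-A`, `GRP-A1`, and `LNG-001` are all hypothetical placeholders. The sketch assumes that every code must be unique across both systems and that each subgroup records at most one parent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CodeRegistry:
    """Hypothetical store for language codes and subgroup codes."""

    # language code -> code of the subgroup it belongs to
    languages: Dict[str, str] = field(default_factory=dict)
    # subgroup code -> code of its parent subgroup (None for a top-level family)
    subgroups: Dict[str, Optional[str]] = field(default_factory=dict)

    def _check_unused(self, code: str) -> None:
        # A code may be assigned only once, whether to a language or a subgroup.
        if code in self.languages or code in self.subgroups:
            raise ValueError(f"code already assigned: {code}")

    def add_subgroup(self, code: str, parent: Optional[str] = None) -> None:
        self._check_unused(code)
        if parent is not None and parent not in self.subgroups:
            raise KeyError(f"unknown parent subgroup: {parent}")
        self.subgroups[code] = parent

    def add_language(self, code: str, subgroup: str) -> None:
        self._check_unused(code)
        if subgroup not in self.subgroups:
            raise KeyError(f"unknown subgroup: {subgroup}")
        self.languages[code] = subgroup

    def lineage(self, language_code: str) -> List[str]:
        """Return subgroup codes from the immediate subgroup up to the family."""
        chain: List[str] = []
        node: Optional[str] = self.languages[language_code]
        while node is not None:
            chain.append(node)
            node = self.subgroups[node]
        return chain


# Placeholder codes, invented for this example only.
registry = CodeRegistry()
registry.add_subgroup("FAM-A")
registry.add_subgroup("GRP-A1", parent="FAM-A")
registry.add_language("LNG-001", subgroup="GRP-A1")
print(registry.lineage("LNG-001"))  # ['GRP-A1', 'FAM-A']
```

Keeping the classification in a separate table from the language codes means that a language can be moved to a different subgroup without its own code changing, which is one way to get the stability that computational systems need.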

Unfortunately, to our knowledge, no such system exists. To implement the E-MELD project, however, LINGUIST will have to adopt some language coding system. So we would like to ask for your advice, and for your feedback on our provisional plans. Briefly, these are to adopt the Ethnologue codes for individual languages and, since the Ethnologue has no coding scheme for a genetic classification of languages, to use a system of our own for language classification within the LINGUIST database. But we would very much like to know:

  • Are there other viable alternatives (i.e. other language coding systems) which we should consider?
  • How usable will linguists, particularly experts in the various language families, find the proposed coding systems?

Thus we would like to ask you, the members of the Language Classification Working Group, to:

  • Read, by way of background, a short paper by Gary Simons and Peter Constable, "Language Identification and IT: Addressing Problems of Linguistic Diversity on a Global Scale," which compares the Ethnologue codes with the ISO-639 standard set forth by the International Organization for Standardization.
  • Look at (a) the Ethnologue codes for the language group you know best or (b) our coding scheme for language classification, and decide whether these are systems you could live with or whether they could be significantly improved.*
  • Send us your observations and conclusions in the form of a brief (1 page or less) report. If you can send your report to Helen Aristar-Dry by June 14, we would like to put it on the web so that the working group can read it before the workshop. Otherwise, we would ask you to bring 12 copies to the workshop. These reports will be the springboard for discussion in the working group sessions.

*Or, alternatively, you could tackle some of the even thornier questions surrounding the choice of language codes, including what kind of administrative system we need to set up in order to be able to change the coding as new classifications are adopted.   

Whether or not you choose to address the "thorny questions," please take the time to follow the link above so that the group will have a shared context in which to begin a discussion. This link offers brief explanations of the difficulties with the ISO-639 standard and of the goals of the LINGUIST subgroup-coding system, and it raises other questions that the working group should discuss.

