Language Classification Working Group
Marianne Mithun
I appreciated the paper by Gary Simons and Peter Constable, "Language
Identification and IT: Addressing Problems of Linguistic Diversity on a Global
Scale," and find that it raises just the kinds of issues that had occurred to
me.
Language codes
The Ethnologue system looks optimal to me.
It is based on the most common criterion for delimiting languages: mutual
intelligibility. We know that this is not a simple matter, but it is what we
have. Languages known by different names in different areas are nevertheless
represented by a single code. The system is exhaustive and expandable: there is
room for many more distinct languages, both living and gone, than we will ever
recognize, even if some codes are retired when alternate divisions are made. I
myself appreciate the mnemonic character of most of the existing Ethnologue
codes. We all realize that not all codes can be perfectly mnemonic, simply
because the names of many languages sound alike and single languages are often
known by multiple names. What is the most mnemonic for one user may not be the
same as that for the next. But the principle of making the codes as mnemonic as
possible has considerable value. It is much like airport codes: it is far
easier to learn and remember LAX for Los Angeles and SFO for San Francisco,
even if they are not perfect, than something like XQR and MBT. This
user-friendliness should lead to wider acceptance and result in fewer mistakes.
Considerable work has already gone into establishing the Ethnologue
codes. It would be silly for others to try to do it all over again. The team
responsible consists of good linguists dedicated to representing current
thinking among specialists. Furthermore, this team is in place to maintain the
system, continually updating it as more is known.
Coding schemes for language classification
The kind of genetic information associated with languages in the Ethnologue
lists, that is, family and subgroups, as well as
alternate names, I find optimal and necessary. It would be a mistake to include
this as part of the language code itself, of course, because any change in
subgrouping would necessitate a change in language code. I was at first
intrigued by the LINGUIST coding scheme: it is good to see the hierarchical
information and layering, and important to be able to pull together information
from subgroups. I see two problems. I worry that the complete arbitrariness of
the alphabetic labels beyond the family (ATAACAB) will keep them from being
used and will result in errors of interpretation. A more serious problem is the
principle of naming the subgroups alphabetically. Every time a new subgroup is
recognized, the subgrouping codes will have to be altered, not only for all
languages in the new subgroup but also for all languages in subgroups whose
names occur later in the alphabet.
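
The relabeling problem can be illustrated with a minimal sketch. It assumes a simplified model in which each subgroup within a family receives one letter according to the alphabetical position of its name; this is an illustration of the principle only, not the actual LINGUIST implementation. The subgroup names and the function label_subgroups are hypothetical.

```python
from string import ascii_uppercase


def label_subgroups(names):
    """Assign one letter per subgroup by alphabetical order of its name.

    Assumed, simplified model of an alphabetic labeling scheme.
    """
    return {name: ascii_uppercase[i] for i, name in enumerate(sorted(names))}


# Hypothetical subgroup names, for illustration only.
before = label_subgroups(["Coastal", "Inland", "Upland"])

# A new subgroup, "Highland", is recognized later.
after = label_subgroups(["Coastal", "Highland", "Inland", "Upland"])

# Existing subgroups whose label changed although nothing about them changed.
changed = sorted(name for name in before if before[name] != after[name])

print(before)
print(after)
print(changed)
```

In this sketch, every previously labeled subgroup that sorts after the newcomer receives a different letter, so every language code built on those letters would have to be reissued.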
Additional information
The locales in which languages are spoken and the estimated numbers of
speakers furnished in the Ethnologue are important and should be
accessible. This would be sufficient information for those interested in
areal traits, without introducing and forcing premature conclusions about areal
influences. I myself think that classifying languages by typological
similarities at this stage in our knowledge would be a serious mistake. Most
typologies currently oversimplify grossly (verb-initial or verb-final, head- or
dependent-marking, nominative/accusative or ergative/absolutive, pro-drop ...),
and such a specification would set those oversimplifications in stone.