LINGUIST List 11.230

Thu Feb 3 2000

Qs: Feedback sought on languages listed in ISO 639

Editor for this issue: Scott Fults <>

We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate.


  1. Nicholas Ostler, Languages listed in ISO 639: feedback sought

Message 1: Languages listed in ISO 639: feedback sought

Date: Thu, 3 Feb 2000 16:02:28 +0000
From: Nicholas Ostler <>
Subject: Languages listed in ISO 639: feedback sought

Note by Nicholas Ostler:
I am forwarding this for a friend. Please send replies not to me but to:
John Clews <>

Languages listed in ISO 639: feedback sought

Dear list members. I am a member of the Joint Advisory Committee on
ISO 639: Codes for representation of names of languages (abbreviated
to ISO 639: language codes in further discussion below).

This committee meets on 17-18 February 2000 in Washington DC, and I
would be grateful for any information from list members, which would
highlight any major gaps in the languages listed below. The major
interest at this stage is in written languages.

If you are interested in the background to ISO 639, read section 1:
if not, and you would like to comment on the codes, and the languages
that have been coded, and any omissions or errors, go to section 2.

1. ISO 639: language codes

ISO 639 is one of many international standards developed by groups of
experts in many countries. This section provides a simplified view.

Put simply, ISO 639's job is to provide simple codes that can be
embedded in (mainly computerised) information systems, allowing
those systems to highlight language use, or even to enable useful
things like font switching, e.g. on Internet web sites.

The older 2-letter codes are used, it could be said, mainly in
older, "legacy" systems. 3-letter codes (mainly identical with codes
used by the Library of Congress, and in many libraries) have seen the
largest growth, and allow for greater expansion. Actually the two
sets are currently listed in two separate parts of ISO 639,
respectively in ISO [WD] 639[-1] and ISO 639-2.
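
To make the relationship between the two sets concrete, here is a
minimal Python sketch. The rows are copied from the reference list in
section 3; the names ROWS, build_index and three_letter_codes are
hypothetical, chosen only for this illustration, and the sketch assumes
that a slash in the 3-letter column separates two alternative codes for
the same language.

```python
# Rows copied from the reference list in section 3:
# (3-letter code or codes, 2-letter code, English name).
ROWS = [
    ("afr", "af", "Afrikaans"),
    ("alb/sqi", "sq", "Albanian"),
    ("fre/fra", "fr", "French"),
    ("ger/deu", "de", "German"),
    ("wel/cym", "cy", "Welsh"),
]


def build_index(rows):
    """Map each 2-letter code to (list of 3-letter codes, English name)."""
    index = {}
    for three, two, name in rows:
        # Assumption: a slash separates alternative 3-letter codes.
        index[two] = (three.split("/"), name)
    return index


INDEX = build_index(ROWS)


def three_letter_codes(two_letter):
    """Return the 3-letter code(s) for a 2-letter code, or [] if not listed."""
    codes, _name = INDEX.get(two_letter.lower(), ([], None))
    return codes
```

With these rows, three_letter_codes("sq") gives ["alb", "sqi"], and a
code that is not in the table gives an empty list.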

The Internet Engineering Task Force's specification RFC 1766
recommends the use of ISO 639 codes in Internet uses.
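
As a rough illustration of how such codes appear in Internet use, the
Python sketch below splits an RFC 1766-style language tag (for example
"en-US") into its primary tag and any subtags. It assumes the RFC 1766
shape of a tag: parts of one to eight letters joined by hyphens. The
function name split_language_tag is hypothetical, and the sketch does
not check the primary tag against the ISO 639 list.

```python
import re

# Assumed tag shape: 1-8 letters, then zero or more "-" + 1-8 letters.
_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def split_language_tag(tag):
    """Split a language tag into (primary tag in lowercase, list of subtags)."""
    if not _TAG.fullmatch(tag):
        raise ValueError("not a well-formed language tag: %r" % tag)
    primary, *subtags = tag.split("-")
    return primary.lower(), subtags
```

For instance, split_language_tag("en-US") returns ("en", ["US"]), while
a string such as "en_US" is rejected with a ValueError.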

We have also been in discussion with the Summer Institute of
Linguistics (SIL) who use a different (and much larger) set of
3-letter codes in their Ethnologue, codes which have also been used
in some Internet situations.

NB: (a) if you use Ethnologue codes as well as, or instead of,
codes from ISO 639, I would be glad to hear, and to know in what
circumstances you use those codes.

(b) If you just use the Ethnologue, or some other reference source
where those Ethnologue codes are used, just for reference (i.e. the
reference source is important, not the codes themselves) that's a
different question.

I'd be grateful if you can distinguish (a) from (b) in any replies on
that point. However, regarding (b) it may be useful to know which
other publications/web sites use Ethnologue codes.

2. Opportunity for feedback

The list below is my own handy reference list based on my own
compilation of 3-letter codes from ISO 639-2 and Library of Congress
codes, and the 2-letter codes in ISO WD 639-1. Errors are likely to
be my own, rather than in ISO 639, though I have been fairly careful.

Some of the notes (especially those in square brackets, or using
asterisks) are just for my own reference, and relate only to
information on different editions of ISO 639, and can be ignored.

I'd be particularly interested to know of obvious omissions, or
errors in naming, or where predominant use of language names has

There are also some fairly basic "genetic codes", with entries of the
form "xxxx languages" or "xxxx (Other)". Again, if some language
groups seem to have been omitted altogether I would be glad to know.

In any of the columns, any information including asterisks (*) or
question marks (?) is essentially my own, and for my own use.

If possible could you embed your comments within my quoted table,
unless your comment is very simple on a few lines: that will enable
me to align comments.

I am primarily interested in omissions, and in language names, but if
you want to suggest a particularly useful 3-letter code, it may be
helpful (although I think that the default approach is likely to be
"use the first three letters of the name in English, or French, or in
the local spelling of the name" if known, allowing for transcription
or transliteration conventions).
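
The "first three letters" default can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the name
is already in a Latin transcription, only unaccented letters are used,
and a suggestion is dropped if the code is already taken. The function
name candidate_code and the set of taken codes are hypothetical.

```python
def candidate_code(name, taken):
    """Suggest a 3-letter code from the first three letters of a name.

    Returns None if the name has fewer than three usable letters or if
    the suggested code is already in the set of taken codes.
    """
    # Keep only unaccented Latin letters, in lowercase.
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if len(letters) < 3:
        return None
    code = "".join(letters[:3])
    return None if code in taken else code
```

For example, with "wal" already assigned to Walamo in the list below,
candidate_code("Walloon", {"wal"}) returns None, so a different code
would have to be proposed, whereas candidate_code("Veps", {"wal"})
returns "vep".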

Please could you reply direct to me at <>
and not on the list to avoid too many large repetitive emails
overcrowding the traffic on the list as a whole.

If you could reply within a week from reading this email, this is
likely to provide sufficient time to be able to feed such information
into the ISO 639 Joint Advisory Committee meeting on 17-18 February 2000.

3. Handy reference list

Here's the list: I look forward to your comments!

[ Tip: Use a monospace font like Courier for the chart below ]

------------------------------------------------------------
 LC ISO 639-2 ISO 639-1 Language name in English
------------------------------------------------------------
 --- --- --- (aj) Abaza
 abk ab Abkhazian
 --- --- --- (ad) Adyge
 ace Achinese
 ach Acoli
 ada Adangme
 aar aa Afar
 afh Afrihili
 afr af Afrikaans
 afa Afro-Asiatic (Other)
 aka ak Akan
 akk Akkadian
 alb/sqi * sq Albanian
 ale Aleut
 alg Algonquian languages
 --- --- --- (an) Aragonese
 tut Altaic (Other)
 amh am Amharic
 apa Apache languages
 ara ar Arabic
 arc Aramaic
 arp Arapaho
 arn Araucanian (Mapuche)
 arw Arawak
 arm/hye * hy Armenian
 --- --- --- (vl) Aromanian; Arumanian
 art Artificial (Other)
 --- --- --- (ae) Arvanite
 asm as Assamese
 --- --- --- (au) Asturian
 ath Athapascan languages
 aus Australian languages
 map Austronesian (Other)
 ava av Avaric
 ave (fv) Avestan
 awa Awadhi
 aym ay Aymara
 aze az Azerbaijani
 ban Balinese
 --- --- --- (bq) Balkar
 bat Baltic (Other)
 bal Baluchi
 bam bm Bambara
 bai Bamileke languages
 bad Banda
 bnt Bantu (Other)
 bas Basa
 bak ba Bashkir
 baq/eus * eu Basque
 btk Batak (Indonesia)
 bej Beja
 bel Belarusian [was Byelorussian]
 bem Bemba
 ben bn Bengali
 ber Berber (Other)
 bho Bhojpuri
 bih bh Bihari
 bik Bikol
 bin Bini
 bis bi Bislama
 --- --- --- (bs) Bosnian
 bra Braj
 bre br Breton
 bug Bugis (Buginese)
 bul bg Bulgarian
 bua Buriat
 bur/mya * my Burmese
 cad Caddo
 car Carib
 cat ca Catalan
 cau Caucasian (Other)
 ceb Cebuano
 cel Celtic (Other)
 cai Central American Indian (Other)
 chg Chagatai
 cmc Chamic languages
 cha Chamorro
 --- --- --- (??) Chamorro
 che (nx) Chechen
 chr (jl) Cherokee
 chy Cheyenne
 chb Chibcha
 --- --- --- (ch) Chichewa; Chewa
 chi/zho * zh Chinese
 chn Chinook jargon
 chp Chipewyan
 cho Choctaw
 chu (sj) Church Slavic (Old Church Slavonic)
 tru chk Chuukese
 chv (cv) Chuvash
 cop Coptic
 cor kw Cornish
 cos co Corsican
 cre cr Cree
 mus Creek
 cpe Creoles & Pidgins, English
 cpf Creoles & Pidgins, French
 cpp Creoles & Pidgins, Portuguese
 crp Creoles & Pidgins (Other)
 scr/hrv * hr Croatian (Serbo-Croat, Latin)
 cus Cushitic (Other)
 cze/ces * cs Czech
 dak Dakota
 dan da Danish
 --- --- --- (dg) Dargwa
 day Dayak
 del Delaware
 din Dinka
 div dv Divehi
 doi Dogri
 dgr Dogrib
 dra Dravidian (Other)
 dua Duala
 dut/nld * nl Dutch
 dum Dutch, Middle (ca. 1050-1350)
 dyu Dyula
 dzo dz Dzongkha
 efi (ef) Efik
 egy Egyptian (Ancient)
 eka Ekajuk
 elx Elamite
 eng en English
 enm English, Middle (ca. 1100-1500)
 ang English, Old (ca. 450-1100)
 --- --- --- (er) Erzya Mordvin
 esp epo eo Esperanto
 esk --- -- ** Eskimo (Other) (not in 639-2)
 est et Estonian
 eth --- -- ** Ethiopic [languages] (not in 639-2)
 ewe ee Ewe
 ewo Ewondo
 fan Fang
 fat Fanti
 far fao fo Faroese
 fij fj Fijian
 fin fi Finnish
 fiu Finno-Ugrian (Other)
 fon Fon
 --- --- --- (fp) Franco-Provençal
 fre/fra * fr French
 frm French, Middle (ca. 1400-1600)
 fro French, Old (842- ca. 1400)
 fri fry fy Frisian
 --- --- --- (??) Frisian, East; Sater Frisian
 --- --- --- (fn) Frisian, North (fn! - also in Persian, Old)
 fur (fu) Friulian
 ful ff Fulah
 gaa Ga
 gae gla gd Gaelic, Scots [* were gae/gdh]
 iri gle ga Gaelic, Irish [* were gai/iri]
 max glv gv Gaelic, Manx
 --- --- --- (gg) Gagauz
 gag glg gl Gallegan (Galician - used in Spain)
 lug lg Ganda
 gay Gayo
 gba Gbaya
 eth gez Geez
 geo/kat * ka Georgian
 ger/deu * de German
 gmh German, Middle High
 goh German, Old High (ca. 750-1050)
 gem Germanic (Other)
 --- --- --- (??) German, Low; Low German
 gil Gilbertese
 gon Gondi
 gor Gorontalo
 got Gothic
 grb Grebo
 grc Greek, Ancient (to 1453)
 gre/ell * el Greek, Modern (1453-)
 kal kl Greenlandic (Kalaallisut)
 gua grn gn Guarani
 guj gu Gujarati
 gwi Gwich'in
 hai Haida
 hau ha Hausa
 haw Hawaiian
 heb he *** Hebrew [Infoterm, 1989: iw deprecated?]
 her oh Herero
 hil Hiligaynon
 him Himachali
 hin hi Hindi
 hmo (??) Hiri Motu, Motu
 hit Hittite
 hmn Hmong
 hun hu Hungarian
 hup Hupa
 iba Iban
 ice/isl * is Icelandic
 ibo ig Igbo
 ijo Ijo
 ilo Iloko
 inc Indic (Other)
 ine Indo-European (Other)
 ind id *** Indonesian [Infoterm, 1989: in deprecated?]
 --- --- --- (ng) Ingush
 int ina ia Interlingua [* ]
 ile ie Interlingue [*note similar language name]
 iku iu Inuktitut [Infoterm, 1989]
 ipk ik Inupiaq (was Inupiak)
 ira Iranian (Other)
 sga Irish, Old (to 900)
 mga Irish, Middle (900 - 1200)
 iro Iroquoian languages
 --- --- --- (rx) Istro-Romanian
 ita it Italian
 jpn ja Japanese
 jav/jaw * jv/jw * Javanese [jw (Jawi?) now deprecated??]
 jrb Judeo-Arabic
 jpr Judeo-Persian
 --- --- --- (qb) Kabardian
 kab Kabyle
 kac Kachin
 kal Kalaallisut [renamed]
 --- --- --- (xl) Kalmyk
 kam Kamba
 kan kn Kannada
 kau kr Kanuri
 --- --- --- (qc) Karachay
 --- --- --- (qr) Karaim
 kaa Kara-Kalpak
 --- --- --- (kj) Karelian, North (Other Karelian too?)
 kar Karen
 kas ks Kashmiri
 --- --- --- (??) Kashubian
 kaw Kawi
 kaz kk Kazakh
 kha Khasi
 cam khm km ** Khmer (LC was once "cam")
 khi Khoisan (Other)
 kho Khotanese
 --- --- --- (ki) Kikuyu; Gikuyu
 kik Kikuyu
 kmb Kimbundu
 kin rw Kinyarwanda
 kir ky Kirghiz
 --- --- --- (kv) Komi
 kom Komi
 kon kg Kongo
 kok Konkani
 kor ko Korean
 kus kos Kosraean
 kpe Kpelle
 kro Kru
 kua ok Kuanyama [Kwanyama in 639-2]
 --- --- --- (qm) Kumyk
 kum Kumyk
 kur ku Kurdish
 kru Kurukh
 kut Kutenai
 --- --- --- (ld) Ladin
 lad Ladino
 --- --- --- (ly) Ladino
 lah Lahnda
 --- --- --- (lk) Lak
 lam Lamba
 lao lo Lao
 lat la Latin
 lav lv Latvian
 ltz lb Letzeburgesch
 lez (le) Lezghian
 lin ln Lingala
 lit lt Lithuanian
 --- --- --- (li) Livonian
 loz Lozi
 lub lu Luba-Katanga
 lua Luba-Lulua
 lui Luiseno
 lun Lunda
 luo Luo (Kenya and Tanzania)
 lus Lushai
 mac/mkd * mk Macedonian [*** mak earlier? NB Makasar]
 mad Madurese
 mag Magahi
 mai Maithili
 mak Makasar
 mla mlg mg Malagasy
 may/msa * ms Malay
 mal ml WD1* Malayalam
 mlt mt Maltese
 mdr Mandar
 man (md) Mandingo
 mni Manipuri
 mno Manobo languages
 mao/mri * mi Maori
 mar mr Marathi
 chm -- Mari
 --- --- --- (mj) Mari, Meadow
 --- --- --- (mm) Mari, Mountain
 mah (??) Marshall (Marshallese)
 mwr Marwari
 mas Masai
 myn Mayan languages
 men Mende
 mic Micmac
 min Minangkabau
 --- --- --- (??) Mingrelian
 mis Miscellaneous (Other)
 moh Mohawk
 --- --- --- (mh) Moksha Mordvin
 mol mo Moldavian
 mkh Mon-Khmer (Other)
 lol Mongo (Mongo-Nkundu)
 mon mn Mongolian
 mos Mossi (Moore (?) in LC list)
 mul Multiple languages
 mun Munda languages
 nah Nahuatl (LC listed earlier as Aztec)
 --- --- --- (ke) Nama
 nau na Nauru
 nav (dn) Navajo (Navaho)
 nde nd * Ndebele, North [nd= N. assumed]
 nbl Ndebele, South
 ndo on Ndonga
 --- --- --- (nt) Nenets
 nep ne Nepali
 new Newari
 nia Nias
 nic Niger-Kordofanian (Other)
 ssa Nilo-Saharan (Other)
 niu Niuean
 --- --- --- (nh) Nogai (Noghay)
 non Norse, Old
 nai North American Indian (Other)
 --- nor no Norwegian
 --- nno (nn) Norwegian - Nynorsk
 --- --- --- (nb) Norwegian - Bokmål
 nub Nubian languages
 nym Nyamwezi
 nya (ny) Nyanja
 nyn Nyankole
 nyo Nyoro
 nzi Nzima
 lan oci Occitan (Langue d'Oc) (LC: post-500)
 oji oj Ojibwa
 ori or Oriya
 gal orm om ** Oromo (LC differs)
 osa Osage
 oss (ir) Ossetic (Ossetian)
 oto Otomian languages
 pal Pahlavi
 pau Palauan
 pli (pv) Pali
 pam Pampanga
 pag Pangasinan
 pan pa Panjabi
 pap Papiamento
 paa Papuan-Australian (Other)
 --- --- --- (fm) Persian, Middle
 per/fas * fa Persian
 peo (fn!) *** Persian, Old (ca 600 - 400 B.C.) (fn dupe!)
 phi Philippine (Other)
 phn Phoenician
 pol pl Polish
 pon Ponape (was this Pohnpeian too ???)
 por pt Portuguese
 pra Prakrit languages
 pro (pi) Provencal, Old (to 1500) (-1500 in ISO 639-1?)
 pus ps Pushto
 que qu Quechua
 raj Rajasthani
 rap Rapanui
 rar Rarotongan
 (qaa-qtz) (Reserved for local use)
 roh rm Rhaeto-Romance
 roa Romance (Other)
 rum/ron * ro Romanian
 --- --- --- (ry) Romany; Romani
 rom Romany
 run rn Rundi
 rus ru Russian
 --- --- --- (??) Ruthenian (Rusyn, Rusinian, Lemko)
 sal Salishan languages
 sam Samaritan Aramaic
 lap smi se Sami languages
 --- --- --- (sy) Sami, Inari
 --- --- --- (sz) Sami, Kildin
 --- --- --- (sx) Sami, Lule
 --- --- --- (ds?) Sami, Northern
 --- --- --- (sb) Sami, Skolt
 --- --- --- (sp) Sami, Southern
 sao smo sm Samoan
 sad Sandawe
 sag sg Sango
 san sa Sanskrit
 sat Santali
 srd (sc) Sardinian
 sas Sasak
 sco (ll) Scots, Lowlands (Lallans)
 sel Selkup
 sem Semitic (Other)
 scc/srp * sr Serbian (Serbo-Croat, Cyrillic)
 srr Serer
 shn Shan
 sho sna sn Shona
 sid Sidamo
 bla Siksika
 snd sd Sindhi
 snh sin si Sinhalese (Singhalese)
 sgn --- -- Sign languages [* not expanded further]
 sit Sino-Tibetan (Other)
 sio Siouan languages
 den Slave (Athapascan language)
 sla Slavic (Other)
 slo/slk * sk Slovak
 slv sl Slovenian
 sog Sogdian
 som so Somali
 son Songhai
 snk Soninke
 wen --- -- Sorbian languages (Wendish?)
 --- --- --- (sf) Sorbian, Lower
 --- --- --- (??) Sorbian, Upper
 nso Sotho, Northern
 sso sot st Sotho, Southern
 sai South American Indian (Other)
 spa --- * es Spanish [* were spa/esl; "esp" later!!!!]
 sun su Sundanese
 suk Sukuma
 sux Sumerian
 sus Susu
 swa sw Swahili
 swz ssw ss Swati (Swazi, Siswati, ?Siswant?)
 swe --- * sv Swedish [* ISO 639-2/T sve deprecated??]
 syr Syriac
 --- --- --- (tb) Tabasaran
 tag tgl tl Tagalog
 tah (??) Tahitian
 tai Tai (Other)
 taj tgk tg Tajik
 tmh Tamashek
 tam ta Tamil
 tar tat tt Tatar
 tel te Telugu
 ter Tereno (Terena)
 tet Tetum
 tha th Thai
 tib/bod * bo Tibetan
 tig Tigre
 tir ti Tigrinya
 tem Timne (Temne)
 tiv Tivi
 tli Tlingit
 tpi Tok Pisin
 tkl Tokelau
 tog to Tonga (Nyasa)
 ton Tonga (Tonga Islands)
 tru --- -- ** Truk (???????????)
 tsi Tsimshian
 tso ts Tsonga
 tsw tsn tn Tswana
 tum Tumbuka
 tur tr Turkish
 ota Turkish, Ottoman (1500 - 1928)
 tuk tk Turkmen
 tvl Tuvalu
 tyv Tuvinian
 twi tw Twi
 --- --- --- (um) Udmurt
 uga Ugaritic
 uig ug Uighur [Infoterm, 1989]
 ukr uk Ukrainian
 umb Umbundu
 und Undetermined
 urd ur Urdu
 uzb uz Uzbek
 vai Vai
 --- --- --- (??) Valencian
 ven ve Venda
 --- --- --- (vp) Veps
 vie vi Vietnamese
 vol vo Volapuk
 vot Votic
 wak Wakashan languages
 --- --- --- (wl) Walloon
 wal Walamo
 war Waray
 was Washo
 wel/cym * cy Welsh
 wol wo Wolof
 xho xh Xhosa
 sah Yakut
 yao Yao
 yap Yap (Yapese)
 --- --- --- (yy) Yi
 yid yi *** Yiddish [Infoterm, 1989: ji now deprecated?]
 yor yo Yoruba
 ypk Yupik languages
 znd Zande
 zap Zapotec
 zen Zenaga
 zha za Zhuang [Infoterm, 1989]
 zul zu Zulu
 zun Zuni
------------------------------------------------------------
 * highlights changes
 *** deprecations etc.
 ( ) tentative, mainly in ISO 639-1 draft
------------------------------------------------------------
*** In the web page "Code for the Representation of the Names of
 Languages. From ISO 639, revised 1989", there is the note on
 "Changes made December 20, 1997, based upon information in
 the following note from a member of the W3C HTML group":

 "In 1989, the ISO 639 Registration Authority changed a number of
 codes as follows (the quote is taken from RFC 1766):

 The following codes have been added in 1989 (nothing later):
 ug (Uigur), iu (Inuktitut, also called Eskimo), za (Zhuang),
 he (Hebrew, *** replacing iw), yi (Yiddish, *** replacing ji),
 and id (Indonesian, replacing in)."

 3-letter dash codes ( --- ) above (and also 2-letter dash codes
 ( -- ) above) represent areas where there appears to be no code
 in the other code sources.

 In several cases, information on alternative language names is
 my own, assumed from comparing various lists.
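
 For anyone who wants to process the chart mechanically, here is a
 minimal Python sketch of how a simple row might be read. It rests on
 several assumptions: the code columns come first and are in lowercase,
 the English name starts at the first capitalised word, dashes and
 asterisks carry no code, and a 2-letter code in parentheses is
 tentative. The function name parse_row is hypothetical, and rows with
 extra annotations (such as "WD1*" or "(ds?)") are not handled.

```python
def parse_row(line):
    """Split one simple chart row into its codes and its English name."""
    tokens = line.split()
    codes = []
    # Assumption: the name starts at the first capitalised token.
    while tokens and not tokens[0][0].isupper():
        codes.append(tokens.pop(0))
    row = {
        "three_letter": [],
        "two_letter": None,
        "tentative": False,
        "name": " ".join(tokens),
    }
    for token in codes:
        # Skip dash placeholders, change markers and unknown "(??)" codes.
        if set(token) <= set("-*?()"):
            continue
        tentative = token.startswith("(") and token.endswith(")")
        code = token.strip("()")
        parts = code.split("/")
        if len(parts[0]) == 2:
            row["two_letter"] = code
            row["tentative"] = tentative
        else:
            row["three_letter"].extend(parts)
    return row
```

 Under these assumptions, the row " --- --- --- (aj) Abaza" yields no
 3-letter codes, the tentative 2-letter code "aj" and the name "Abaza",
 while " alb/sqi * sq Albanian" yields the 3-letter codes "alb" and
 "sqi" together with the 2-letter code "sq".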

Best regards

John Clews

John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG
tel: 0171 412 7826 (day); 0171 272 8397 (evening); 01423 888 432 (w/e)

Committee Chair of ISO/TC46/SC2: Conversion of Written Languages;
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of CEN/TC304: Information and Communications
 Technologies: European Localization Requirements
Committee Member of the Foundation for Endangered Languages;
Committee Member of ISO/IEC/JTC1/SC2: Coded Character Sets