LINGUIST List 11.230

Thu Feb 3 2000

Qs: Feedback sought on languages listed in ISO 639

Editor for this issue: Scott Fults <scottlinguistlist.org>




We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate.

Directory

  • Nicholas Ostler, Languages listed in ISO 639: feedback sought

    Message 1: Languages listed in ISO 639: feedback sought

    Date: Thu, 3 Feb 2000 16:02:28 +0000
    From: Nicholas Ostler <nostlerchibcha.demon.co.uk>
    Subject: Languages listed in ISO 639: feedback sought


    Note by Nicholas Ostler: I am forwarding this for a friend. Please send replies not to me but to: John Clews <Endangersesame.demon.co.uk>

    Languages listed in ISO 639: feedback sought

    Dear list members. I am a member of the Joint Advisory Committee on ISO 639: Codes for representation of names of languages (abbreviated to ISO 639: language codes in further discussion below).

    This committee meets on 17-18 February 2000 in Washington DC, and I would be grateful for any information from list members, which would highlight any major gaps in the languages listed below. The major interest at this stage is in written languages.

    If you are interested in the background to ISO 639, read section 1: if not, and you would like to comment on the codes, and the languages that have been coded, and any omissions or errors, go to section 2.

    1. ISO 639: language codes

    ISO 639 is one of many international standards developed by groups of experts in many countries. This section provides a simplified view.

    Put simply, ISO 639's job is to provide simple codes that can be embedded in (mainly computerised) information systems, so that those systems can flag which language is in use, or even enable useful things like font switching, e.g. on Internet web sites.

    There are older 2-letter codes, used, it could be said, mainly in older "legacy" systems. 3-letter codes (mainly identical with the codes used by the Library of Congress, and in many libraries) have seen the largest growth, and allow for greater expansion. The two sets are currently listed in two separate parts of ISO 639: the 2-letter codes in ISO [WD] 639[-1] and the 3-letter codes in ISO 639-2.

    The Internet Engineering Task Force's specification RFC 1766 recommends the use of ISO 639 codes in Internet uses.
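    As a rough illustration of how such codes appear in Internet use, the Python sketch below splits an RFC 1766 language tag (a primary tag followed by optional subtags, each 1 to 8 letters, joined by hyphens) and notes whether the primary tag is a 2-letter code, which RFC 1766 interprets as an ISO 639 code. The function name and the "kind" labels are invented for this sketch and are not part of either standard.

```python
import re

# RFC 1766 grammar: Primary-tag *( "-" Subtag ), each part 1 to 8 ASCII letters.
_TAG_RE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def split_language_tag(tag):
    """Split an RFC 1766 language tag into (primary, subtags, kind).

    `kind` is a label made up for this sketch:
      "iso639"  - two-letter primary tag, interpreted as an ISO 639 code
      "iana"    - primary tag "i" (IANA-registered tag)
      "private" - primary tag "x" (private use)
      "other"   - anything else
    """
    if not _TAG_RE.fullmatch(tag):
        raise ValueError(f"not a well-formed RFC 1766 tag: {tag!r}")
    # Tags are case-insensitive, so normalise to lower case before splitting.
    primary, *subtags = tag.lower().split("-")
    if len(primary) == 2:
        kind = "iso639"
    elif primary == "i":
        kind = "iana"
    elif primary == "x":
        kind = "private"
    else:
        kind = "other"
    return primary, subtags, kind


print(split_language_tag("en-GB"))  # ('en', ['gb'], 'iso639')
print(split_language_tag("cy"))     # ('cy', [], 'iso639')
```

    The sketch only checks the shape of the tag; it does not check that a 2-letter primary tag is actually listed in ISO 639.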

    We have also been in discussion with the Summer Institute of Linguistics (SIL) who use a different (and much larger) set of 3-letter codes in their Ethnologue, codes which have also been used in some Internet situations.

    NB: (a) If you use Ethnologue codes as well as, or instead of, codes from ISO 639, I would be glad to hear of it, and to know in what circumstances you use those codes.

    (b) If you just use the Ethnologue, or some other reference source where those Ethnologue codes are used, just for reference (i.e. the reference source is important, not the codes themselves) that's a different question.

    I'd be grateful if you can distinguish (a) from (b) in any replies on that point. However, regarding (b) it may be useful to know which other publications/web sites use Ethnologue codes.

    2. Opportunity for feedback

    The list below is my own handy reference list based on my own compilation of 3-letter codes from ISO 639-2 and Library of Congress codes, and the 2-letter codes in ISO WD 639-1. Errors are likely to be my own, rather than in ISO 639, though I have been fairly careful.

    Some of the notes (especially those in square brackets, or using asterisks) are just for my own reference, relate only to information on different editions of ISO 639, and can be ignored.

    I'd be particularly interested to know of obvious omissions, or errors in naming, or where predominant use of language names has changed.

    There are also some fairly basic "genetic codes", where there are entries for "xxxx languages" or "xxxx (Other)". Again, if some language groups seem to have been omitted altogether I would be glad to know.

    In any of the columns, any information including asterisks (*) or question marks (?) is essentially my own, and for my own use.

    If possible, could you embed your comments within my quoted table, unless your comment is very simple and takes only a few lines: that will enable me to align comments.

    I am primarily interested in omissions, and in language names, but if you want to suggest a particularly useful 3-letter code, it may be helpful (although I think that the default approach is likely to be "use the first three letters of the name in English, or French, or in the local spelling of the name" if known, allowing for transcription or transliteration conventions).
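    A minimal Python sketch of that "first three letters" default follows, assuming the English name is used and only ASCII letters are counted. It also shows why the default cannot be applied blindly: a candidate may already belong to another language. The function name and the small EXISTING sample (three codes taken from the reference list) are assumptions made for the illustration, not a proposal from the committee.

```python
# A few 3-letter codes already present in the reference list (sample only).
EXISTING = {"kas": "Kashmiri", "kum": "Kumyk", "mar": "Marathi"}


def propose_code(name, existing=EXISTING):
    """Return (candidate, clash) for a language name.

    candidate: the first three ASCII letters of the name, lower-cased.
    clash: the language already holding that code in `existing`, or None.
    """
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if len(letters) < 3:
        raise ValueError(f"name too short to derive a code: {name!r}")
    candidate = "".join(letters[:3])
    return candidate, existing.get(candidate)


print(propose_code("Kashubian"))     # ('kas', 'Kashmiri')
print(propose_code("Mari, Meadow"))  # ('mar', 'Marathi')
print(propose_code("Udmurt"))        # ('udm', None)
```

    A None result here only means no clash with the three sample codes; a real check would need the full list.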

    Please could you reply direct to me at <Endangersesame.demon.co.uk> and not on the list, to avoid too many large repetitive emails overcrowding the traffic on the list as a whole.

    If you could reply within a week of reading this email, that should leave enough time to feed the information into the ISO 639 Joint Advisory Committee meeting on 17-18 February 2000.

    3. Handy reference list

    Here's the list: I look forward to your comments!

    [ Tip: Use a monospace font like Courier for the chart below ]

    ----------------------------------------------------------
    LC  ISO 639-2  ISO 639-1  Language name in English
    ----------------------------------------------------------
    --- --- --- (aj) Abaza
    abk ab Abkhazian
    --- --- --- (ad) Adyge
    ace Achinese
    ach Acoli
    ada Adangme
    aar aa Afar
    afh Afrihili
    afr af Afrikaans
    afa Afro-Asiatic (Other)
    aka ak Akan
    akk Akkadian
    alb/sqi * sq Albanian
    ale Aleut
    alg Algonquian languages
    --- --- --- (an) Aragonese
    tut Altaic (Other)
    amh am Amharic
    apa Apache languages
    ara ar Arabic
    arc Aramaic
    arp Arapaho
    arn Araucanian (Mapuche)
    arw Arawak
    arm/hye * hy Armenian
    --- --- --- (vl) Aromanian; Arumanian
    art Artificial (Other)
    --- --- --- (ae) Arvanite
    asm as Assamese
    --- --- --- (au) Asturian
    ath Athapascan languages
    aus Australian languages
    map Austronesian (Other)
    ava av Avaric
    ave (fv) Avestan
    awa Awadhi
    aym ay Aymara
    aze az Azerbaijani
    ban Balinese
    --- --- --- (bq) Balkar
    bat Baltic (Other)
    bal Baluchi
    bam bm Bambara
    bai Bamileke languages
    bad Banda
    bnt Bantu (Other)
    bas Basa
    bak ba Bashkir
    baq/eus * eu Basque
    btk Batak (Indonesia)
    bej Beja
    bel Belarusian [was Byelorussian]
    bem Bemba
    ben bn Bengali
    ber Berber (Other)
    bho Bhojpuri
    bih bh Bihari
    bik Bikol
    bin Bini
    bis bi Bislama
    --- --- --- (bs) Bosnian
    bra Braj
    bre br Breton
    bug Bugis (Buginese)
    bul bg Bulgarian
    bua Buriat
    bur/mya * my Burmese
    cad Caddo
    car Carib
    cat ca Catalan
    cau Caucasian (Other)
    ceb Cebuano
    cel Celtic (Other)
    cai Central American Indian (Other)
    chg Chagatai
    cmc Chamic languages
    cha Chamorro
    --- --- --- (??) Chamorro
    che (nx) Chechen
    chr (jl) Cherokee
    chy Cheyenne
    chb Chibcha
    --- --- --- (ch) Chichewa; Chewa
    chi/zho * zh Chinese
    chn Chinook jargon
    chp Chipewyan
    cho Choctaw
    chu (sj) Church Slavic (Old Church Slavonic)
    tru chk Chuukese
    chv (cv) Chuvash
    cop Coptic
    cor kw Cornish
    cos co Corsican
    cre cr Cree
    mus Creek
    cpe Creoles & Pidgins, English
    cpf Creoles & Pidgins, French
    cpp Creoles & Pidgins, Portuguese
    crp Creoles & Pidgins (Other)
    scr/hrv * hr Croatian (Serbo-Croat, Latin)
    cus Cushitic (Other)
    cze/ces * cs Czech
    dak Dakota
    dan da Danish
    --- --- --- (dg) Dargwa
    day Dayak
    del Delaware
    din Dinka
    div dv Divehi
    doi Dogri
    dgr Dogrib
    dra Dravidian (Other)
    dua Duala
    dut/nld * nl Dutch
    dum Dutch, Middle (ca. 1050-1350)
    dyu Dyula
    dzo dz Dzongkha
    efi (ef) Efik
    egy Egyptian (Ancient)
    eka Ekajuk
    elx Elamite
    eng en English
    enm English, Middle (ca. 1100-1500)
    ang English, Old (ca. 450-1100)
    --- --- --- (er) Erzya Mordvin
    esp epo eo Esperanto
    esk --- -- ** Eskimo (Other) (not in 639-2)
    est et Estonian
    eth --- -- ** Ethiopic [languages] (not in 639-2)
    ewe ee Ewe
    ewo Ewondo
    fan Fang
    fat Fanti
    far fao fo Faroese
    fij fj Fijian
    fin fi Finnish
    fiu Finno-Ugrian (Other)
    fon Fon
    --- --- --- (fp) Franco-Provençal
    fre/fra * fr French
    frm French, Middle (ca. 1400-1600)
    fro French, Old (842- ca. 1400)
    fri fry fy Frisian
    --- --- --- (??) Frisian, East; Sater Frisian
    --- --- --- (fn) Frisian, North (fn! - also in Persian, Old)
    fur (fu) Friulian
    ful ff Fulah
    gaa Ga
    gae gla gd Gaelic, Scots [* were gae/gdh]
    iri gle ga Gaelic, Irish [* were gai/iri]
    max glv gv Gaelic, Manx
    --- --- --- (gg) Gagauz
    gag glg gl Gallegan (Galician - used in Spain)
    lug lg Ganda
    gay Gayo
    gba Gbaya
    eth gez Geez
    geo/kat * ka Georgian
    ger/deu * de German
    gmh German, Middle High
    goh German, Old High (ca. 750-1050)
    gem Germanic (Other)
    --- --- --- (??) German, Low; Low German
    gil Gilbertese
    gon Gondi
    gor Gorontalo
    got Gothic
    grb Grebo
    grc Greek, Ancient (to 1453)
    gre/ell * el Greek, Modern (1453-)
    kal kl Greenlandic (Kalaallisut)
    gua grn gn Guarani
    guj gu Gujarati
    gwi Gwich'in
    hai Haida
    hau ha Hausa
    haw Hawaiian
    heb he *** Hebrew [Infoterm, 1989: iw deprecated?]
    her oh Herero
    hil Hiligaynon
    him Himachali
    hin hi Hindi
    hmo (??) Hiri Motu, Motu
    hit Hittite
    hmn Hmong
    hun hu Hungarian
    hup Hupa
    iba Iban
    ice/isl * is Icelandic
    ibo ig Igbo
    ijo Ijo
    ilo Iloko
    inc Indic (Other)
    ine Indo-European (Other)
    ind id *** Indonesian [Infoterm, 1989: in deprecated?]
    --- --- --- (ng) Ingush
    int ina ia Interlingua [* ]
    ile ie Interlingue [*note similar language name]
    iku iu Inuktitut [Infoterm, 1989]
    ipk ik Inupiaq (was Inupiak)
    ira Iranian (Other)
    sga Irish, Old (to 900)
    mga Irish, Middle (900 - 1200)
    iro Iroquoian languages
    --- --- --- (rx) Istro-Romanian
    ita it Italian
    jpn ja Japanese
    jav/jaw * jv/jw * Javanese [jw (Jawi?) now deprecated??]
    jrb Judeo-Arabic
    jpr Judeo-Persian
    --- --- --- (qb) Kabardian
    kab Kabyle
    kac Kachin
    kal Kalaallisut [renamed]
    --- --- --- (xl) Kalmyk
    kam Kamba
    kan kn Kannada
    kau kr Kanuri
    --- --- --- (qc) Karachay
    --- --- --- (qr) Karaim
    kaa Kara-Kalpak
    --- --- --- (kj) Karelian, North (Other Karelian too?)
    kar Karen
    kas ks Kashmiri
    --- --- --- (??) Kashubian
    kaw Kawi
    kaz kk Kazakh
    kha Khasi
    cam khm km ** Khmer (LC was once "cam")
    khi Khoisan (Other)
    kho Khotanese
    --- --- --- (ki) Kikuyu; Gikuyu
    kik Kikuyu
    kmb Kimbundu
    kin rw Kinyarwanda
    kir ky Kirghiz
    --- --- --- (kv) Komi
    kom Komi
    kon kg Kongo
    kok Konkani
    kor ko Korean
    kus kos Kosraean
    kpe Kpelle
    kro Kru
    kua ok Kuanyama [Kwanyama in 639-2]
    --- --- --- (qm) Kumyk
    kum Kumyk
    kur ku Kurdish
    kru Kurukh
    kut Kutenai
    --- --- --- (ld) Ladin
    lad Ladino
    --- --- --- (ly) Ladino
    lah Lahnda
    --- --- --- (lk) Lak
    lam Lamba
    lao lo Lao
    lat la Latin
    lav lv Latvian
    ltz lb Letzeburgesch
    lez (le) Lezghian
    lin ln Lingala
    lit lt Lithuanian
    --- --- --- (li) Livonian
    loz Lozi
    lub lu Luba-Katanga
    lua Luba-Lulua
    lui Luiseno
    lun Lunda
    luo Luo (Kenya and Tanzania)
    lus Lushai
    mac/mkd * mk Macedonian [*** mak earlier? NB Makasar]
    mad Madurese
    mag Magahi
    mai Maithili
    mak Makasar
    mla mlg mg Malagasy
    may/msa * ms Malay
    mal ml WD1* Malayalam
    mlt mt Maltese
    mdr Mandar
    man (md) Mandingo
    mni Manipuri
    mno Manobo languages
    mao/mri * mi Maori
    mar mr Marathi
    chm -- Mari
    --- --- --- (mj) Mari, Meadow
    --- --- --- (mm) Mari, Mountain
    mah (??) Marshall (Marshallese)
    mwr Marwari
    mas Masai
    myn Mayan languages
    men Mende
    mic Micmac
    min Minangkabau
    --- --- --- (??) Mingrelian
    mis Miscellaneous (Other)
    moh Mohawk
    --- --- --- (mh) Moksha Mordvin
    mol mo Moldavian
    mkh Mon-Kmer (Other)
    lol Mongo (Mongo-Nkundu)
    mon mn Mongolian
    mos Mossi (Moore (?) in LC list)
    mul Multiple languages
    mun Munda languages
    nah Nahuatl (LC listed earlier as Aztec)
    --- --- --- (ke) Nama
    nau na Nauru
    nav (dn) Navajo (Navaho)
    nde nd * Ndebele, North [nd = N. assumed]
    nbl Ndebele, South
    ndo on Ndonga
    --- --- --- (nt) Nenets
    nep ne Nepali
    new Newari
    nia Nias
    nic Niger-Kordofanian (Other)
    ssa Nilo-Saharan (Other)
    niu Niuean
    --- --- --- (nh) Nogai (Noghay)
    non Norse, Old
    nai North American Indian (Other)
    --- nor no Norwegian
    --- nno (nn) Norwegian - Nynorsk
    --- --- --- (nb) Norwegian - Bokmål
    nub Nubian languages
    nym Nyamwezi
    nya (ny) Nyanja
    nyn Nyankole
    nyo Nyoro
    nzi Nzima
    lan oci Occitan (Langue d'Oc) (LC: post-500)
    oji oj Ojibwa
    ori or Oriya
    gal orm om ** Oromo (LC differs)
    osa Osage
    oss (ir) Ossetic (Ossetian)
    oto Otomian languages
    pal Pahlavi
    pau Palauan
    pli (pv) Pali
    pam Pampanga
    pag Pangasinan
    pan pa Panjabi
    pap Papiamento
    paa Papuan-Australian (Other)
    --- --- --- (fm) Persian, Middle
    per/fas * fa Persian
    peo (fn!) *** Persian, Old (ca 600 - 400 B.C.) (fn dupe!)
    phi Philippine (Other)
    phn Phoenician
    pol pl Polish
    pon Ponape (was this Pohnpeian too ???)
    por pt Portuguese
    pra Prakrit languages
    pro (pi) Provencal, Old (to 1500) (-1500 in ISO 639-1?)
    pus ps Pushto
    que qu Quechua
    raj Rajasthani
    rap Rapanui
    rar Rarotongan
    (qaa-qtz) (Reserved for local use)
    roh rm Rhaeto-Romance
    roa Romance (Other)
    rum/ron * ro Romanian
    --- --- --- (ry) Romany; Romani
    rom Romany
    run rn Rundi
    rus ru Russian
    --- --- --- (??) Ruthenian (Rusyn, Rusinian, Lemko)
    sal Salishan languages
    sam Samaritan Aramaic
    lap smi se Sami languages
    --- --- --- (sy) Sami, Inari
    --- --- --- (sz) Sami, Kildin
    --- --- --- (sx) Sami, Lule
    --- --- --- (ds?) Sami, Northern
    --- --- --- (sb) Sami, Skolt
    --- --- --- (sp) Sami, Southern
    sao smo sm Samoan
    sad Sandawe
    sag sg Sango
    san sa Sanskrit
    sat Santali
    srd (sc) Sardinian
    sas Sasak
    sco (ll) Scots, Lowlands (Lallans)
    sel Selkup
    sem Semitic (Other)
    scc/srp * sr Serbian (Serbo-Croat, Cyrillic)
    srr Serer
    shn Shan
    sho sna sn Shona
    sid Sidamo
    bla Siksika
    snd sd Sindhi
    snh sin si Sinhalese (Singhalese)
    sgn --- -- Sign languages [* not expanded further]
    sit Sino-Tibetan (Other)
    sio Siouan languages
    den Slave (Athapascan language)
    sla Slavic (Other)
    slo/slk * sk Slovak
    slv sl Slovenian
    sog Sogdian
    som so Somali
    son Songhai
    snk Soninke
    wen --- -- Sorbian languages (Wendish?)
    --- --- --- (sf) Sorbian, Lower
    --- --- --- (??) Sorbian, Upper
    nso Sotho, Northern
    sso sot st Sotho, Southern
    sai South American Indian (Other)
    spa --- * es Spanish [* were spa/esl; "esp" later!!!!]
    sun su Sudanese
    suk Sukuma
    sux Sumerian
    sus Susu
    swa sw Swahili
    swz ssw ss Swati (Swazi, Siswati, ?Siswant?)
    swe --- * sv Swedish [* ISO 639-2/T sve deprecated??]
    syr Syriac
    --- --- --- (tb) Tabasaran
    tag tgl tl Tagalog
    tah (??) Tahitian
    tai Tai (Other)
    taj tgk tg Tajik
    tmh Tamashek
    tam ta Tamil
    tar tat tt Tatar
    tel te Telugu
    ter Tereno (Terena)
    tet Tetum
    tha th Thai
    tib/bod * bo Tibetan
    tig Tigre
    tir ti Tigrinya
    tem Timne (Temne)
    tiv Tivi
    tli Tlingit
    tpi Tok Pisin
    tkl Tokelau
    tog to Tonga (Nyasa)
    ton Tonga (Tonga Islands)
    tru --- -- ** Truk (???????????)
    tsi Tsimshian
    tso ts Tsonga
    tsw tsn tn Tswana
    tum Tumbuka
    tur tr Turkish
    ota Turkish, Ottoman (1500 - 1928)
    tuk tk Turkmen
    tvl Tuvalu
    tyv Tuvinian
    twi tw Twi
    --- --- --- (um) Udmurt
    uga Ugaritic
    uig ug Uighur [Infoterm, 1989]
    ukr uk Ukrainian
    umb Umbundu
    und Undetermined
    urd ur Urdu
    uzb uz Uzbek
    vai Vai
    --- --- --- (??) Valencian
    ven ve Venda
    --- --- --- (vp) Veps
    vie vi Vietnamese
    vol vo Volapuk
    vot Votic
    wak Wakashan languages
    --- --- --- (wl) Walloon
    wal Walamo
    war Waray
    was Washo
    wel/cym * cy Welsh
    wol wo Wolof
    xho xh Xhosa
    sah Yakut
    yao Yao
    yap Yap (Yapese)
    --- --- --- (yy) Yi
    yid yi *** Yiddish [Infoterm, 1989: ji now deprecated?]
    yor yo Yoruba
    ypk Yupik languages
    znd Zande
    zap Zapotec
    zen Zenaga
    zha za Zhuang [Infoterm, 1989]
    zul zu Zulu
    zun Zuni
    ----------------------------------------------------------
    *    highlights changes
    ***  deprecations etc.
    ( )  tentative, mainly in ISO 639-1 draft
    ----------------------------------------------------------

    *** In the web page "Code for the Representation of the Names of Languages. From ISO 639, revised 1989", there is the note on "Changes made December 20, 1997, based upon information in the following note from a member of the W3C HTML group":

    "In 1989, the ISO 639 Registration Authority changed a number of codes as follows (the quote is taken from RFC 1766):

    The following codes have been added in 1989 (nothing later): ug (Uigur), iu (Inuktitut, also called Eskimo), za (Zhuang), he (Hebrew, *** replacing iw), yi (Yiddish, *** replacing ji), and id (Indonesian, replacing in)."
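    For systems that still hold the older codes, the three replacements named in that quote can be applied with a small lookup. The Python sketch below is a minimal illustration; the table covers only the three pairs stated above, and the function name is hypothetical.

```python
# Replacements stated in the quote above: iw -> he, ji -> yi, in -> id.
REPLACED_1989 = {"iw": "he", "ji": "yi", "in": "id"}


def modernise_code(code):
    """Map a replaced 2-letter code to its 1989 successor.

    Codes not in the table are returned unchanged (lower-cased).
    """
    code = code.lower()
    return REPLACED_1989.get(code, code)


print(modernise_code("iw"))  # he
print(modernise_code("cy"))  # cy
```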

    In the list above, 3-letter dash codes ( --- ) and 2-letter dash codes ( -- ) mark places where there appears to be no code in the other code sources.

    In several cases, information on alternative language names is my own, assumed from comparing various lists.

    Best regards

    John Clews

    - John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG tel: 0171 412 7826 (day); 0171 272 8397 (evening); 01423 888 432 (w/e) Email: Endangersesame.demon.co.uk

    Committee Chair of ISO/TC46/SC2: Conversion of Written Languages; Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization; Committee Member of CEN/TC304: Information and Communications Technologies: European Localization Requirements; Committee Member of the Foundation for Endangered Languages; Committee Member of ISO/IEC/JTC1/SC2: Coded Character Sets