Editor for this issue: Ann Dizdar <dizdar@tam2000.tamu.edu>
Dear Colleagues,

This is the second summary about languages with no delimiters (e.g., spaces) for word boundaries. Many people sent me valuable information. I thank the following contributors:

  Shanley Allen <allen@mpi.nl>
  Rita Bhandari <bhandari@semlab1.sbs.sunysb.edu>
  Doug Cooper <doug@chulkn.car.chula.ac.th>
  Peter Daniels <pdaniels@press-gopher.uchicago.edu>
  Stefan Frisch <frisch@babel.ling.nwu.edu>
  Keith Goeringer <keg@violet.berkeley.edu>
  Mark Hansell-Mai Hansheng <mhansell@carleton.edu>
  Susantha Herath <herath@u-aizu.ac.jp>
  Matthew Hurst <matth@cogsci.ed.ac.uk>
  Wolfram Kahl <kahl@hermes.informatik.unibw-muenchen.de>
  Jee Eun Kim <jeeeunk@microsoft.com>
  Hiroaki Kitano <6500hiro@ucsbuxa.ucsb.edu>
  Wenchao Li <wcli@vax.ox.ac.uk>
  Stuart Luppescu <sl70@musuko.spc.uchicago.edu>
  Duncan MacGregor <aa735@freenet.carleton.ca>
  Stavros Macrakis <macrakis@osf.org>
  Philippe Mennecier <ferry@cimrs1.mnhn.fr>
  Boris Fridman Mintz <fridman@ucol.mx>
  Nicholas Ostler <nostler@chibcha.demon.co.uk>
  Peter Paul <Peter.Paul@arts.monash.edu.au>
  Gnani Perinpanayagam <gnani@sun3.oulu.fi>
  Ellen F. Prince <ellen@central.cis.upenn.edu>
  Steve Seegmiller <SEEGMILLER@apollo.montclair.edu>
  Dan I. Slobin <slobin@cogsci.Berkeley.EDU>
  Jan-Olof Svantesson <Jan-Olof.Svantesson@ling.lu.se>
  Allan C Wechsler <Wechsler@world.std.com>

I had trouble classifying languages into just two groups: those that have delimiters for words and those that don't. Some languages have no delimiters, but their words are still separable by a superficial cue such as letter form, as in Arabic. Other languages are the opposite: they do have delimiters, but in practice more analysis is needed to get "reasonable" (I know it's vague!) units, because words become very long through the gluing together of elements, as in Tamil.

I understand that this gluing and typological agglutination (or polysynthesis) are different matters, but there may be some correlation between them. Could someone tell me the typological class (agglutinating, polysynthetic, etc.) of the languages in the "NO" and "Partly NO" groups? Let's ignore the gray zone and consider only the strong or typical cases! I have the impression that the Devanagari-based languages in the "Partly NO" group are agglutinating languages. Is that correct? (I know Japanese is agglutinating and Chinese is isolating. Sanskrit is inflecting, isn't it?)

So I finally decided to classify the languages into four groups: "NO delimiters", "Partly NO", "Virtually YES", and "YES, it has delimiters". (I did not count as YES or Virtually YES those languages that are segmentable into morphemes character by character, like Chinese, because I wanted lexical units bigger than morphemes.)

I will submit the final summary next time. If you find errors in this list or have any special comments, please send a message directly to me. I am especially worried about misclassification between "Partly NO" and "YES". Here is the list:

=======================================================================
Q: Does the language have word-boundary delimiters?

A. [NO]: (3)
   Chinese, Japanese, Tibetan

B. [Partly NO - Words delimited, but need analysis to reach lexical level]: (7)
   Devanagari-based: Burmese, Khmer, Lao(?), Malayalam(?), Sanskrit, Tamil, Thai

C. [Virtually YES - Easily distinguishable by character form]: (8)
   Arabic-based: Arabic, Dari, Kurdish(*1*), Malay, Pashto, Persian(Farsi), Sindhi, Urdu

D. [YES]: (133)

   Latin/Greek-based: (89)
     Acholi, Afrikaans, Akan(Twi), Balinese, Bambara, Bantu, Basque, Berber,
     Breton, Buluba-Lulua, Caddoan, Catalan, Chikaranga, Chippewa(Ojibwa),
     Choctaw, Cree, Croatian, Czech, Dakota(Sioux), Danish, Dutch, English,
     (Esperanto), Estonian, Ewe, Fijian, Filipino, Finnish, Flemish, French,
     Fulani(Fulbe), Gaelic, Gaelic, German, Greek, Guarani, Harari, Hausa,
     Hawaiian, Hungarian, Icelandic, Igbo, Indonesian, Iroquoian, Italian,
     Javanese, Kanuri, Khasi, Kongo, Lappish, Latvian, Lithuanian, Lu-Ganda,
     Makua, Malagasy, Malay, Maltese, Mandingo, Maori, Mapudungu, Masai,
     Moldavian(*2*), Nyanja, Nama, Navajo, Norwegian, Polish, Portuguese,
     Quechua, Rhaeto-Romance, Romanian, Romany, Samoan, Sundanese, Sangs,
     Slovak, Slovene, Somali, Spanish, Swahili, Swedish, Tagalog, Turkish,
     Turkmen(*2*), Uzbek(*2*), Vietnamese, Welsh, Yoruba, Zulu

   Cyrillic-based: (26)
     Avar, Azerbaijani, Bashkir, Belorussian, Bulgarian, Buryat, Chechen,
     Chuvash, Kabardian, Kalmyk, Kazakh, Kirghiz, Komi, Mari, Macedonian,
     Mongolian(*3*), Nivkh, Russian, Ossetian, Serbian, Serbo-Croat, Tajik,
     Tatar, Udmurt, Ukrainian, Yakut

   Hebrew: (3)
     Hebrew(modern), Ladino(Judeo-Spanish), Yiddish

   Devanagari-based: (7)
     Assamese, Bengali, Hindi, Nepali, Telugu, Sinhalese

   Others: (7)
     Amharic(Ethiopian)(?), Armenian(modern), Cherokee, Georgian,
     Inuktitut(Eskimo), Korean, Punjabi

?: (1)
   Manchu

*1* Kurdish also uses Cyrillic, Roman and Armenian.
*2* Moldavian, Turkmen, Uzbek used Cyrillic until recently.
*3* Mongolian has both Uigur-derived script and Cyrillic as official.

The following are languages for which I don't have data yet:
   Buginese, Kannada, Kashmiri, Lahnda, Marathi
==============================================================================
- Hideo Fujii
  U. of Massachusetts