LINGUIST List 6.1278

Wed Sep 20 1995

Sum: Languages with No Between-word Delimiters

Editor for this issue: Helen Dry <hdryemunix.emich.edu>


Directory

  1. Hideo Fujii, 2nd Summary: Languages with no between-word delimiters

Message 1: 2nd Summary: Languages with no between-word delimiters

Date: Tue, 19 Sep 1995 13:23:16
From: Hideo Fujii <fujiimackay.cs.umass.edu>
Subject: 2nd Summary: Languages with no between-word delimiters


Dear Colleagues,

This is the second summary about languages with no delimiters (e.g.,
spaces) for word boundaries. Many people sent me valuable information.
I thank the following contributors:

	 Shanley Allen <allenmpi.nl>
	 Rita Bhandari <bhandarisemlab1.sbs.sunysb.edu>
	 Doug Cooper <dougchulkn.car.chula.ac.th>
	 Peter Daniels <pdanielspress-gopher.uchicago.edu>
	 Stefan Frisch <frischbabel.ling.nwu.edu>
	 Keith Goeringer <kegviolet.berkeley.edu>
	 Mark Hansell-Mai Hansheng <mhansellcarleton.edu>
	 Susantha Herath <herathu-aizu.ac.jp>
	 Matthew Hurst <matthcogsci.ed.ac.uk>
	 Wolfram Kahl <kahlhermes.informatik.unibw-muenchen.de>
	 Jee Eun Kim <jeeeunkmicrosoft.com>
	 Hiroaki Kitano <6500hiroucsbuxa.ucsb.edu>
	 Wenchao Li <wclivax.ox.ac.uk>
	 Stuart Luppescu <sl70musuko.spc.uchicago.edu>
	 Duncan MacGregor <aa735freenet.carleton.ca>
	 Stavros Macrakis <macrakisosf.org>
	 Philippe Mennecier <ferrycimrs1.mnhn.fr>
	 Boris Fridman Mintz <fridmanucol.mx>
	 Nicholas Ostler <nostlerchibcha.demon.co.uk>
	 Peter Paul <Peter.Paularts.monash.edu.au>
	 Gnani Perinpanayagam <gnanisun3.oulu.fi>
	 Ellen F. Prince <ellencentral.cis.upenn.edu>
	 Steve Seegmiller <SEEGMILLERapollo.montclair.edu>
	 Dan I. Slobin <slobincogsci.Berkeley.EDU>
	 Jan-Olof Svantesson <Jan-Olof.Svantessonling.lu.se>
	 Allan C Wechsler <Wechslerworld.std.com>


I had a problem classifying languages into two groups: those which have
delimiters for words, and those which don't. Some languages don't have
delimiters, but words are still separable by a superficial cue such as
letter form, as seen in Arabic. Other languages are the opposite: they do
have delimiters, but we need further analysis to get "reasonable" (I know
it's vague!) units, because words become very long through gluing elements
together, as in Tamil.

I understand that this gluing and typological agglutination (or
polysynthesis) are different matters. But there may be some correlation
between them. Could someone tell me the typological class
(Agglutinating, Polysynthetic, etc.) of the languages under "NO" and "Partly NO"?
Let's ignore the gray zone, and consider only the strong or typical ones!

I got the impression that the Devanagari-based languages in the "Partly NO" group
are Agglutinating languages. Is that correct?
(I know Japanese is Agglutinating, and Chinese is Isolating. Sanskrit
is inflecting, isn't it?)

So, I finally decided to classify them into four groups: "NO delimiters", "Partly NO",
"Virtually YES", and "YES, it has delimiters". (I didn't count as YES or
Virtually YES those languages which are segmentable into morphemes at every
character, like Chinese, because I wanted bigger lexical units than
morphemes.)
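
The four-way scheme can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not part of the survey itself: the names `Delimiters`, `CLASSIFICATION`, `split_on_spaces`, and `needs_further_analysis` are hypothetical, the table holds only a handful of languages copied from the list in this message, and plain whitespace splitting stands in for "using the delimiters".

```python
from enum import Enum


class Delimiters(Enum):
    """The four groups used in this summary (member names are assumptions)."""
    NO = "NO delimiters"
    PARTLY_NO = "Partly NO"
    VIRTUALLY_YES = "Virtually YES"
    YES = "YES, it has delimiters"


# A few entries copied from the list in this message; not the full list.
CLASSIFICATION = {
    "Chinese": Delimiters.NO,
    "Japanese": Delimiters.NO,
    "Tamil": Delimiters.PARTLY_NO,
    "Thai": Delimiters.PARTLY_NO,
    "Arabic": Delimiters.VIRTUALLY_YES,
    "English": Delimiters.YES,
}


def split_on_spaces(text):
    """Naive segmentation: treat runs of whitespace as the only delimiter."""
    return text.split()


def needs_further_analysis(language):
    """True if whitespace splitting alone is not expected to give lexical units."""
    return CLASSIFICATION[language] in (Delimiters.NO, Delimiters.PARTLY_NO)


if __name__ == "__main__":
    # Spaces give word candidates directly for a "YES" language.
    print(split_on_spaces("the cat sat down"))
    # A Japanese sentence written without spaces stays one undivided string.
    print(split_on_spaces("猫が座った"))
    print(needs_further_analysis("Japanese"), needs_further_analysis("English"))
```

The sketch does not attempt the harder part, which is segmenting the "NO" and "Partly NO" languages into lexical units; it only marks where a space-based splitter stops being enough.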

I will submit the final summary next time. If you find errors in this list,
or have any special comments, please send a message directly to me.
I am especially worried about misclassification between "Partly NO" and "YES".

Here is the list:
=======================================================================
Q: Does the language have word-boundary delimiters?
 A.[NO]:(3) Chinese, Japanese, Tibetan

 B.[Partly NO - Words delimited, but need analysis to reach lexical level]
 (7)	 Devanagari-based:
	 Burmese, Khmer, Lao(?), Malayalam(?), Sanskrit, Tamil, Thai

 C.[Virtually YES - Easily distinguishable by character form]
 (8)	 Arabic-based:
	 Arabic, Dari, Kurdish(*1*), Malay, Pashto, Persian(Farsi),
	 Sindhi, Urdu

 D.[YES]: (133)
 Latin/Greek-based:
 (89) Acholi, Afrikaans, Akan(Twi), Balinese, Bambara, Bantu, Basque,
	 Berber, Breton, Buluba-Lulua, Caddoan, Catalan, Chikaranga,
	 Chippewa(Ojibwa), Choctaw, Cree, Croatian, Czech, Dakota(Sioux),
	 Danish, Dutch, English, (Esperanto), Estonian, Ewe, Fijian,
	 Filipino, Finnish, Flemish, French, Fulani(Fulbe), Gaelic,
	 Gaelic, German, Greek, Guarani, Harari, Hausa, Hawaiian,
	 Hungarian, Icelandic, Igbo, Indonesian, Iroquoian, Italian,
	 Javanese, Kanuri, Khasi, Kongo, Lappish, Latvian, Lithuanian,
	 Lu-Ganda, Makua, Malagasy, Malay, Maltese, Mandingo, Maori,
	 Mapudungu, Masai, Moldavian(*2*), Nyanja, Nama, Navajo, Norwegian,
	 Polish, Portuguese, Quechua, Rhaeto-Romance, Romanian, Romany,
	 Samoan, Sundanese, Sangs, Slovak, Slovene, Somali, Spanish,
	 Swahili, Swedish, Tagalog, Turkish, Turkmen(*2*), Uzbek(*2*),
	 Vietnamese, Welsh, Yoruba, Zulu
 Cyrillic-based:
 (26) Avar, Azerbaijani, Bashkir, Belorussian, Bulgarian, Buryat,
	 Chechen, Chuvash, Kabardian, Kalmyk, Kazakh, Kirghiz, Komi, Mari,
	 Macedonian, Mongolian(*3*), Nivkh, Russian, Ossetian,
	 Serbian, Serbo-Croat, Tajik, Tatar, Udmurt, Ukrainian,
	 Yakut
 Hebrew:
 (3) Hebrew(modern), Ladino(Judeo-Spanish), Yiddish
 Devanagari-based:
 (7) Assamese, Bengali, Hindi, Nepali, Telugu, Sinhalese
 Others:
 (7) Amharic(Ethiopian)(?), Armenian(modern), Cherokee, Georgian,
	 Inuktitut(Eskimo), Korean, Punjabi
 ? (1): Manchu

*1* Kurdish also uses Cyrillic, Roman and Armenian.
*2* Moldavian, Turkmen, Uzbek used Cyrillic until recently.
*3* Mongolian has both Uigur-derived script and Cyrillic as official.


The following are languages for which I don't have data yet:
 Buginese, 		Kannada, 		Kashmiri,
 Lahnda,		Marathi

 ==============================================================================
- Hideo Fujii
 U. of Massachusetts