LINGUIST List 6.1302

Fri Sep 22 1995

Sum: Languages with no between-word delimiters (final sum)

Editor for this issue: Ann Dizdar <dizdartam2000.tamu.edu>


Directory

  1. Hideo Fujii, Final SUM/Q: languages with no between-word delimiters

Message 1: Final SUM/Q: languages with no between-word delimiters

Date: Fri, 22 Sep 1995 08:52:51
From: Hideo Fujii <fujiimackay.cs.umass.edu>
Subject: Final SUM/Q: languages with no between-word delimiters


Dear LINGUISTs & NLPASIAns,

Thank you very much for sending so much valuable information. This is
the final summary on languages that do not have delimiters between
'words'. I was tempted to include all the comments, but gave up because of
their sheer volume. For the same reason, I have also left out the languages
that do have delimiters [YES group]. If you would like to keep the full list
at hand, please consult the previous summary (LINGUIST VOL-6-1269).
To update it as I did, you can simply 1) remove Flemish; 2) add West-Frisian
(according to Henk Wolf; thank you) to [YES]: Latin/Greek variations; and
3) move Turkish, Kazakh(??), Azerbaijani(??), Uzbek(??), Kirghiz(??) and
Turkmen(??) from [YES] to [Partly NO]. Here, (??) marks a language "closely
akin" to Turkish, which produces VERY long words by agglutination with
no spaces inside them (also from Henk Wolf).

I don't have any definite conclusions, but the following points were observed:

 1) Delimiter-less languages are a minority among the world's languages.
 The [NO] group in particular is very rare: there are only 3 languages
 (Chinese, Japanese, Tibetan), or 2% of 158 languages. If we include
 the [Partly NO] group (like many Indian languages), they are 6% of
 the total of 158 languages ["(?)" is counted as 0.5].
 (Chinese, Japanese and Tibetan all have scripts that were
 traditionally written top to bottom, and to some extent still are.
 This may be a factor in this result, but I'm not
 sure....)

 2) There is NO strong correlation between delimiter-less-ness and
 language typology. We can observe various types of languages
 in the [NO]/[Partly NO] groups, e.g., agglutinating (Japanese, Tamil,
 Turkish, etc.), isolating (Chinese), and inflectional (Sanskrit, etc.).
 (What about polysynthetic languages??)
 Also, a single language type can be observed in both [YES] and
 [(Partly) NO], such as Sanskrit ([Partly NO]) and Russian ([YES]) for
 inflecting. Between [NO] and [YES] (or at least [Partly NO]), the same
 holds for agglutinating and isolating: for example, Chinese [NO] and
 Vietnamese [YES]; Japanese [NO] and Hungarian [YES] (or Tamil [Partly NO]).

 3) So it is NOT quite right to say (as I have often heard) that
 "the language L does not have a space between words because
 L is agglutinating."

 4) Latin/Greek- and Cyrillic-based languages are the large majority
 (70% in our list; 54% Latin/Greek, 16% Cyrillic), and they
 use the space as a delimiter between words. There seem to be no exceptions
 among modern languages. (But there are many exceptions among
 classical/medieval languages.)
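
As a rough check of the percentages in observation 1, the weighting can be
sketched in Python. This is only a sketch under assumptions: the names
(weight, share, double_q_weight) are hypothetical, the group members are
copied from the summary table in this post, and since the post does not say
how "(??)" entries are counted, that weight is left as a parameter (the
default of 0 is an assumption, not something stated here).

```python
# Sketch only: weighted share of delimiter-less languages out of 158.
# "(?)" counts as 0.5, as stated in the post. How "(??)" is counted is
# NOT stated, so double_q_weight is an assumed, adjustable parameter.
TOTAL_LANGUAGES = 158

NO_GROUP = ["Chinese", "Japanese", "Tibetan"]
PARTLY_NO_GROUP = [
    "Turkish", "Turkmen(??)", "Uzbek(??)",
    "Azerbaijani(??)", "Kazakh(??)", "Kirghiz(??)",
    "Burmese", "Khmer", "Lao(?)", "Sanskrit", "Thai",
    "Kannada(?)", "Malayalam(?)", "Tamil",
]


def weight(entry, double_q_weight):
    """Return the counting weight of one list entry."""
    if entry.endswith("(??)"):
        return double_q_weight
    if entry.endswith("(?)"):
        return 0.5
    return 1.0


def share(entries, double_q_weight=0.0):
    """Weighted number of entries divided by the list total."""
    weighted = sum(weight(e, double_q_weight) for e in entries)
    return weighted / TOTAL_LANGUAGES


no_share = share(NO_GROUP)
combined_share = share(NO_GROUP + PARTLY_NO_GROUP)
print(f"[NO] only:          {no_share:.1%}")
print(f"[NO] + [Partly NO]: {combined_share:.1%}")
```

Passing a larger double_q_weight (for example 0.5 or 1.0) raises the
combined figure, so the result depends on that assumption.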


Several people pointed out that some languages have 'moderately' long
(verbal/nominal) compounds (e.g. German, Dutch, West-Frisian, etc.),
as opposed to the languages above with VERY long words.
I am not sure whether these compounds are non-lexical, i.e., productive and
semantically transparent (syntactic compounds).

Could someone tell me whether German and the others rely prominently on
syntactic compounds to make words "pretty long"? Or are they mostly lexical
compounds? Also, if you know of other languages in our list that have this
property (i.e., syntactic compounds are prominent), please just send the name
of the language. I will compile and post a summary of this new question as a
separate topic.

We still have many (?)-items in our list. So, if you are knowledgeable
about these, please let me know (I am happy to wait for a long time...). I will
send an addendum to this summary some time later.

Finally, I want to express my sincere gratitude to our (37) contributors,
who made this final summary possible. The names of the contributors are
listed at the end of this summary. (I hope I didn't miss any name. If I did,
I sincerely apologize.)

Hideo Fujii
University of Massachusetts
 at Amherst


SUMMARY: Languages Without Delimiters Between 'Words'
(in total 158 languages)
==========================================================
Q: Does the language have word-boundary delimiters?
 A.[NO]:(3) Chinese, Japanese, Tibetan

 B.[Partly NO - Words delimited, but need analysis to reach lexical level]: (14)
	 Latin/Greek Variations:
	 Turkish, Turkmen(*2*)(??), Uzbek(*2*)(??)
	 Cyrillic-based:
	 Azerbaijani(??), Kazakh(??), Kirghiz(??)
	 Devanagari Variations:
	 Burmese, Khmer, Lao(?), Sanskrit, Thai
	 Others:
	 Kannada(?), Malayalam(?), Tamil

 C.[Virtually YES - Easily distinguishable by character form]: (10)
	 Arabic Variations:      (10)

 D.[YES]: (131)
	 Latin/Greek Variations: (86)
	 Cyrillic Variations:    (25)
	 Hebrew Variations:      ( 3)
	 Devanagari Variations:  ( 8)
	 Others:                 ( 9)

*1* Kurdish also uses Cyrillic, Roman and Armenian.
*2* Moldavian, Turkmen, Uzbek and Mongolian used Cyrillic until recently
 (or still do).
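
The group counts in the table above can be cross-checked against the stated
total of 158 and the script-family percentages quoted in the observations.
The following Python lines are only a sketch: the dictionary and variable
names are hypothetical, and the figures are copied from this summary.

```python
# Sketch only: consistency check of the counts given in the summary table.
GROUP_COUNTS = {"NO": 3, "Partly NO": 14, "Virtually YES": 10, "YES": 131}
YES_BY_SCRIPT = {
    "Latin/Greek": 86,
    "Cyrillic": 25,
    "Hebrew": 3,
    "Devanagari": 8,
    "Others": 9,
}

total = sum(GROUP_COUNTS.values())
yes_total = sum(YES_BY_SCRIPT.values())

# Rounded percentages of the whole list for the two largest script families.
latin_greek_pct = round(100 * YES_BY_SCRIPT["Latin/Greek"] / total)
cyrillic_pct = round(100 * YES_BY_SCRIPT["Cyrillic"] / total)
combined_pct = round(
    100 * (YES_BY_SCRIPT["Latin/Greek"] + YES_BY_SCRIPT["Cyrillic"]) / total
)

print("total languages:", total)
print("[YES] subtotal matches:", yes_total == GROUP_COUNTS["YES"])
print("Latin/Greek %:", latin_greek_pct)
print("Cyrillic %:", cyrillic_pct)
print("combined %:", combined_pct)
```

The four groups add up to 158, the [YES] subgroups add up to 131, and the
rounded shares come out as 54, 16 and 70 percent.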

List of Contributors
====================
Shanley Allen <allenmpi.nl>
the Babesther <hanminerva.cis.yale.edu>
Rita Bhandari <bhandarisemlab1.sbs.sunysb.edu>
Doug Cooper <dougchulkn.car.chula.ac.th>
Peter Daniels <pdanielspress-gopher.uchicago.edu>
Boris Fridman Mintz <fridmanucol.mx>
Stefan Frisch <frischbabel.ling.nwu.edu>
Hideo Fujii <fujiimackay.cs.umass.edu>
Keith Goeringer <kegviolet.berkeley.edu>
Henry Groover <hgrooverqualitas.com>
Mark Hansell-Mai Hansheng <mhansellcarleton.edu>
Susantha Herath <herathu-aizu.ac.jp>
Matthew Hurst <matthcogsci.ed.ac.uk>
Hiroaki Kitano <6500hiroucsbuxa.ucsb.edu>
Wolfram Kahl <kahlhermes.informatik.unibw-muenchen.de>
Jee Eun Kim <jeeeunkmicrosoft.com>
Wenchao Li <wclivax.ox.ac.uk>
Stuart Luppescu <sl70musuko.spc.uchicago.edu>
Greg Lyons <lcgalmahidol.ac.th>
Duncan MacGregor <aa735freenet.carleton.ca>
Stavros Macrakis <macrakisosf.org>
James Magnuson <magnusonpsych.rochester.edu>
Mark A. Mandel <Markccgate.dragonsys.com>
Alec McAllister <ECL6TAMlucs-01.novell.leeds.ac.uk>
Philippe Mennecier <ferrycimrs1.mnhn.fr>
Nicholas Ostler <nostlerchibcha.demon.co.uk>
Peter Paul <Peter.Paularts.monash.edu.au>
Gnani Perinpanayagam <gnanisun3.oulu.fi>
Ellen F. Prince <ellencentral.cis.upenn.edu>
Steve Seegmiller <SEEGMILLERapollo.montclair.edu>
Dan I. Slobin <slobincogsci.Berkeley.EDU>
Achim Stenzel <achimtiger.toppoint.de>
Jan-Olof Svantesson <Jan-Olof.Svantessonling.lu.se>
Joseph Tomei <jtomeililim.ilcs.hokudai.ac.jp>
Shravan Vasishth <shravanlisa.lang.osaka-u.ac.jp>
Allan C Wechsler <Wechslerworld.std.com>
Henk Wolf <H.A.Y.Wolfstud.let.ruu.nl>