LINGUIST List 2.530

Tue 17 Sep 1991

Disc: Ossetic, Lg identification, Chomskyite, 'In case'

Editor for this issue: <>


Directory

  1. Nancy L. Dray, Ossetian/Ossetic
  2. Mark Liberman, Identifying what language a text line is
  3. Victor Raskin, Chomskyite
  4. , More on `just in case'

Message 1: Ossetian/Ossetic

Date: Fri, 13 Sep 91 13:15:03 CDT
From: Nancy L. Dray <draysapir.uchicago.edu>
Subject: Ossetian/Ossetic
> Does anybody know a speaker of or an expert on the Iranian language of
> the USSR whose name in English is either Ossetic or Ossetian or
> something like that?
David Testen (Department of Linguistics, University of Chicago,
1010 East 59th Street, Chicago, IL 60637) works on Ossetian.
He's not currently on the network, but if you wish you can
e-mail him c/o me (draysapir.uchicago.edu).
I believe Ladislav Zgusta at U. of Illinois also does work on this
language. As for the name of the language, I've heard both forms.
Testen uses "Ossetian," Zgusta uses "Ossetic" (I think). Perhaps
one of them could tell you more about the distribution of the two
names in the literature.
	Nancy L. Dray

Message 2: Identifying what language a text line is

Date: Fri, 13 Sep 91 18:44:45 EDT
From: Mark Liberman <mylunagi.cis.upenn.edu>
Subject: Identifying what language a text line is
In Linguist List 2.511, 9/13/91, Stan Kulikowski II (stankuliuwf.bitnet)
describes a heuristic method for doing language-identification (for text)
based on "the hypothesis is that languages contain frequent small words
(3 chars or less) which can be used to distinguish many lines of their text."
He references "a number of replies relayed from usenet's sci.crypt that
cryptographers use a method based on the frequency of bigram and trigram
character sequences," and suggests that "this may work for file-sized data,
but i doubt that it would be sensitive to a datum in the range of 40-80 bytes
which is what you get in line-by-line text transfers."
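For illustration only (not part of the original posts): a minimal Python
sketch of the small-word heuristic quoted above. The word lists and the
function name guess_by_small_words are hypothetical; a real system would
derive its short-word lists from corpora rather than pick them by hand.

```python
import re

# Hypothetical, hand-picked lists of frequent short words (3 chars or less).
# These are assumptions for the sketch, not data from the original posts.
SMALL_WORDS = {
    "english": {"the", "of", "and", "to", "in", "is", "it", "a"},
    "french": {"le", "la", "les", "de", "et", "un", "une", "est", "du"},
    "german": {"der", "die", "das", "und", "ist", "ein", "zu", "von"},
}


def guess_by_small_words(line):
    """Return the language whose short-word list matches the most tokens.

    Returns None when no short word in the line is recognized.
    """
    # Keep only alphabetic tokens of at most three characters.
    tokens = [t for t in re.findall(r"[^\W\d_]+", line.lower()) if len(t) <= 3]
    scores = {
        lang: sum(1 for t in tokens if t in words)
        for lang, words in SMALL_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A line with no recognized short words yields None, which is the weak point
of this heuristic on very short inputs such as single names.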
Although I have no experience in doing language identification on the
type of text chunks in question, I do have some evidence to suggest
that Kulikowski's speculation about sample size is wrong. The
text-to-speech system developed in my former group at AT&T Bell
Laboratories uses trigram statistics to guess the ethnic origin of
unknown proper names (or rather, of unknown words which it guesses to
be proper names), so that it can use appropriate letter-to-sound
conventions in guessing the pronunciation. This method is described in
U.S. Patent 4,829,580, "Text Analysis System with Letter Sequence
Recognition and Speech Stress Assignment Arrangement," held by Ken
Church.
Names, obviously, are much shorter than 40-80 bytes --- typical
unknown names are more like 6-12 bytes --- but the method works fairly
well. I suspect that a method of this kind, if appropriately trained,
would categorize text lines quite accurately. In addition to simply
trying the experiment, which is easy enough, one could predict the
performance as a function of sample size on the basis of the frequency
distributions involved. N-gram letter distributions for different
languages using the same alphabet are probably different enough for a
statistical pattern-recognition method to work quite well on a sample
size of 40-80 characters.
		Mark Liberman
		University of Pennsylvania
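
For illustration only (not part of the original post, and not the patented
AT&T method): a minimal Python sketch of trigram-based identification of
the general kind discussed above. It scores a line under one character
trigram model per language and picks the best. The training sentences, the
add-one smoothing, the assumed alphabet of about 30 symbols, and the names
TrigramModel and identify are all assumptions made for the sketch.

```python
import math
from collections import Counter


def trigrams(text):
    """Return the character trigrams of text, padded with two spaces per end."""
    padded = "  " + text.lower() + "  "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramModel:
    """Character trigram counts with additive smoothing (an assumed choice)."""

    def __init__(self, training_text, alpha=1.0, vocab_size=30 ** 3):
        self.counts = Counter(trigrams(training_text))
        self.total = sum(self.counts.values())
        self.alpha = alpha
        # Assumed number of possible trigrams over a roughly 30-symbol alphabet.
        self.vocab_size = vocab_size

    def log_prob(self, text):
        """Sum of smoothed log-probabilities of the trigrams in text."""
        denom = self.total + self.alpha * self.vocab_size
        return sum(
            math.log((self.counts[g] + self.alpha) / denom)
            for g in trigrams(text)
        )


def identify(line, models):
    """Return the language whose model gives line the highest log-probability."""
    return max(models, key=lambda lang: models[lang].log_prob(line))


# Toy training data, far too small for real use; real models need large corpora.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and the cat "
               "sat on the mat while it was raining in the city",
    "french": "le renard brun saute par dessus le chien paresseux et le chat "
              "est assis sur le tapis pendant qu il pleut dans la ville",
}
MODELS = {lang: TrigramModel(text) for lang, text in SAMPLES.items()}
```

With models trained on realistic amounts of text, the same identify call
could be run on 40-80 character lines to try the experiment suggested
above.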

Message 3: Chomskyite

Date: Mon, 16 Sep 91 14:09:05 EST
From: Victor Raskin <raskinj.cc.purdue.edu>
Subject: Chomskyite
As a non-native speaker of English, I should have probably lurked out
on this discussion. But one advantage of non-nativeness is that you
have to develop numerous minitheories in lieu of the dead native
speaker's live intuition. My theory of the -ite/-ist business, for
whatever it's worth, has been that X+ite is primarily a noun and means
'a camp follower of X,' while X+ist is both a noun and an adjective,
and much more comfortable as the latter than X+ite, and means an
intellectual, religious, etc. affinity. The derogatory meaning of
X+ite follows from the camp-follower meaning, at least for some
people.
--
Victor Raskin raskinj.cc.purdue.edu

Message 4: More on `just in case'

Date: Mon, 16 Sep 91 11:17:39 EDT
From: <macrakisosf.org>
Subject: More on `just in case'
I would have guessed that `just in case' was introduced by a
philosopher going out of his way to use simple, everyday language and
to avoid locutions judged excessively technical (if and only if).
Another `nontechnical' rendering of `iff' is `precisely when', but
that brings in possible confusion with time....