> Does anybody know a speaker of or an expert on the Iranian language of
> the USSR whose name in English is either Ossetic or Ossetian or
> something like that?

David Testen (Department of Linguistics, University of Chicago, 1010 East 59th Street, Chicago, IL 60637) works on Ossetian. He's not currently on the network, but if you wish you can e-mail him c/o me (dray@sapir.uchicago.edu). I believe Ladislav Zgusta at U. of Illinois also does work on this language.

As for the name of the language, I've heard both forms. Testen uses "Ossetian," Zgusta uses "Ossetic" (I think). Perhaps one of them could tell you more about the distribution of the two names in the literature.

Nancy L. Dray
In Linguist List 2.511, 9/13/91, Stan Kulikowski II (stankuli@uwf.bitnet) describes a heuristic method for doing language identification (for text) based on "the hypothesis is that languages contain frequent small words (3 chars or less) which can be used to distinguish many lines of their text." He references "a number of replies relayed from usenet's sci.crypt that cryptographers use a method based on the frequency of bigram and trigram character sequences," and suggests that "this may work for file-sized data, but i doubt that it would be sensitive to a datum in the range of 40-80 bytes which is what you get in line-by-line text transfers."

Although I have no experience in doing language identification on the type of text chunks in question, I do have some evidence to suggest that Kulikowski's speculation about sample size is wrong. The text-to-speech system developed in my former group at AT&T Bell Laboratories uses trigram statistics to guess the ethnic origin of unknown proper names (or rather, of unknown words which it guesses to be proper names), so that it can use appropriate letter-to-sound conventions in guessing the pronunciation. This method is described in U.S. Patent 4,829,580, "Text Analysis System with Letter Sequence Recognition and Speech Stress Assignment Arrangement," held by Ken Church. Names, obviously, are much shorter than 40-80 bytes --- typical unknown names are more like 6-12 bytes --- but the method works fairly well.

I suspect that a method of this kind, if appropriately trained, would categorize text lines quite accurately. In addition to simply trying the experiment, which is easy enough, one could predict the performance as a function of sample size on the basis of the frequency distributions involved.

N-gram letter distributions for different languages using the same alphabet are probably different enough for a statistical pattern-recognition method to work quite well on a sample size of 40-80 characters.

Mark Liberman
University of Pennsylvania
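
The kind of experiment suggested above can be sketched in a few lines of Python. This is only an illustration of classifying short strings by character-trigram statistics, not the method of the patent cited above: the add-one smoothing, the function names (`trigrams`, `train`, `classify`), and the two tiny training texts are all assumptions made for the example, and a real test would need far more training text per language.

```python
import math
from collections import Counter


def trigrams(text):
    """Return the character trigrams of text, lowercased and space-padded."""
    padded = f" {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(samples):
    """Build one trigram count table per label from {label: training text}."""
    return {label: Counter(trigrams(text)) for label, text in samples.items()}


def log_likelihood(counts, text, vocab_size):
    """Log-probability of text under one table, with add-one smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[gram] + 1) / (total + vocab_size))
        for gram in trigrams(text)
    )


def classify(models, text):
    """Return the label whose trigram table gives text the highest score."""
    vocab = set().union(*models.values())
    vocab_size = len(vocab) + 1  # one extra slot for unseen trigrams
    return max(
        models,
        key=lambda label: log_likelihood(models[label], text, vocab_size),
    )


# Toy training texts (hypothetical; far too small for real use).
SAMPLES = {
    "english": (
        "the quick brown fox jumps over the lazy dog and then the children "
        "went to the house where they were waiting for their mother and "
        "father with a cup of tea"
    ),
    "german": (
        "der schnelle braune fuchs springt durch den wald und dann gingen "
        "die kinder in das haus wo sie auf ihre mutter und ihren vater mit "
        "einer tasse tee warteten"
    ),
}

models = train(SAMPLES)

print(classify(models, "they were going to the house with the children"))
print(classify(models, "die kinder gingen in das haus"))
```

Both test lines here are shorter than the 40-80 bytes under discussion. Measuring accuracy as a function of sample length would mean training on a larger corpus and classifying held-out lines truncated to various lengths.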
As a non-native speaker of English, I should have probably lurked out on this discussion. But one advantage of non-nativeness is that you have to develop numerous minitheories in lieu of the dead native speaker's live intuition. My theory of the -ite/-ist business, for whatever it's worth, has been that X+ite is primarily a noun and means 'a camp follower of X,' while X+ist is both a noun and an adjective, and much more comfortable as the latter than X+ite, and means an intellectual, religious, etc. affinity. The derogatory meaning of X+ite follows from the camp-follower meaning, at least for some people.

-- Victor Raskin
raskin@j.cc.purdue.edu
I would have guessed that `just in case' was introduced by a philosopher going out of his way to use simple, everyday language and to avoid locutions judged excessively technical (if and only if). Another `nontechnical' rendering of `iff' is `precisely when', but that brings in possible confusion with time....