Message 1: Summary: Subordinating Conjunctions

Date: Wed, 27 Aug 1997 11:22:21 -0700
From: Ken Litkowski
Subject: Summary: Subordinating Conjunctions

I provide a summary of responses to a request made on LINGUIST,
CORPORA, and E-LEX for information on subordinating conjunctions.
First, I repeat the call, then summarize.


"I am performing a "definitive" analysis of the meanings of
subordinating conjunctions and would be interested in linking up with
anyone who has focused on their representation in NLP systems. I am
performing an analysis of subordinating conjunction definitions in
Webster's 3rd International Dictionary, modeling these definitions
using the theory of labeled directed graphs (digraphs), using
principles for identifying primitives I have previously described (see
Litkowski, K. C. (1988). On the search for semantic
primitives. Computational Linguistics, 14(1), 52 for an overview).

The "meaning" of subordinating conjunctions essentially consists of
labeling clauses and establishing discourse relationships of time,
contingency, place, condition, concession, contrast, reason, purpose,
and result (see Quirk et al. pp. 1070-1112). I am aware that
subordinating conjunctions are used as cue words in discourse
processing, but I am not aware of any systematic bringing together of
these "meanings" in a computational system. Characterizing these
meanings is important in the digraph analysis, and while I can do it
myself, it would be preferable not to reinvent the wheel. I would be
grateful if anyone can point to computational representations of these

A database of these "meanings" will eventually be made publicly
available on the web for anyone to use."


Many respondents noted correctly that another term which subsumes
subordinating conjunctions (SCs) is "discourse markers," for which
there is a substantial literature. Megan Duque-Estrada has a very
extensive bibliography of this literature available on the web. This
literature provides substantial information pertinent to my request.

More specific information going to the heart of my request for
features and semantic labels associated with SCs was provided by Alex
Eulenberg, Ken Barker, and Alistair Knott. Mary Dee Harris provided
the link to Ali's work; I am very grateful for this link.

Alistair Knott (particularly "A Data-Driven Method for Classifying
Connective Phrases") and Alex Eulenberg (providing features for
additive conjunctive sentence adverbials)
provide an identification of features associated with comprehensive
lists of "cue phrases" (that go beyond the smaller set of SCs). The

Ken Barker (particularly "Interactive semantic analysis of
clause-level relationships," CLRs)
provides an identification of semantic labels assigned to clauses
based on CLR markers or clausal connectives. The semantic

These two sets of information respond precisely to my needs and are
quite useful in their own right.

Another respondent (who prefers to remain anonymous), who has
investigated diachronic processes (including SCs in German), noted a
possibly very interesting point that Old High German had no
Complementizer Phrases, so that "das" evolved into "dass". This
person also cited work describing subordinating conjunctions with
prepositions (like: bevor, nachdem, indem) having a descriptive part
(-vor-, nach-) and a referential part (expressed by the d-words or the
w-words), including pairs like "nachdem - wonach", "dadurch dass -
wodurch", and "damit - womit". There is thus the suggestion (to me,
at least) that the use of subordinating conjunctions over time might
reflect an evolutionary process of reasoning where particular feature
values and semantic relationships have become lexicalized. The
characterization of feature values and semantic relationships by
Knott, Eulenberg, and Barker may facilitate this type of diachronic

When I complete the first phase of my research (the initial digraph
analysis), I will provide notification of its availability. Then,
when I fully incorporate the analysis of features and semantic
relationships, I will post the data on the ACL SIGLEX Lexical
Resources page.

I thank everyone who responded and hope that this summary responds to
those who asked to be kept informed of my findings.


Ken Litkowski                    TEL.: 301-926-5904
CL Research
20239 Lea Pond Place
Gaithersburg, MD 20879-1270 USA

Message 2: reply to respondents on LS-, LZ-

Date: Wed, 27 Aug 1997 15:22:46 -0400 (EDT)
From: Victor Mair
Subject: reply to respondents on LS-, LZ-

	On August 8, 1997, I posted a question on the Linguist List
electronic bulletin board about the existence of _ls-_ and _lz-_
(i.e., a liquid plus a fricative in that order) configurations. I
stated that such articulations seemed to me to be phonologically
improbable and that they might naturally metathesize to _zl-_, _sl-_,
etc., or that, if they did occur, they would be highly marked. Among
those who kindly replied to my query were Victor Peppard via Jacob
Caflisch (University of South Florida), Mark Liberman (University of
Pennsylvania), Peter Chew (Oxford University), David Robertson
(tincan), John E. Koontz (Boulder), Subhadra Ramachandran (cyantic),
Robert Beard (Bucknell University), Sondra Ahlen (cmu), James Giangola
(General Magic), Christopher Miller (University of Quebec), Colin
Whiteley (Barcelona), Ronald Cosper (Saint Mary's University,
Halifax), Alain Theriault (University of Montreal), Jakob Dempsey
(Yuan-ze University, Taiwan), Kimmo Huovila (kielikone, Finland),
Michael Betsch (Tuebingen University), Sandra Paoli (University of
York, England), Mark Donohue (United Kingdom), David Gohre (Indiana),
Geoffrey Sampson (University of Sussex), James Kirchner (no address or
affiliation), Olga Shaumyan (University of Sussex), Steve Seegmiller
(Montclair State University), Manaster (probably Alexis Manaster
Ramer, Michigan), Paul Boersma (Instituut voor Fonetische
Wetenschappen, Amsterdam), Wolfgang Behr (Frankfurt University), Keith
Goeringer (University of California at Berkeley), Heli Harrikari
(University of Helsinki), Charles Gribble (OSU), and Elena Andonova
(Bulgaria[?]). Several graduate students at the University of
California (Los Angeles) and elsewhere requested that their names not
be listed in my response because they did not want to get in trouble
with their advisers for spending too much time on the Internet. I hope
that I have not inadvertently forgotten any others. My profound
gratitude is due to each and every one who responded.
	The gist of the information which the above-named individuals
provided to me is that there certainly do exist _ls_, _lz_, and
similar configurations, even in English (e.g., "else," "holster,"
"also," "balsam," "pulse," "calcium," "dulcimer," "bells," "pulls,"
"files," and "celsius"), but note that these are all internal or
final. Other languages with internal _-ls-_, _-lz-_, etc. (often
separated in two adjoining syllables) cited in the responses include
Coast Salish, Malayalam, Bulgarian, French, Spanish, Portuguese,
Italian, and Finnish. It was reported that some Athapaskan languages
may have such clusters in final position. As indicated by the dashes,
however, I was thinking of syllable initial _ls-_, _lz-_, etc.; it
would appear that such articulations are quite rare throughout the
	Levantine and Western dialects of Arabic (including Maltese)
were mentioned among the replies I received, although without
indication of the location (initial, internal, or final) of these
consonant combinations and without citation of specific words. Also
mentioned was the mysterious language Lvova, said to be from the Santa
Cruz Islands, Solomons, and written about by Wurm in articles for
numerous Pacific linguistics publications. The languages of the
Caucasus were noted as being particularly rich in initial consonant
clusters, but _ls-_ and _lz-_ were not cited specifically.
	The overwhelming preponderance of the citations for such
configurations were from Slavic languages, in which some of my
correspondents declared that virtually any combination of consonants
is possible! (For example, there is a Russian word, _vzbzdnut'_,
which you will not find in any dictionary, that means "to emit a
silent but very smelly fart." And Czech, amazingly, even has whole
words that are spelled without any vowels, although out of
physiological necessity a kind of epenthetic schwa is used when they
are pronounced. Geoffrey Sampson cites the Czech word _vlh_ ["wolf"]
which consists wholly of an _-l-_ sound surrounded by fricatives on
both sides [the _-h_ in this word is actually a voiceless velar
fricative, IPA [x]]!) As Victor Peppard put it, "One of the reasons
Slavic has so many complex consonant clusters is that in about the
ninth century Common Slavic lost a pair of semi-vowels, one back and
one front, precipitating in a lot of places, to put it colloquially, a
tremendous collision of consonants." Nonetheless, even in Slavic,
_ls_ or _lsh_ and _lz_ or _lzh_ are usually found intervocalically,
but are much less common (and HARDER TO PRONOUNCE) in initial position
(cf. _lzh-_ ["false"], _lze_ ["possible"], etc.). Often, as with
Russian _l'stit_ ("to flatter") and _l'viny_ ("lion's"), an initial
_l-_ in such combinations tends to become palatalized, perhaps to ease
	The difficulty of pronouncing syllable initial _ls-_, _lz-_,
was commented upon by Sondra Ahlen as follows: "In that case I would
not be surprised to see some phonological process occur since as I
recall syllable initial sequences tend to involve increasing levels of
sonority as you get closer to the nucleus, with the common exception
of fricatives before stops as in _str-_. Metathesis is one of several
phonological processes that might affect an underlying syllable
initial (or potentially syllable initial) such as _lz-_, _ls-_. Other
options might include vowel epenthesis, consonant deletion,
syllabification of the liquid, etc."
	Paul Boersma cited one instance of metathesis in Czech:
_ml-ha_ ("fog," two syllables, the /_l_/ being syllabic) from an older
_mgla_ which still exists in Polish.
	A check of all the roots beginning with _l-_ in the
_Etimologicheskii Slovar' Slavyanskikh Yaz'ikov_, vols. 15-17,
revealed that whenever the _l-_ was not followed by a vowel, the
letter that followed was either the hard or the soft sign, both of
which I presume
indicate some sort of palatalization or yodization. My interpretation
of this pattern would be that it reflects a phonological process
designed to ease the pronunciation of the following consonant
(including _-s-_ , _-z-_, and _-zh-_) after the _l-_.
	Jakob Dempsey provided extremely valuable data from Tibetan which 
lends support for the possibility of metathesis: "Old Tibetan 'moon' was 
_*sla_ which assimilated to _zla_ in the classical period, but in the 
western dialects this underwent initial-cluster metathesis (seen in many 
examples of western Tibetan): _zla_ > _lza_. That form remains in the 
extreme west (Balti), but in central Tibet we have: _nda_ < _lda_ which 
in turn seems to come from _lza_. It has been proposed that _lce_ 
('tongue') came from _*sle_ (via _*lse_), but since there are still 
dialects in Tibetan which preserve _cle_, this is yet another example of 
that metathesis, with the _c-_ in _cle_ probably a palatalization of 
earlier _*tle_ which in turn may be from _*ple_, cf. Drung _p-lai_ (Drung 
has many old loans from Tibetan). 'Tongue' in many other Tibeto-Burman 
languages is from _*ble_."
	Wolfgang Behr observed that "Qiangic [a Tibeto-Burman language 
found in Sichuan Province of China] allows _rp-, rk-, rt-, rb-, rg-, 
rts-, rdz-, rtsh-, rdzh-, rdzh-, rm-, rng-, rl-_ [!!], _rw_ (with 
distinctive syllabic and non-syllabic _r-_), but no _*ls-_ or _*lz-_ 
(neither _*rs-_ or _*rz-_). Jiarong [another Tibeto-Burman language from 
the same area of southwest China], although equipped with one of the most 
curious initial cluster systems known (> 170 types), has such things as 
_ltsh-, ldz-, ldzh-, lj-_, but again, no _*ls_ or _*lz-_." As for the 
anomalous distribution of preinitial resonants in Written Tibetan (e.g., 
<_rts_> but not *<_lts_>, etc.), this phenomenon has apparently never 
been explained in the literature. It is not known for sure whether these 
clusters were ever pronounced as they were written in the Old Tibetan and 
Pre-Tibetan periods (we may notice the great variation of written cluster 
representations in the Dunhuang documents), or if they were pronounced 
sesquisyllabically, or if the preinitials came into being as mere 
graphical conventions marking tone. Similar clusters, violating not only 
basic sonority hierarchy restrictions but even such notions as 
Hjelmslev's "resolvability principle" (i.e., every language L that allows 
C1C2C3- initials of a given shape in its phonotactics must allow for all 
adjacent subsets of the cluster, viz., C1C2-, C2C3-), have been set up for 
Old Sinitic by "proto-form stuffers" (to use James A. Matisoff's term). 
Those who have done so, again quoting Matisoff, lack an adequate 
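	The "resolvability principle" as paraphrased above lends itself to a mechanical check. The following Python lines are only a rough sketch: the function names and the toy inventories are invented for illustration, clusters are given as tuples of segments, and the extension from three-member clusters to longer ones is my assumption, not something stated in the responses.

```python
def contiguous_subclusters(cluster):
    """All contiguous sub-sequences of length >= 2 that are shorter than the cluster."""
    n = len(cluster)
    return [
        cluster[i:j]
        for i in range(n)
        for j in range(i + 2, n + 1)
        if j - i < n
    ]


def unresolved(onsets):
    """Return (cluster, missing_subcluster) pairs that violate resolvability.

    `onsets` is an iterable of initial clusters, each a sequence of segments.
    """
    inventory = {tuple(c) for c in onsets}
    return sorted(
        (c, sub)
        for c in inventory
        for sub in contiguous_subclusters(c)
        if sub not in inventory
    )


# Hypothetical toy inventories, not data from any language discussed here.
resolvable = [("s", "t", "r"), ("s", "t"), ("t", "r")]
unresolvable = [("l", "t", "s"), ("t", "s")]

print(unresolved(resolvable))    # no violations expected
print(unresolved(unresolvable))  # the initial ("l", "t") is missing
```

A reconstructed inventory could be screened this way by listing its proposed initials and looking at which sub-clusters are absent.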
	Finally, Wolfgang Behr also offered some very interesting 
theoretical perspectives, complete with an extensive bibliography, 
concerning the "sonority sequencing principle" (SSP) and its violations. 
A basic assumption of the SSP is that the least sonorant segments occur 
toward the margins of a syllable. Among the finer differentiations of 
the sonority scale are those proposed by Th. Vennemann in his _Preference 
Laws for Syllable Structure_ (Berlin: Mouton, 1988). According to the 
sonority restrictions applying to the distribution of segments in a 
syllable on Vennemann's fine-grained scale, predictions may be made about 
statistical frequencies or markedness properties. By these standards, 
_ls-_ and _lz-_ would have to be classified as marked.
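	For readers who like to see such a prediction spelled out, here is a minimal Python sketch of the rising-sonority check on syllable onsets. The five-step scale, the small segment table, and the function names are all assumptions made for illustration; this is a coarse textbook-style scale, not Vennemann's fine-grained one.

```python
# Illustrative sonority ranks (higher = more sonorous). An assumed coarse
# scale for this sketch only, not Vennemann's scale.
SONORITY = {
    "stop": 1,
    "fricative": 2,
    "nasal": 3,
    "liquid": 4,
    "glide": 5,
}

# Toy segment table, also an assumption of this sketch.
SEGMENT_CLASS = {
    "p": "stop", "t": "stop", "k": "stop",
    "b": "stop", "d": "stop", "g": "stop",
    "s": "fricative", "z": "fricative", "f": "fricative", "v": "fricative",
    "m": "nasal", "n": "nasal",
    "l": "liquid", "r": "liquid",
    "j": "glide", "w": "glide",
}


def onset_violations(onset):
    """Return the adjacent pairs in an onset where sonority fails to rise."""
    ranks = [SONORITY[SEGMENT_CLASS[seg]] for seg in onset]
    return [
        onset[i] + onset[i + 1]
        for i in range(len(onset) - 1)
        if ranks[i] >= ranks[i + 1]
    ]


def is_marked_onset(onset):
    """An onset counts as marked here if any adjacent pair fails to rise."""
    return bool(onset_violations(onset))


for onset in ["sl", "zl", "ls", "lz", "str"]:
    print(onset, onset_violations(onset))
```

On this toy scale _sl-_ and _zl-_ pass, _ls-_ and _lz-_ are flagged, and _str-_ is flagged at _st_, which is the fricative-before-stop exception Sondra Ahlen mentions.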

Victor H. Mair Dept. of Asian & Middle Eastern Studies
University of Pennsylvania
Philadelphia, PA 19104-6305

Tel.: 215-898-8432
Fax.: 215-573-9617
e-mail: (read once or twice a week)

