Sun Aug 13 1995

Qs: Phonemicity of writing

I would like to get some estimate of what percentage of the world's
written languages are represented orthographically in a phonemic
manner. More specifically, how many written languages are such that
one can predict the phonological properties of a word --- including
stress, accent or tone --- merely by consulting the string of symbols
used to write that word, and without further information, such as the
morphological structure of the word? For a language whose writing
system is largely phonemic, one could write down a set of rules for
word pronunciation, and in the ideal case the number of rules would be
within an order of magnitude of the number of graphemes. (A few
lexical exceptions don't matter, as long as there aren't hundreds of
them.) I am leaving the sense of `phoneme' intentionally vague:
normally a phonemic written representation implies that one can
predict the surface phonemic representation from the written form of
the word, but I would be perfectly happy considering a system to be
phonemic if some more abstract level of phonological representation
were represented, from which the surface phonemic representation could
be predicted by regular phonological rules/principles. (I should also
note, to clarify the question further, that I am interested primarily
in the correspondence between the written form and the spoken form for
the standard variety (or varieties) of the language, which the written form
presumably reflects to some degree: I am not interested (at the
moment) in dialects of the language which deviate to varying degrees
from the standard.)

So, under this definition, Spanish would presumably count as very
phonemic since one can nearly always predict the pronunciation of a
word, including its stress, from the orthography. Romanian is less
phonemic since, while the actual set of phonemes in a word is mostly
determinable by the set of graphemes used (with the representation of
glides being a slight source of complication), the placement of stress
requires some knowledge of the morphological class of the word
(following work of Ioana Chitoran). English is presumably among the
least phonemic, since the `regular rules' of pronunciation are
themselves quite complex, and there are many lexical exceptions.
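
To make the Spanish case concrete, here is a minimal sketch of how stress
placement might be read off the spelling alone. It assumes the textbook
rules (a written accent marks the stressed syllable; otherwise words
ending in a vowel, n or s stress the penultimate syllable, and all others
the final one) and a deliberately simplified treatment of diphthongs. The
names NUCLEUS and stressed_nucleus are illustrative only, and cases such
as adverbs in -mente are not handled.

```python
import re

# One vowel nucleus: a strong or accented vowel with optional weak glides
# on either side, or a run of weak vowels alone (as in "ciudad").
# Simplifying assumption: "y" is always treated as a consonant.
NUCLEUS = re.compile(r"[iuü]*[aeoáéíóú][iuü]*|[iuü]+")
ACCENTED = set("áéíóú")


def stressed_nucleus(word):
    """Return the 0-based index of the stressed vowel nucleus in `word`."""
    word = word.lower()
    nuclei = NUCLEUS.findall(word)
    if not nuclei:
        raise ValueError(f"no vowel found in {word!r}")
    # Rule 1: a written accent marks the stressed syllable.
    for i, nucleus in enumerate(nuclei):
        if ACCENTED & set(nucleus):
            return i
    # Rule 2: words ending in a vowel, n or s stress the penultimate
    # syllable (or the only syllable, in a monosyllable).
    if word[-1] in "aeiouns":
        return max(len(nuclei) - 2, 0)
    # Rule 3: all other words stress the final syllable.
    return len(nuclei) - 1
```

For example, stressed_nucleus("casa") gives 0 (CA-sa), while
stressed_nucleus("hablar") gives 1 (ha-BLAR). Three rules plus one
pattern for vowel groups cover stress here, which is the kind of small
rule set I have in mind; nothing comparable could be written for
Romanian or English without morphological or lexical information.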

The particular classification of the writing system as logographic,
moraic or segmental is unimportant: in principle Chinese writing could
be classed as phonemic (albeit with a rather large set of graphemes),
but for the fact that especially among the more common characters
there are quite a few with pronunciation ambiguities which can only be
resolved using lexical information.

I am familiar with several of the recent books on writing systems, but
while these typically contain in-depth analyses of particular systems,
as far as I can tell, nobody has done a survey of this kind. (If, on
the contrary, someone can point me to a survey that answers this
question, I would be most grateful.) So, I would be very interested
in getting as much information related to this question on as many
languages as people are sufficiently familiar with. I think I already
know the answer to these questions for the more familiar Western
European languages (including some less familiar ones like Irish and
Welsh), as well as Romanian, Russian, Hebrew, Arabic, Chinese,
Japanese and Malagasy. I would be particularly interested in knowing
about languages for which writing systems have only recently been
developed, or for which the spelling system has recently undergone a
massive restructuring: conventional wisdom has it that in such cases
the writing system should be very phonemic, but perhaps that is not
always true.

Please send any replies to me, and if there are a sufficient number I
will post the results of this survey to the List.


