Mon Nov 5 2001

Sum: Informal Romanized Orthographies

Date: Sun, 04 Nov 2001 13:04:56 +0400
From: David Palfreyman <>
Subject: Informal Romanized orthographies

Here is my summary of the replies I received about the use of informal 
Roman orthographies by students (e.g. Arab students) in internet contexts.

The media referred to included informal emails, chatrooms, instant 
messaging, websites and typewritten communication.

People who use Romanization include non-linguists, linguists (who have 
their own more or less standardized orthographies, e.g. <H> and <9> for the 
Arabic pharyngeals) and proto-linguists (e.g. linguistics students who are 
asked to render their native language for people not familiar with their 
script).

The functional motivation for such orthographies is the difficulty of 
typing characters that are not in the ASCII set. Sometimes a previously 
used Romanization standard becomes unusable because it is not easily 
typeable (e.g. Berber; Chinese tones when represented as diacritics). 
However, the phenomenon often persists even when suitable fonts are 
available, e.g. among Arab students whose computers are Arabic-enabled but 
who still use Romanization for privacy, 'cool value', etc.

Strategies for choosing which ASCII symbols represent particular sounds 
include:

a) Use of 'spare letters' not otherwise used in the orthography for that 
language (often q, x, w), e.g. <w> for Georgian /tS/. This sometimes 
seems to be motivated by keyboard layout (e.g. <x> for the 'soft sign' in 
Russian, which some Russian keyboard layouts assign to the X key).

b) Visual similarity, in addition to or instead of sound (e.g. <8> or <0> 
for Greek theta, <H> for Greek eta, <w> for Hebrew /sh/; in all these cases 
the ASCII character resembles a character in the language's own script 
visually rather than phonologically). Languages written in Cyrillic, 
however, seem to use more phonological/traditional Romanization (e.g. <kh>).

c) Initial sounds of familiar words (e.g. <4> and <6> for /ch/ and /sh/ in 
Bulgarian, where the Bulgarian words for these numbers, chetiri = 4 and 
shest = 6, begin with the phoneme in question).

d) Orthographies of other Roman-alphabet languages familiar to the writer 
(e.g. French-style <ou> for /u/ in Moroccan Arabic; <y> and <j> for /j/ 
among Persian speakers living in English- and German-speaking countries 
respectively).

e) IPA (e.g. <x> for /x/ in Georgian).
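
Taken together, strategies (a) to (e) amount to a lookup table from ASCII 
strings to letters of the native script. As a rough illustration only, the 
Python sketch below reads the Bulgarian spellings from (c) back into 
Cyrillic. The table and the function name to_cyrillic are hypothetical, 
assembled for this example rather than taken from any standard; the code 
tries longer keys first so that <sht> and <sh> win over <s>. Given how 
variable these orthographies are in practice, no single fixed table would 
cover real usage.

```python
# Hypothetical, partial table from informal Latin/digit spellings to
# Bulgarian Cyrillic (output is lower case only). Real usage varies far more.
TABLE = {
    "sht": "щ", "sh": "ш", "ch": "ч", "zh": "ж",
    "4": "ч", "6": "ш",  # numerals from strategy (c): chetiri, shest
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е",
    "z": "з", "i": "и", "k": "к", "l": "л", "m": "м", "n": "н",
    "o": "о", "p": "п", "r": "р", "s": "с", "t": "т", "u": "у",
    "f": "ф", "h": "х", "c": "ц",
}
LONGEST_KEY = max(len(key) for key in TABLE)


def to_cyrillic(text: str) -> str:
    """Greedy longest-match transliteration; unknown characters pass through."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible key first, so 'sht' beats 'sh' beats 's'.
        for size in range(LONGEST_KEY, 0, -1):
            raw = text[i:i + size]
            key = raw.lower()
            if key in TABLE:
                out.append(TABLE[key])
                i += len(raw)
                break
        else:
            # No key matched: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

With this table, <4etiri> and <chetiri> both come out as четири, and 
<6est> and <shest> both as шест.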

Hard-to-type Roman characters (e.g. letters with diacritics in Croatian or 
Turkish) are generally typed without their diacritics, but sometimes 
doubled letters or upper case are used to signal the distinction.
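
Mechanically, dropping diacritics resembles Unicode decomposition followed 
by discarding the combining marks. The Python sketch below (standard 
library only; the name to_plain_ascii is hypothetical) does this. Letters 
such as Turkish <ı> or Croatian <đ> have no decomposition in Unicode, so it 
falls back to a small hand-written table; those spellings (e.g. <dj> for 
<đ>) are assumptions chosen for illustration, and only the letters listed 
are handled.

```python
import unicodedata

# Letters that Unicode does not decompose into base letter + mark.
# These ASCII spellings are illustrative assumptions, not a standard.
FALLBACK = {"ı": "i", "đ": "dj", "Đ": "Dj"}


def to_plain_ascii(text: str) -> str:
    """Approximate 'typing without diacritics' for the cases covered above."""
    # NFD splits e.g. 'ş' into 's' + COMBINING CEDILLA, 'č' into 'c' + CARON.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark, keeping the base letters.
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace letters that NFD leaves untouched.
    return "".join(FALLBACK.get(ch, ch) for ch in bare)
```

For example, Turkish "kırmızı şeker" becomes "kirmizi seker", the kind of 
text that, per the replies, Turkish readers have little trouble with.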

Digraphs are often used (e.g. Esperanto <cx>, <gx> for <c>, <g> with a 
circumflex), as is the apostrophe <'>, e.g. <'7> for the dotted 7 character 
in Arabic, <'b> in Hausa/Fulani, or to indicate front vowels in Tatar.
Such orthographies occur even in very widely used languages that are 
already written in Roman characters, e.g. French.
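
The Esperanto digraphs follow a convention often called the 'x-system': 
since <x> is not a letter of the Esperanto alphabet, an <x> after <c>, <g>, 
<h>, <j>, <s> or <u> can mark the accented letter unambiguously. A minimal 
Python sketch of converting such text back, assuming only this convention 
(the function name from_x_system is hypothetical). Note that <ux> stands 
for <ŭ>, which carries a breve rather than a circumflex.

```python
import re

# x-system base letter -> accented letter.
# u + x stands for u with a breve (ŭ), not a circumflex.
X_SYSTEM = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ"}

# One of the base letters followed by x, matched case-insensitively.
_DIGRAPH = re.compile(r"([cghjsu])x", re.IGNORECASE)


def from_x_system(text: str) -> str:
    """Replace x-system digraphs such as 'cx' or 'Sx' with accented letters."""

    def replace(match):
        base = match.group(1)
        letter = X_SYSTEM[base.lower()]
        # Keep the case of the base letter: 'Cx' or 'CX' -> 'Ĉ'.
        return letter.upper() if base.isupper() else letter

    return _DIGRAPH.sub(replace, text)
```

For instance, "Cxu vi sxatas mangxi?" becomes "Ĉu vi ŝatas manĝi?".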

These orthographies are usually inconsistent, varying even within a single 
message from one author.

*Other interesting issues*:

Historical change and old/new orthographies. E.g. Arabic orthography is 
based on Classical Arabic, and so does not reflect the variety of 
vernacular Arabic or the changes it has undergone. NB: Tatar has had four 
alphabets in the past century, under the influence of Islam, the USSR and 
Westernization.
In Taiwan: phonological uses of Chinese ideograms (choosing ideograms that 
represent the vernacular sound, rather than the meaning, of what you want 
to write); use of Chinese ideograms to write English (again, according to 
the pronunciation of the ideograms); and use of non-Roman phonological 
representations, e.g. Zhuyin Fuhao.

Psycho- and sociolinguistics: what informal orthographies show about 
speakers' perceptions and processing of their own language. Ease of 
comprehension by native speakers (e.g. Turks have little problem 
understanding Turkish written without diacritics) and by non-native 
speakers (e.g. a more transparent new informal orthography for Amharic may 
be easier to read than one which uses English sound-letter correspondences 
but loses certain phonemic distinctions).

Speaker language choice? E.g. Chinese students in Beijing may choose to 
email each other in English rather than trying to represent Chinese in 
Roman letters.

Psycholinguistic representations vs. linguists' representations (e.g. 
IPA); and representations current in certain social circles vs. official 
(e.g. state-promoted) representations. Representation of boundaries (e.g. 
spaces inserted between a root and a bound morpheme in Romanized Japanese).

Disputes about how to represent sounds. Speakers' representations of the 
phenomenon of Romanization (e.g. Russian "pisat' po-pol'ski", i.e. "write 
in Polish", since Polish uses a Roman alphabet).

Numbers and letters as phonological ideograms (e.g. <OK m8> for English 
"OK, mate"; Taiwan Chinese <AV8D> for English "everybody": the Chinese for 
8 is "ba", so "A-V-ba-D" is pronounced like "everybody"). This kind of 
usage seems to be fairly standardized; is it motivated by speed, but also 
by 'cool value'?

*Technical issues*: characters may be entered correctly by the writer of 
an email but arrive transformed or even deleted. Unicode as a solution to 
this problem?
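
Two common ways characters get transformed in transit can be reproduced 
with Python's standard library: UTF-8 bytes read by software that assumes 
Latin-1 (producing sequences like <Ã©> for <é>), and quoted-printable 
transfer encoding that is never decoded, so that <=> shows up as <=3D>. 
The sample text is invented for illustration; this is a sketch of the 
failure modes, not a diagnosis of any particular mail system.

```python
import quopri

# Invented sample: French accents plus the Bulgarian numeral example.
text = "déjà écrit: chetiri = 4"
raw = text.encode("utf-8")

# Failure 1: the receiving side assumes Latin-1, so each two-byte UTF-8
# character turns into two unrelated characters ('é' -> 'Ã©').
garbled = raw.decode("latin-1")

# Failure 2: mail transport applies quoted-printable encoding; a client that
# displays the encoded form shows "=3D" for "=" and "=C3=A9" for "é".
qp = quopri.encodestring(raw)

# When both layers are decoded with the right codecs, the text survives.
restored = quopri.decodestring(qp).decode("utf-8")

print(qp.decode("ascii"))
```

Using one Unicode encoding (e.g. UTF-8) consistently at both ends, and 
decoding any transfer encoding, avoids both problems in this sketch.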

*Web and other resources*:
International Journal of the Sociology of Language, no. 150, on digraphia.
