Monday, 10 June 1991

Date: Thu, 6 Jun 91 18:51:07 EDT
From: <>
Let me reassure Peeters and others. No one is trying to get rid of
diacritics in general. The argument is much narrower than that: should
character encodings be closed (i.e. contain a fixed repertoire of
character+diacritic combinations) or open (i.e. permit arbitrary
combinations of character and diacritic)?

The draft international standard DIS10646 has a closed repertoire, Unicode
an open one (though it also contains many precomposed characters).

Both proposals cover the main languages of the world fully (yes, `main' is
vague but certainly includes a wide range: certainly all the European
languages, but also Vietnamese, Oriya, Persian, ...). Both can encode
e-acute as a single code. Both can encode Ancient Greek eta-rough-grave-
subscript, but 10646 encodes it as a single code (yes, there is a list of
all the possible combinations in the standard!), while Unicode encodes it as
a sequence of four codes. The difference comes with characters which are
NOT widely used, either because the language is not among those covered by
ISO (the International Organization for Standardization), or because the
usage is narrow (e.g. scholarly). For instance, Cyrillic_R-macron-grave
(useful for Slavic
philology, perhaps?) does not exist as a precomposed character in Unicode
or in 10646. But in Unicode it can be represented as a combination of three
codes, even if it's never been used before (and even if it is a typo!).
10646 could of course add it in a future revision, but this has to be done
on a case-by-case basis.
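
[Editorial illustration, not part of the original message.] The contrast
above can be sketched with Python's standard unicodedata module. This
reflects Unicode as it stands today, not the 1991 drafts under discussion,
and the choice of U+0420 (CYRILLIC CAPITAL LETTER ER) for "Cyrillic_R" is an
assumption made only for the example. It shows e-acute in both precomposed
and combining form, the four-code spelling of eta-rough-grave-subscript,
and a three-code Cyrillic combination that has no precomposed equivalent.

```python
import unicodedata

# e-acute exists as one precomposed code and as base letter + combining mark.
precomposed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
sequence = "e\u0301"       # e + COMBINING ACUTE ACCENT

# Normalization treats the two spellings as canonically equivalent.
assert unicodedata.normalize("NFC", sequence) == precomposed
assert unicodedata.normalize("NFD", precomposed) == sequence

# Ancient Greek eta with rough breathing, grave and iota subscript:
# fully decomposed, it is a sequence of four codes.
eta_sequence = unicodedata.normalize("NFD", "\u1f93")
print(len(eta_sequence))

# Cyrillic R + macron + grave (U+0420 is an assumed reading of "Cyrillic_R").
# No precomposed form exists, so composition leaves all three codes in place.
r_macron_grave = "\u0420\u0304\u0300"
composed = unicodedata.normalize("NFC", r_macron_grave)
print(len(composed))
print([unicodedata.name(ch) for ch in composed])
```

Nothing had to be registered anywhere for the last combination to be
representable; whether a given font renders it acceptably is a separate
matter.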

In closed repertoire systems, adding a new character is expensive. In open
repertoire systems, using characters composed of existing elements costs
nothing. Proponents of closed repertoire systems argue that inventors of
NEW orthographies should limit themselves to standard characters.
Proponents of open repertoire systems argue that this is an unnatural
limitation which restricts designers of orthographies artificially.

<<That>> is what the argument is about, <<not>> about suppressing e-acute.

Date: Fri, 7 Jun 91 15:17:14 +0200
From: Lars Henrik Mathiesen <>
In response to Bert Peeters:

The draft of ISO 10646 does contain accented characters --- lots of
them. As I understand it, the purpose is that not only English and
French, but most or all languages with a standard orthography, should
find all the characters they need in there.

The argument is about how letter/diacritic combinations should be
represented. Currently, all the combinations that are thought
necessary are enumerated in a (very large) set of lists. There is no
way, within the draft standard, to create new combinations. (A given
display system may have ways of superimposing two symbols from the
standard, but the exact method will vary between systems.)

The suggestion is that the draft standard should be changed to include
"floating diacritics," which are defined _by_the_standard_ to be
placed with the next letter. The message that was copied here (by a
proponent of this) was from the main opponent of the suggestion, it seems.

By his argument, the only way new combinations could become necessary
would be for ``irresponsible'' linguists to invent ``unreasonable''
accented letters when creating an orthography for a language.
Therefore, it is reasonable for ISO to create a standard that cannot
accommodate such alphabets, and therefore it's the linguists (and not
the standard committee) who will be guilty of forcing the users of
that language to use non-standard equipment, with the attendant costs.

(To me, this argument seems a little circular. However, there are
technical reasons why a standard without floating diacritics is easier
to implement.)

I think the message was copied here to elicit arguments in favor of
floating diacritics, i.e., good reasons why a fixed repertoire cannot
be sufficient.

Lars Mathiesen, DIKU, U of Copenhagen, Denmark [uunet!]mcsun!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.

Date: Mon, 10 Jun 91 10:15:04 +0200
From: Mark Johnson <>
Speaking from just about complete ignorance except for what I've
seen on this group, I take it that ISO10646 proposes a fixed-length
character encoding system. Some of these characters would have
diacritics "built in" (the way the Postscript and Mac character
sets have accented vowels as single characters, for example), I
would assume. (Please, someone correct me if I am wrong here!)

I think what's at issue here is whether there should be a productive
way of combining pre-existing characters to form new characters (think
of overstriking as such a way). For example, one might agree to interpret
the backslash as an 'escape character' that means 'build a new character
by overstriking the next two characters', as in \a` , for example.
Since the new characters so constructed could themselves be subjected
to further overstriking, we can get a very large number of characters.
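
[Editorial illustration, not part of the original message.] One possible
reading of the escape scheme just described can be sketched in Python. The
names Overstrike, parse_glyph and parse are hypothetical, and the rule that
an escaped character may itself be an escape sequence is an assumption
drawn from the remark about further overstriking. The sketch only parses
the notation; it says nothing about how the result would be rendered.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

ESCAPE = "\\"  # the backslash, used as the 'build a new character' marker


@dataclass(frozen=True)
class Overstrike:
    """A character built by striking `upper` over `base`."""
    base: "Glyph"
    upper: "Glyph"


Glyph = Union[str, Overstrike]


def parse_glyph(text: str, pos: int) -> Tuple[Glyph, int]:
    """Read one glyph starting at `pos`; return it with the next position."""
    if pos >= len(text):
        raise ValueError("escape sequence ends too early")
    if text[pos] != ESCAPE:
        return text[pos], pos + 1
    # An escape consumes the next two glyphs, each of which may itself
    # be an escape sequence (assumed, to allow repeated overstriking).
    base, pos = parse_glyph(text, pos + 1)
    upper, pos = parse_glyph(text, pos)
    return Overstrike(base, upper), pos


def parse(text: str) -> List[Glyph]:
    """Split `text` into plain characters and overstruck compounds."""
    glyphs: List[Glyph] = []
    pos = 0
    while pos < len(text):
        glyph, pos = parse_glyph(text, pos)
        glyphs.append(glyph)
    return glyphs


print(parse("\\a`b"))      # a-grave followed by a plain b
print(parse("\\\\a`^"))    # a-grave, overstruck again with a circumflex
```

As written, the sketch has no way to express a literal backslash; a real
scheme would need some convention for that.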

As I understand it, the technical problems associated with such a
system are largely typographical, having to do with obtaining an
acceptable typeface for the new combined character. It's difficult
to imagine how one could automatically figure out exactly where
some diacritic should be placed over some other character that is
already a compound of many basic characters. Recall that in general
we will be dealing with variable-width fonts with multiple faces
or type-styles; I think the currently accepted position (again,
someone please correct me if I am wrong!) is that the only way
to get acceptable results is to have each diacritic-character
combination individually designed by a type-face designer.

However, I think that the sensible thing for the ISO standard to do is
to build in a general escape method, which (among other things)
would allow overstriking of arbitrary characters to build new
characters. At the same time, the standards writers should note
that with current technology, the results are likely to be
typographically unacceptable --- but who knows what technology
will be like 10 or 20 years from now? No point in shutting
the door unless you have to!

Mark Johnson

