LINGUIST List 2.283

Monday, 10 June 1991

Disc: Diacritics

Editor for this issue: <>


Directory

  • Responses: Diacritics and Acronyms
  • Lars Henrik Mathiesen, ISO10646 and diacritical marks
  • Mark Johnson, Re: ISO10646 and diacritical marks

    Message 1: Responses: Diacritics and Acronyms

    Date: Thu, 6 Jun 91 18:51:07 EDT
    From: <macrakis@osf.org>
    Subject: Responses: Diacritics and Acronyms
    Let me reassure Peeters and others. No one is trying to get rid of diacritics in general. The argument is much narrower than that: should character encodings be closed (i.e. contain a fixed repertoire of character+diacritic combinations) or open (i.e. permit arbitrary combinations of character and diacritic)?

    The draft international standard DIS10646 has a closed repertoire, Unicode an open one (but also contains many pre-composed characters).

    Both proposals cover the main languages of the world fully (yes, `main' is vague but certainly includes a wide range: certainly all the European languages, but also Vietnamese, Oriya, Persian, ...). Both can encode e-acute as a single code. Both can encode Ancient Greek eta-rough-grave-subscript, but 10646 encodes it as a single code (yes, there is a list of all the possible combinations in the standard!), while Unicode encodes it as a sequence of four codes. The difference comes with characters which are NOT widely used, either because the language is not among those covered by ISO (Intl. Standards Organization), or because the usage is narrow (e.g. scholarly). For instance, Cyrillic_R-macron-grave (useful for Slavic philology, perhaps?) does not exist as a precomposed character in Unicode or in 10646. But in Unicode it can be represented as a combination of three codes, even if it's never been used before (and even if it is a typo!). 10646 could of course add it in a future revision, but this has to be done on a case-by-case basis.
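    [With hindsight, the distinction can be illustrated with the combining-mark mechanism that today's Unicode adopted. A minimal Python sketch, using the modern code points rather than anything in the 1991 drafts:]

    ```python
    import unicodedata

    # Precomposed: e-acute as a single code (closed-repertoire style).
    precomposed = "\u00e9"       # LATIN SMALL LETTER E WITH ACUTE

    # Decomposed: base letter plus a floating diacritic (open-repertoire style).
    decomposed = "e\u0301"       # 'e' + COMBINING ACUTE ACCENT

    # The two differ as code sequences...
    assert precomposed != decomposed
    assert len(precomposed) == 1 and len(decomposed) == 2

    # ...but normalization maps between them where a precomposed form exists.
    assert unicodedata.normalize("NFC", decomposed) == precomposed
    assert unicodedata.normalize("NFD", precomposed) == decomposed

    # An open repertoire also admits combinations with NO precomposed form,
    # e.g. Cyrillic er with macron and grave (the philological example above).
    rare = "\u0440\u0304\u0300"  # CYRILLIC ER + COMBINING MACRON + COMBINING GRAVE
    assert unicodedata.normalize("NFC", rare) == rare  # stays three codes
    ```

    [The last assertion is the whole argument in miniature: the rare philological combination is representable without anyone having to petition a standards committee first.]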

    In closed repertoire systems, adding a new character is expensive. In open repertoire systems, using characters composed of existing elements costs nothing. Proponents of closed repertoire systems argue that inventors of NEW orthographies should limit themselves to standard characters. Proponents of open repertoire systems argue that this is an unnatural limitation which restricts designers of orthographies artificially.

    _That_ is what the argument is about, _not_ about suppressing e-acute.

    -s

    Message 2: ISO10646 and diacritical marks

    Date: Fri, 7 Jun 91 15:17:14 +0200
    From: Lars Henrik Mathiesen <thorinn@odin.diku.dk>
    Subject: ISO10646 and diacritical marks
    In response to Bert Peeters:

    The draft of ISO 10646 does contain accented characters --- lots of them. As I understand it, the purpose is that not only English and French, but most or all languages with a standard orthography, should find all the characters they need in there.

    The argument is about how letter/diacritic combinations should be represented. Currently, all the combinations that are thought necessary are enumerated in a (very large) set of lists. There is no way, within the draft standard, to create new combinations. (A given display system may have ways of superimposing two symbols from the standard, but the exact method will vary between systems.)

    The suggestion is that the draft standard should be changed to include "floating diacritics," which are defined _by_the_standard_ to be placed with the next letter. The message that was copied here (by a proponent of this) was from the main opponent of the suggestion, it seems.

    By his argument, the only way new combinations could become necessary would be for ``irresponsible'' linguists to invent ``unreasonable'' accented letters when creating an orthography for a language. Therefore, it is reasonable for ISO to create a standard that cannot accommodate such alphabets, and therefore it's the linguists (and not the standards committee) who will be guilty of forcing the users of that language to use non-standard equipment, with the attendant costs.

    (To me, this argument seems a little circular. However, there are technical reasons why a standard without floating diacritics is easier to implement.)

    I think the message was copied here to elicit arguments in favor of floating diacritics, i.e., good reasons why a fixed repertoire cannot be sufficient.

    -- Lars Mathiesen, DIKU, U of Copenhagen, Denmark
    [uunet!]mcsun!diku!thorinn
    Institute of Datalogy -- we're scientists, not engineers.
    thorinn@diku.dk

    Message 3: Re: ISO10646 and diacritical marks

    Date: Mon, 10 Jun 91 10:15:04 +0200
    From: Mark Johnson <mark@adler.philosophie.uni-stuttgart.de>
    Subject: Re: ISO10646 and diacritical marks
    Speaking from just about complete ignorance except for what I've seen on this group, I take it that ISO10646 proposes a fixed-length character encoding system. Some of these characters would have diacritics "built in" (the way the PostScript and Mac character sets have accented vowels as single characters, for example), I would assume. (Please, someone correct me if I am wrong here!).

    I think what's at issue here is whether there should be a productive way of combining pre-existing characters to form new characters (think of overstriking as such a way). For example, one might agree to interpret the backslash as an 'escape character' that means 'build a new character by overstriking the next two characters', as in \a` , for example. Since the new characters so constructed could themselves be subjected to further overstriking, we can get a very large number of characters.
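    [Johnson's escape-character scheme can be sketched directly. This is a toy parser for his hypothetical notation, not part of any standard: `\` means "build one character by overstriking the next two units," and since an overstruck character is itself a unit, composition nests.]

    ```python
    # Toy parser for the hypothetical overstrike escape described above.
    # A "unit" is either a plain character or a nested overstruck pair.

    def parse_unit(s, i):
        """Parse one unit starting at s[i]; return (unit, next_index)."""
        if s[i] == "\\":
            first, i = parse_unit(s, i + 1)   # operands may themselves be
            second, i = parse_unit(s, i)      # escaped pairs, so this nests
            return ("overstrike", first, second), i
        return s[i], i + 1

    def parse_units(s):
        """Parse a whole string into a list of units."""
        units, i = [], 0
        while i < len(s):
            unit, i = parse_unit(s, i)
            units.append(unit)
        return units

    # r"\a`" encodes a single overstruck character: a + grave.
    assert parse_units(r"\a`") == [("overstrike", "a", "`")]

    # Nesting: overstrike (a + grave) with a further mark.
    assert parse_units(r"\\a`-") == [("overstrike", ("overstrike", "a", "`"), "-")]
    ```

    [The second assertion shows why the construction yields a very large character set: each composed character can serve as an operand for further composition.]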

    As I understand it, the technical problems associated with such a system are largely typographical, having to do with obtaining an acceptable typeface for the new combined character. It's difficult to imagine how one could automatically figure out exactly where some diacritic should be placed over some other character that is already a compound of many basic characters. Recall that in general we will be dealing with variable-width fonts with multiple faces or type-styles; I think the currently accepted position (again, someone please correct me if I am wrong!) is that the only way to get acceptable results is to have each diacritic-character combination individually designed by a type-face designer.

    However, I think that the sensible thing for the ISO standard to do is to build in a general escape method, which (among other things) would allow overstriking of arbitrary characters to build new characters. At the same time, the standards writers should note that with current technology, the results are likely to be typographically unacceptable --- but who knows what technology will be like 10 or 20 years from now? No point in shutting the door unless you have to!

    Mark Johnson

    [End Linguist List, Vol. 2, No. 0283]