Editor for this issue: Ann Dizdar <ann@linguistlist.org>
Ted Harding has raised a number of interesting questions regarding email and accented characters. We, H. Allan Gleason, Jr. (Professor Emeritus, Univ of Toronto), Henry Gleason, and David F. Stermole, as linguists and programmers, have struggled with character encoding for almost twenty years. We would like to contribute something to both the discussion of and the solution to the problem.

Writing email in English has been possible from the very beginning, because the vast majority of programmers spoke English and they were the ones who used email. The character set was 7-bit ASCII, derived from the typewriter keyboard with the addition of symbols programmers needed. This imposed severe limitations even on the proper rendering of English.

As computing spread, different standards for encoding other languages developed. By replacing some of the programmers' symbols, US ASCII was transformed into German ASCII, French ASCII, etc. While German was handled almost satisfactorily, French was missing accented capital vowels and some accented vowels altogether, and neither had proper European quotation marks (chevrons).

IBM introduced its 8-bit extension of ASCII in the 1980s, but it was a conglomeration of letters from Western Europe, some Greek letters and logic symbols for mathematics, and a full set of graphics pieces to create forms on the screen. Although the European quotation marks were introduced, even the doubling of the number of codes from 128 to 256 was not sufficient to handle French properly. Compromises had been made again.

When Russian came into the mix, one solution was to keep the old 7-bit ASCII codes for English, so that programming could still be done, and to use the eighth bit to signal that a character was Russian. This meant that Russian could be mixed with English but not with other languages.

Other efforts have included creating the ISO 8859 family of 8-bit character sets (see http://wwwwbs.cs.tu-berlin.de/~czyborra/charsets for information) to handle various different language combinations.
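To make the limitation of one-byte sets concrete, here is a minimal Python sketch. It assumes Python's built-in codecs (`latin_1`, `iso8859_5`, `iso8859_7`, `koi8_r`) as stand-ins for the kinds of character sets discussed above; the byte value 0xE4 and the helper name `interpretations` are chosen purely for illustration. A byte below 128 means the same thing under every set, while a byte with the eighth bit set names a different letter under each one.

```python
# Hypothetical illustration: one 8-bit value, several single-byte character sets.
CHARSETS = ["latin_1", "iso8859_5", "iso8859_7", "koi8_r"]


def interpretations(byte_value: int) -> dict[str, str]:
    """Decode a single byte under each character set in CHARSETS."""
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in CHARSETS}


if __name__ == "__main__":
    # 0x41 is below 128, so every set agrees it is the ASCII letter "A".
    print(interpretations(0x41))
    # 0xE4 has the eighth bit set, so each set assigns it a different letter.
    print(interpretations(0xE4))
```

Since the byte itself carries no record of which set was intended, a document stored one byte per character can only be read correctly under a single set at a time.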
However, mixing English with Greek, Russian, Polish, Slovak, and Serbian in a single document using these 8-bit character sets remains cumbersome or impossible, primarily because word processors typically use just a single 8-bit byte to represent each character. This means that only one of the sets is normally used for a whole document. (A side issue is the absence of the Ukrainian G character from the ISO-8859-5 set.)

To encode characters from multiple languages, there are two practical possibilities: use non-printable 8-bit codes to indicate a switch from one character set to another, or use more bits to encode the characters. The former is an option that has not been adopted in any widespread manner. As for the latter, distinguishing the many ISO 8859 sets from one another would require four more bits per character. Since computers are currently designed to use bits efficiently in multiples of eight, each character would then occupy sixteen bits, four of which would be unused/wasted. Also, no allowance was made for using accents other than the ones that were included with the unitary characters.

Sixteen bits allow for 65,536 unique codes. Proponents of Unicode (see its home page: http://www.stonehand.com/unicode.html) declare that this number will suffice to handle all of the characters of all the languages in the world. This encoding also includes a vast array of extra floating accents. The advent of Unicode encoding now makes multilingual email possible with the proper software.

However, email even now is still pretty much a 7-bit affair, because many email gateways on the Internet handle only seven bits at a time. This requires a conversion of 8-bit text to 7-bit for transmission, which has been the job of uuencode/uudecode.

This is the rationale for our using Unicode and providing access to a wide variety of accents. That leaves but one problem: how to access this vast array of characters easily. We decided on individualized keyboard maps for each of the up to seven languages/alphabets that the user wishes to use.
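The chain described above (a keyboard map per language, 16-bit character codes, and a 7-bit-safe form for transmission) can be sketched in Python. This is only an illustration under stated assumptions, not the authors' software: the key assignments in `KEYMAPS`, the function names, and the choice of big-endian UTF-16 as the 16-bit form are all hypothetical, and `binascii.b2a_uu` stands in for the uuencode step.

```python
import binascii

# Hypothetical keyboard maps: each maps a typed Latin key to a character.
# Keys without an entry pass through unchanged.
KEYMAPS = {
    "english": {},
    "russian": {"l": "д", "f": "а"},
    "greek": {"n": "ν", "a": "α", "i": "ι"},
}


def type_text(segments):
    """Translate (keymap name, keystrokes) pairs into one Unicode string."""
    out = []
    for keymap_name, keys in segments:
        keymap = KEYMAPS[keymap_name]
        out.append("".join(keymap.get(k, k) for k in keys))
    return "".join(out)


def to_seven_bit(text):
    """Encode text as 16-bit units, then uuencode it in 45-byte chunks."""
    raw = text.encode("utf-16-be")
    # b2a_uu accepts at most 45 bytes per call and returns one ASCII line.
    return [binascii.b2a_uu(raw[i:i + 45]) for i in range(0, len(raw), 45)]


def from_seven_bit(lines):
    """Reverse to_seven_bit: uudecode each line, then decode the 16-bit units."""
    raw = b"".join(binascii.a2b_uu(line) for line in lines)
    return raw.decode("utf-16-be")


if __name__ == "__main__":
    message = type_text(
        [("english", "yes: "), ("russian", "lf"), ("english", ", "), ("greek", "nai")]
    )
    lines = to_seven_bit(message)
    print(message)
    print(lines)
    print(from_seven_bit(lines) == message)
```

The 45-byte chunk size follows the traditional uuencode line length. Because each character occupies two bytes, a chunk boundary can fall inside a character; that is harmless here, since the bytes are rejoined before they are decoded.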
Switching from one keyboard to another is a simple matter of tapping a function key. To see how our proposed solution works, visit our Internet site at http://www.panglot.com.

- David F. Stermole
  e-mail: stermole@chass.utoronto.ca