Discussion Details

Title: Re: 17.869, Capital Tresillo/Cuatrillo in Unicode
Submitter: Jim Fidelholtz
Description: My comment is more about Unicode itself than the specific question. I
really know little about the details of this coding system, though its
aims, if I understand them, seem laudable; that is: to have a system which
in principle (and, presumably, with time, in fact) will have codings for
*all* orthographies, scripts, etc., with *any* relevance for *anyone*,
*anywhere*, at any time, past, present and/or future. At least this is my
understanding, based on very limited facts, but on rather widespread
information from various sources, including LinguistList, and now the
'official' Unicode page. As I understand the coding, there is room for
2^16 characters, or something over 65,000 characters. This seems
like a lot to me, but maybe it isn't (considering that Chinese has perhaps
several tens of thousands of characters all by itself, counting variants
and older ones). I don't understand why they cannot just add one or more
hexadecimal digits to the code, if it seems necessary, and use the lower
codes for the more common languages, perhaps forcing users of some less
common languages to use a (mandatory) switch in their software (or, perhaps
better, their computer) to switch to the higher codes (over decimal 65535).
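
The arithmetic behind the question about codes over decimal 65535 can be sketched in a few lines of Python. This is an illustration only, not something the post itself describes: it assumes the UTF-16 encoding form, in which a code above hexadecimal FFFF is written as two 16-bit units (a "surrogate pair"), so no switch in the software is needed. The names `BMP_SIZE` and `utf16_units` are hypothetical and chosen just for this example.

```python
# A minimal sketch, assuming the UTF-16 encoding form.
# BMP_SIZE and utf16_units are illustrative names, not part of any library.

BMP_SIZE = 2 ** 16  # 65536 possible values in a 16-bit code: 0 through 65535


def utf16_units(code_point: int) -> list:
    """Return the 16-bit UTF-16 code units that represent one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if code_point < BMP_SIZE:
        # Fits in a single 16-bit unit.
        return [code_point]
    # Codes over 65535: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


if __name__ == "__main__":
    for cp in (0x41, 0xFFFD, 0x10000, 0x10FFFF):
        units = utf16_units(cp)
        print(hex(cp), [hex(u) for u in units])
```

Under these assumptions, characters with low codes still cost one 16-bit unit each, and only the higher codes cost two.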

This rambling preamble is basically supporting an argument for making it
relatively easy and quick to add characters, even on the apparently
flimsiest of arguments, so as not to leave any former, present or future
symbol-using system out of consideration. I would also emphasize that
computers have been very fast now for close to two decades, and all
indications are that they will continue to get faster. Furthermore,
character manipulations (even multi-byte ones) are among the fastest things
done by computers, and more so when they are organized (compare the
response times to get many millions of answers from Google).

The suggestion, then, would be that, if Michael Everson wants capital
tresillo and/or cuatrillo, I'm willing to wait the extra couple of
nanoseconds that every single operation on characters will thereafter cost
me (temporarily), which in a couple of years will have speeded up by a
couple of orders of magnitude in any case. And if I don't meanwhile get a
new computer, I'm willing to suffer in silence (grousing would take more
time than the total I would lose, anyway).
Date Posted: 03-Apr-2006
Linguistic Field(s): Computational Linguistics; Writing Systems
LL Issue: 17.987