Fri Nov 24 2000

Sum: Korean Hangul Frequency

Date: Mon, 20 Nov 2000 09:38:36 -0700
From: Tim Mills <>
Subject: Korean Hangul frequency


Some weeks ago, I posted a request for information on frequency of Hangul
characters in Korean text. This is a summary of the responses I received.
	In general, the consensus was that such research is rare or
nonexistent, since it has less value in academia than in the commercial
sector, where I work. However, here are some responses that helped me:

From Byong-seon Yang,

There is a published book on Hangul frequency, "Hangul Sayong Bindo-uy
Bunsuk" (An Analysis of Korean Frequency: 한글사용빈도의 분석), which is
written in Korean and published by the Korea Cultural Research Center, Korea
University Press, Seoul. The book analyzes frequency by consonant and vowel
(onset, coda), syllable, etc. Unfortunately, it is written in Korean. If
you read Korean, it should be useful to you, since it is essentially a set
of tables analyzed by frequency count. The publisher's phone number is
82-2-3290-1610~8 (fax: 82-2-926-8385).
If you need more help, please contact me.

Byong-seon Yang, Ph.D.
Professor of English, Chair of Korean Studies
Jeonju University
Chonju, Korea 560-759
Tel) 82-63-220-2213 (Office)
 82-63-226-3294 (H)
Fax) 82-63-224-9920

From Sean M. Witty,

Off the top of my head, I don't think any such documentation exists. If it
does, the numbers must be staggering. 
Korean phonology is not as dynamic as that of English. Thus, there are fewer
possible syllables, overall, available to the language (I have compiled a
catalog). The total is further reduced because, although some syllables are
possible according to the phonology, they simply aren't used by the
language. Of those that are phonologically possible and used meaningfully,
the pronunciation may vary depending on the phonetic environment (reducing
the total possible number of syllables even further). The end result is a
5000+ year old language that uses a vocabulary based on a relatively small
number of syllables. 
This leads to each syllable having more than one meaning, sometimes as many
as ten (thereby increasing the frequency of each). Take a common syllable
like 가 (ka), which has several meanings and is a case marker. The frequency
of usage for this one syllable, either in terms of meaningfulness or daily
usage, would be an extremely high number. This would also probably be true
of almost every other syllable in the language. 

From Hyeri Joo,

If you're interested in frequencies of Korean words or morphemes, go
to the Web site <>. The site is still under development, but it
will be very helpful for you since you're a computational linguist.

And the most informative response was from Ivan A. Derzhanski, who sent me
data from his own research on the subject:

My corpus consisted of 1 024 424 syllables' worth
of newspaper text, mostly from the Daily Hankyoreh. There were
1526 different syllables found in the text, of the 2350 the KSC
code caters for.

Derzhanski's data includes counts of how many times each Hangul syllable
appeared in his corpus, as well as counts of onset, nucleus, and coda jamo.
I include his signature information here in case anyone wishes to contact
him about the data:

<fa-al-_haylu wa-al-laylu wa-al-baydA'u ta`rifunI
 wa-as-sayfu wa-al-qir.tAsu wa-al-qalamu>
 (Abu t-Tayyib Ahmad Ibn Hussayn al-Mutanabbi)
Ivan A Derzhanski
H: cplx Iztok bl 91, 1113 Sofia, Bulgaria <>
W: Dept for Math Lx, Inst for Maths & CompSci, Bulg Acad of Sciences
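
For readers who want to produce comparable counts from their own text, here
is a minimal sketch in Python. It is not Derzhanski's method, which the
responses do not describe; it only relies on the arithmetic layout of the
Unicode Hangul Syllables block (U+AC00 to U+D7A3) to split each precomposed
syllable into onset, nucleus, and coda. The names decompose and
count_frequencies are hypothetical, and the jamo are reported as Unicode
conjoining jamo, with an empty string standing for "no coda".

```python
from collections import Counter

# Unicode Hangul Syllables block: 19 onsets x 21 nuclei x 28 codas
# (the 28 codas include "no coda"), laid out arithmetically from U+AC00.
S_BASE = 0xAC00
N_NUCLEI = 21
N_CODAS = 28
S_COUNT = 19 * N_NUCLEI * N_CODAS  # 11172 precomposed syllables

# Base code points of the conjoining jamo used to label the parts.
L_BASE = 0x1100  # onsets (leading consonants)
V_BASE = 0x1161  # nuclei (vowels)
T_BASE = 0x11A7  # codas (trailing consonants); index 0 means no coda


def decompose(ch):
    """Return (onset, nucleus, coda) indices, or None if ch is not a
    precomposed Hangul syllable."""
    index = ord(ch) - S_BASE
    if not 0 <= index < S_COUNT:
        return None
    onset, rest = divmod(index, N_NUCLEI * N_CODAS)
    nucleus, coda = divmod(rest, N_CODAS)
    return onset, nucleus, coda


def count_frequencies(text):
    """Count syllables and their onset, nucleus, and coda jamo in text.
    Characters outside the Hangul Syllables block are skipped."""
    counts = {name: Counter() for name in ("syllable", "onset", "nucleus", "coda")}
    for ch in text:
        parts = decompose(ch)
        if parts is None:
            continue
        onset, nucleus, coda = parts
        counts["syllable"][ch] += 1
        counts["onset"][chr(L_BASE + onset)] += 1
        counts["nucleus"][chr(V_BASE + nucleus)] += 1
        counts["coda"][chr(T_BASE + coda) if coda else ""] += 1
    return counts
```

Calling count_frequencies on a string returns four Counter objects. The
number of keys in the "syllable" counter is the number of distinct syllables
in the text, which is the figure comparable to the 1526 reported above. Note
that the Unicode block holds 11172 syllables, so this sketch will also count
syllables that fall outside the 2350 covered by the KSC code.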

Thanks to everyone who responded to my posting, and thanks especially to Ivan
Derzhanski for sharing his data.

	- Tim Mills -
	Zi Corporation

---------------------------------------------

Tim Mills, Computational Linguist
Zi Corporation
Suite 300, 500 - 4 Avenue SW
Calgary, Alberta
Canada T2P 2V6

Main: (403) 233.8875
Direct: (403) 231.4591
Fax: (403) 231.4595
