Publishing Partner: Cambridge University Press CUP Extra Publisher Login
amazon logo
More Info

New from Oxford University Press!


It's Been Said Before

By Orin Hargraves

It's Been Said Before "examines why certain phrases become clichés and why they should be avoided -- or why they still have life left in them."

New from Cambridge University Press!


Sounds Fascinating

By J. C. Wells

How do you pronounce biopic, synod, and Breughel? - and why? Do our cake and archaic sound the same? Where does the stress go in stalagmite? What's odd about the word epergne? As a finale, the author writes a letter to his 16-year-old self.

Academic Paper

Title: A new PPM variant for Chinese text compression
Author: Peiliang Wu
Institution: University of Wales, Bangor
Author: W. J. Teahan
Institution: University of Wales, Bangor
Linguistic Field: Computational Linguistics; Writing Systems
Subject Language: Chinese, Mandarin
Abstract: Large alphabet languages such as Chinese are very different from English, and therefore present different problems for text compression. In this article, we first examine the characteristics of Chinese, then we introduce a new variant of the Prediction by Partial Match (PPM) model especially for Chinese characters. Unlike the traditional PPM coding schemes, which encodes an escape probability if a novel character occurs in the context, the new coding scheme directly encodes the order first before encoding a symbol, without having to output an escape probability. This scheme achieves excellent compression rates in comparison with other schemes on a variety of Chinese text files.


This article appears IN Natural Language Engineering Vol. 14, Issue 3.

Return to TOC.

Add a new paper
Return to Academic Papers main page
Return to Directory of Linguists main page