Editor for this issue: <>
Announcing the availability of three Unicode Technical Reports

Three Unicode Technical Reports are now available from the Unicode Consortium. They may be obtained for a nominal fee that covers postage and printing costs. Please note that Unicode Corporate, Associate, and Individual members will automatically receive copies and need not order them specifically.

These reports are being disseminated for public review and comment. In order to achieve the widest possible distribution, they may be freely copied and distributed for review purposes (provided the notices, etc. remain intact). In each case, the review period ends on August 15, 1993. Contact the consortium for costs.

Technical Report #1: Draft Proposals
Contains Burmese, Khmer, and Ethiopian proposals which constitute the strong technical recommendations of the Unicode Technical Committee for these scripts. (To allow further review, they were not included in Unicode 1.0.)

Technical Report #2: Preliminary Draft Proposals
Contains Mongolian, Sinhala, and Tibetan proposals which constitute recommended approaches to these scripts. (To allow further review, Mongolian and Sinhala were not included in Unicode 1.0. Tibetan was retracted for further study in the process of merging with ISO 10646.)

Technical Report #3: Exploratory Proposals
Contains proposals for the following scripts: Aramaic, Balti, Batak, Buginese, Cherokee, Etruscan, Glagolitic, Kirat (Limbu), Lepcha (Rong), Linear-B, Maldivian, Manipuri, Meroitic, Numidian, Ogham, Old Persian Cuneiform, Pahlavi/Avestan, Phoenician, Runes, South Arabian, Syriac, Tagalog/Mangyan, Tai Lu, Tai Mau, Ugaritic Cuneiform. These proposals represent possible encoding models for the scripts and are being presented in an exploratory fashion for their initial public comment and review. They will be issued subsequently as Draft Proposals.

To order, please inquire to:

Unicode, Inc.
1965 Charleston Ave.
Mountain View, CA 94043
Phone: (415) 961-4189
FAX: (415) 966-1637
Internet: info@unicode.org

The ASCII plain text of these technical reports, exclusive of charts, is also available via anonymous FTP from the site "Unicode.ORG". Files are:

pub/TechReports/UTR_1.ascii
pub/TechReports/UTR_2.ascii
pub/TechReports/UTR_3.ascii

Corpus-Based Frequency Count of Modern Chinese

Corpus-based study of Chinese is one of the research projects of the Chinese Knowledge Information Processing Group (CKIP) at Academia Sinica. The current research is based on a Chinese newspaper corpus, which amounts to 20,698,116 characters (9,540,444 words after word segmentation). Four technical reports in Chinese have been published:

1. Corpus-Based Frequency Count of Characters in Journal Chinese
   30 pages (US$ 5)
2. Corpus-Based Frequency Count of Words in Journal Chinese
   300 pages (US$ 20)
3. The Most Frequent Verbs in Journal Chinese and Their Classification
   140 pages (US$ 10)
4. The Most Frequent Nouns in Journal Chinese and Their Classification
   150 pages (US$ 10)

The first report lists the 5,666 distinct characters which appear in the entire corpus. The second report contains 42,686 words that occur more than three times in the corpus. The most common 14,956 words constitute more than 99.9995 percent of all the words occurring in the corpus. The third and fourth reports include 19,907 verbs and 21,368 nouns, respectively, which occur more than twice in the corpus, together with their syntactic or semantic classification.

To order, please list the desired title(s) and enclose a cheque for the appropriate amount payable to the Computational Linguistic Society of the R.O.C. (ROCLING). The prices listed above include postage and handling.

Address:
Miss Tsai Shu-hui
ROCLING
Institute of Information Science
Academia Sinica, Nankang
Taipei, Taiwan 11529
R.O.C.
Tel.: 886-2-788-1638
Fax: 886-2-788-1638
E-Mail: rocltsh@iis.sinica.edu.tw