
Dissertation Information


Title: Research and Implementation of A Domain-Unconstrained Chinese Automatic Abstracting System
Author: Junjie Li
Homepage: http://world.kaist.ac.kr/~jklee/
Institution: Harbin Institute of Technology, Computer Science and Engineering
Completed in: 1995
Linguistic Subfield(s): Computational Linguistics
Subject Language(s): Chinese, Mandarin
Director(s): Wang Kaizhu

Abstract: In this dissertation, the author

(1) presented a new text representation, called the Hierarchical Network, which is based on the natural hierarchical structure of text (chapter, paragraph, sentence, sub-sentence, word, and character) and is used to automatically index texts and corpora.

(2) designed and implemented a Non-dictionary Automatic Chinese Segmentation System, which provides a new, efficient method for segmenting new words and reaches a 98% correct segmentation rate in open tests.

(3) developed a new characteristic-word weighting function based on word frequency and word length, in which word frequency is calculated using corpus-based co-occurrence computation.

(4) developed a new sentence importance weighting function that jointly reflects factors such as sentence length, the number of sub-sentences, and the sum of characteristic words, instead of relying on static information such as sentence location or connective words and phrases.

(5) designed and implemented an Unconstrained Chinese Automatic Abstracting System, which can generate abstracts of texts of different fields, styles, and lengths at any abstracting rate.
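
The abstract does not give the actual weighting formulas, so the following Python sketch is only an illustration of the general idea behind points (3) to (5): weight words by frequency and length, score sentences from their word weights, and keep a share of sentences set by an abstracting rate. The function names, the product `frequency * length`, and the length-normalized sentence score are all assumptions made for this example, not the dissertation's method. The input is assumed to be already segmented into words, and the sub-sentence factor and co-occurrence computation are left out.

```python
import math
from collections import Counter


def word_weights(sentences):
    """Hypothetical word weight: document frequency count times word length."""
    freq = Counter(word for sentence in sentences for word in sentence)
    return {word: count * len(word) for word, count in freq.items()}


def sentence_scores(sentences):
    """Hypothetical sentence score: sum of word weights, normalized by length."""
    weights = word_weights(sentences)
    return [
        sum(weights[word] for word in sentence) / len(sentence) if sentence else 0.0
        for sentence in sentences
    ]


def select_sentences(sentences, rate):
    """Return indices of the top-scoring sentences, in original order.

    `rate` is the fraction of sentences to keep (the abstracting rate);
    at least one sentence is kept for a non-empty input.
    """
    if not sentences:
        return []
    keep = max(1, math.ceil(rate * len(sentences)))
    scores = sentence_scores(sentences)
    # sorted() is stable, so ties keep their original relative order.
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return sorted(ranked[:keep])


# Toy, pre-segmented input (an assumption; real input would come from a segmenter).
sentences = [
    ["语言", "模型", "生成", "摘要"],
    ["天气", "好"],
    ["摘要", "系统", "处理", "语言"],
    ["你", "好"],
]
selected = select_sentences(sentences, 0.5)
print(["".join(sentences[i]) for i in selected])
```

With an abstracting rate of 0.5, the sketch keeps the two sentences that contain the most frequent, longest words; lowering the rate shortens the abstract.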