|Title:||Research and Implementation of A Domain-Unconstrained Chinese Automatic Abstracting System|
|Author:||Junjie Li|
|Institution:||Harbin Institute of Technology, Computer Science and Engineering|
|Linguistic Subfield(s):||Computational Linguistics|
|Abstract:||In this dissertation, the author
(1) presented a new text representation, called Hierarchical Network, which is based on the natural hierarchical structure of text (chapter, paragraph, sentence, sub-sentence, word and character) and is used to automatically index the text and corpus.
(2) designed and implemented a Non-dictionary Automatic Chinese Segmentation System, which provides a new, efficient method for segmenting new words and reaches 98% segmentation accuracy in open tests.
(3) developed a new characteristic word weighting function which is based on word frequency and word length, where corpus-based co-occurrence computation is used in calculating word frequency.
(4) developed a new sentence importance weighting function that combines factors such as sentence length, the number of sub-sentences and the sum of characteristic words, instead of relying on static information such as sentence location or connective words and phrases.
(5) designed and implemented an Unconstrained Chinese Automatic Abstracting System, which can generate abstracts of texts in different fields, styles and lengths at any abstracting rate.
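
The abstract does not give the actual weighting formulas, so the following Python sketch is only an illustration of the general idea behind items (3) to (5): words are weighted by frequency and length, sentences are scored from their word weights and length, and the top-scoring sentences are kept according to an abstracting rate. The function names, the frequency-times-length weight and the square-root length damping are all assumptions made for this example, not the dissertation's method. Input is assumed to be already segmented into lists of words.

```python
import math
from collections import Counter


def word_weights(sentences):
    """Weight each word by frequency times length (assumed formula, for illustration)."""
    freq = Counter(word for sent in sentences for word in sent)
    return {word: count * len(word) for word, count in freq.items()}


def sentence_score(sentence, weights):
    """Sum the word weights, damped by sentence length (assumed formula)."""
    if not sentence:
        return 0.0
    total = sum(weights[word] for word in sentence)
    return total / math.sqrt(len(sentence))


def make_abstract(sentences, rate):
    """Keep the top-scoring fraction `rate` of sentences, in original order."""
    weights = word_weights(sentences)
    keep = max(1, round(len(sentences) * rate))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_score(sentences[i], weights),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:keep])]
```

Calling `make_abstract(sentences, 0.3)` would keep roughly 30% of the sentences. A real system along the lines described would also need the segmentation step from item (2) and corpus-based frequency statistics, which this sketch leaves out.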