|Title:||Czech Syntactic Lexicon|
|Author:||Hana Skoumalová|
|Institution:||Charles University in Prague, Institute of Theoretical and Computational Linguistics|
|Linguistic Subfield(s):||Computational Linguistics; Syntax; Lexicography;|
|Abstract:||In this work, an electronic lexicon of Czech verbs is presented. The lexicon contains valency frames of ca. 15,000 Czech verbs, and its purpose is to enrich the information contained in other electronic dictionaries. The trend of recent years is to build large-scale reusable sources that can be combined with other sources. This work shows how the lexicon cooperates with an existing morphological lexicon and how it can be used in various NLP systems.
Chapter 2 discusses several theoretical approaches in comparison with Functional Generative Description (FGD), which is used for the dictionary. The explication concentrates especially on the structure of lexicons in the individual theories. A lexicon usually conforms to certain preconditions that follow from the theoretical framework in use, and so the possibility of creating a lexicon that would be transferable to another theoretical framework is explored.
Chapter 3 discusses the possibility of using existing sources, with respect to the desired result and the theoretical framework adopted for the work. Several Czech syntactic lexicons have been created in the past, but unfortunately their reuse would be rather difficult. This chapter mentions several such attempts and describes in detail the lexicon that is used.
Chapter 4 describes the verb frame. First, the format of the lexical entry is described; then various types of reflexive constructions in Czech and their encoding in the lexicon are discussed. The next section shows possible diatheses of the basic (active) frame and discusses which of these diatheses can be added to the dictionary on a regular basis and which have to be treated as exceptions. The last section describes so-called equi and raising verbs.
In Chapter 5, the procedure for automatically converting the source dictionary to the proposed format is shown. For this conversion, an algorithm was created which assigns functors (semantic roles) to the individual members of a frame. The output of this procedure will serve as input for an editor. The chapter discusses how much of the source data can be completed by this procedure and how much needs post-editing. It is also shown how the resulting lexicon can be used in NLP systems.
Chapter 6 summarizes the work. In Section 6.1, verbs are sorted into groups according to their frames, and the results are compared with those of other researchers. Section 6.2 discusses the prospects of language processing based on symbolic methods and the possible use of the lexicon in corpus linguistics.