Dissertation Information


Title: Computer-Assisted Lemmatisation of a Cornish Text Corpus for Lexicographical Purposes
Author: Jon Mills
Homepage: http://kent.academia.edu/JonMills
Institution: University of Exeter, Department of Language and Linguistics
Completed in: 2002
Linguistic Subfield(s): Lexicography
Subject Language(s): Cornish
Cornish, Old
Cornish, Middle
Director(s): Reinhard Hartmann

Abstract: This project sets out to discover and develop techniques for lemmatising a historical corpus of the Cornish language so that a lemmatised dictionary macrostructure can be generated from the corpus. The system should be capable of uniquely identifying every lexical item attested in the corpus. A survey of published and unpublished Cornish dictionaries, glossaries and lexicographical notes was carried out. A corpus was compiled, incorporating specially prepared new critical editions. An investigation into the history of Cornish lemmatisation was undertaken. A systemic description of Cornish inflection was written. Three methods of corpus lemmatisation were trialled.

Findings were as follows. Lexicographical history shapes current Cornish lexicographical practice. Lexicon-based tokenisation has advantages over character-based tokenisation. System networks provide the means to generate base forms from attested word types. Grammatical difference is the most reliable way of disambiguating homographs. A lemma that contains three fields (the canonical form, the part of speech and a semantic field label) provides a unique code for every lexeme attested in the corpus. Programs that involve human interaction during the lemmatisation process allow the lemmatisation database to be bootstrapped. Computerised morphological processing may be used to create at least part of the lemmatisation database. Disambiguation of at least some of the most common homographs may be automated by computer programs.
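The three-field lemma and the grammar-based homograph disambiguation summarised above can be illustrated with a minimal Python sketch. It is not the dissertation's software: the `Lemma` type, the `lemmatise` function, the toy lexicon (English placeholder words standing in for Cornish forms) and the single "prefer a noun reading after an article" rule are all hypothetical assumptions. Tokens whose homographs cannot be resolved are returned as `None`, so that a human editor could resolve them, loosely in the spirit of the interactive bootstrapping the abstract describes.

```python
from typing import List, NamedTuple, Optional


class Lemma(NamedTuple):
    """Three-field lemma: canonical form, part of speech, semantic field label."""
    canonical: str
    pos: str
    semantic_field: str


# Hypothetical toy lexicon mapping attested word types to candidate lemmas.
# English placeholders stand in for Cornish forms; this is not corpus data.
LEXICON = {
    "the": [Lemma("the", "article", "function")],
    "runs": [Lemma("run", "verb", "motion")],
    # A homograph: same spelling, two lemmas distinguished by part of speech.
    "run": [Lemma("run", "noun", "motion"), Lemma("run", "verb", "motion")],
}

# Hypothetical grammatical rule: after an article, prefer a noun reading.
PREFERRED_AFTER = {"article": "noun"}


def lemmatise(tokens: List[str]) -> List[Optional[Lemma]]:
    """Assign one lemma per token, or None if it is unknown or unresolved."""
    result: List[Optional[Lemma]] = []
    prev_pos: Optional[str] = None
    for tok in tokens:
        candidates = LEXICON.get(tok.lower(), [])
        choice: Optional[Lemma] = None
        if len(candidates) == 1:
            # Unambiguous word type: take its only lemma.
            choice = candidates[0]
        elif len(candidates) > 1:
            # Homograph: use the preceding token's part of speech to choose.
            wanted = PREFERRED_AFTER.get(prev_pos) if prev_pos else None
            matches = [c for c in candidates if c.pos == wanted]
            if len(matches) == 1:
                choice = matches[0]
        result.append(choice)
        prev_pos = choice.pos if choice else None
    return result
```

For example, `lemmatise(["the", "run"])` resolves `run` to its noun lemma, while a bare `run` stays unresolved because no grammatical context is available. Because the lemma is a tuple of all three fields, `Lemma("run", "noun", "motion")` and `Lemma("run", "verb", "motion")` remain distinct codes even though they share a canonical form.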