LINGUIST List 24.1140

Wed Mar 06 2013

Diss: Comp Ling/Lexicography/Semantics/Text/Corpus Ling: Seppälä: 'Contraintes sur la sélection des informations dans les définitions terminographiques...'

Editor for this issue: Lili Xia <lxialinguistlist.org>



Date: 05-Mar-2013
From: Selja Seppälä <selja.seppala.unigegmail.com>
Subject: Contraintes sur la sélection des informations dans les définitions terminographiques : vers des modèles relationnels génériques pertinents

Institution: University of Geneva
Program: Département de Traitement Informatique Multilingue
Dissertation Status: Completed
Degree Date: 2012

Author: Selja Seppälä

Dissertation Title: Contraintes sur la sélection des informations dans les définitions terminographiques : vers des modèles relationnels génériques pertinents

Dissertation URL: http://archive-ouverte.unige.ch/vital/access/manager/Repository/unige:21874

Linguistic Field(s): Computational Linguistics; Lexicography; Semantics; Text/Corpus Linguistics

Dissertation Director:
Bruno de Bessé

Dissertation Abstract:

Definitions are included in terminological resources to ensure that these resources fulfill their function of conveying information about the meaning and usage of terms in the domain; they facilitate and enhance communication. Definition writing is still mostly done manually. Terminologists would, however, benefit greatly from the assistance of (semi-)automatic definition writing tools. Such tools would not only accelerate the process of writing definitions but also enhance the consistency, and thus the overall quality, of the definitions produced.

The general objective of my work is thus to design and implement generic tools to assist in definition writing, whatever the terminographic context, the domain or the language. In my thesis, I explore more specifically the nature of dictionary definitions and of the activity of definition writing in terminology. The main research topic of my thesis is the selection of defining information.

Typically, terminologists construct definitions using information in texts written by domain experts. However, not all the pieces of information found in these texts can be considered defining and, even when they are, not all of them are relevant enough to be included in a definition. One of the most challenging tasks of definition writing is therefore the selection of defining information. The two main questions raised by definition writing, which must be addressed in order to design and implement generic definition writing tools, are the following:

- What determines or influences information selection?
- What types of information are relevant to defining?

Of the different factors that are acknowledged to constrain the selection of defining information, the one constraint that is, prima facie, the most independent of any domain and language is the level of reality. I therefore hypothesize that information selection is partly a function of the type of entity defined. If this hypothesis is verified, it is possible to propose defining models based on the properties and relations characterizing each type of entity.

To test this hypothesis, I propose to adopt the categories of an existing realist upper-level ontology, the Basic Formal Ontology (BFO), and their specifications. This ontology is aimed at representing the types of things that exist in the world, their properties and their relations to other types of entities. In BFO, entity types are organized according to philosophical distinctions and are consistent with scientific knowledge of the world. I propose to adapt these categories to create relational models, and to use these models to describe the internal structure of existing definitions. The idea is that large-scale multi-domain and multilingual corpus analyses can be used to test the hypothesis and, if it is verified, to implement these models in a (semi-)automatic definition writing tool.

A pilot experiment based on a corpus analysis of a sample of 240 terminological definitions extracted from 15 domains yielded encouraging results, with almost 75% of the relations expressed in the analyzed definitions belonging to the models associated with each entity type. This empirical study shows, moreover, which relations in these generic models are most relevant in terminological definitions. These results tend to confirm the tested hypothesis. The theoretical considerations underlying this methodological proposal also contribute to the foundations of an integrated theory of definitions in terminology.
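
The kind of coverage measure reported for the pilot experiment can be illustrated with a minimal sketch. It is not the thesis's implementation: the relation names, the contents of the two models and the three annotated definitions below are hypothetical, chosen only to show how the share of relations that fall within the model of each entity type might be computed.

```python
from collections import Counter

# Hypothetical relational models: entity type -> relations the model expects.
# The actual models in the thesis are derived from BFO and are not reproduced here.
MODELS = {
    "material entity": {"is_a", "has_part", "has_function", "made_of"},
    "process": {"is_a", "has_participant", "has_output", "occurs_in"},
}

# Hypothetical annotations: (entity type of the defined term, relations expressed).
ANNOTATED = [
    ("material entity", ["is_a", "has_part", "used_by"]),
    ("process", ["is_a", "has_participant", "has_output"]),
    ("material entity", ["is_a", "has_function"]),
]


def coverage(annotated, models):
    """Return the share of expressed relations that belong to the model of
    their entity type, plus a count of each covered relation."""
    total = 0
    covered = 0
    per_relation = Counter()
    for entity_type, relations in annotated:
        model = models.get(entity_type, set())
        for relation in relations:
            total += 1
            if relation in model:
                covered += 1
                per_relation[relation] += 1
    share = covered / total if total else 0.0
    return share, per_relation


share, per_relation = coverage(ANNOTATED, MODELS)
print(f"Relations covered by the models: {share:.1%}")
print("Most frequent covered relations:", per_relation.most_common(3))
```

The per-relation counts correspond to the second finding mentioned above, namely which relations in the generic models turn up most often in actual definitions.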



Page Updated: 06-Mar-2013