LINGUIST List 24.1140

Wed Mar 06 2013

Diss: Comp Ling/Lexicography/Semantics/Text/Corpus Ling: Seppälä: 'Contraintes sur la sélection des informations dans les définitions terminographiques...'

Editor for this issue: Lili Xia <lxia@linguistlist.org>

Date: 05-Mar-2013
From: Selja Seppälä <selja.seppala.unige@gmail.com>
Subject: Contraintes sur la sélection des informations dans les définitions terminographiques : vers des modèles relationnels génériques pertinents

Institution: University of Geneva
Program: Département de Traitement Informatique Multilingue
Dissertation Status: Completed
Degree Date: 2012

Author: Selja Seppälä

Dissertation Title: Contraintes sur la sélection des informations dans les définitions terminographiques : vers des modèles relationnels génériques pertinents

Dissertation URL: http://archive-ouverte.unige.ch/vital/access/manager/Repository/unige:21874

Linguistic Field(s): Computational Linguistics
                            Text/Corpus Linguistics

Dissertation Director:
Bruno de Bessé

Dissertation Abstract:

Definitions are included in terminological resources to convey
information about the meaning and usage of terms in a domain; they
thereby facilitate and enhance communication. Definition writing is
still mostly done manually. Terminologists would, however, greatly
benefit from the assistance of (semi-)automatic definition writing
tools. Such tools would not only accelerate the process of writing
definitions but also enhance the consistency, and thus the overall
quality, of the definitions produced.

The general objective of my work is thus to conceive and implement
generic tools to assist in definition writing, whatever the terminographic
context, the domain or the language. In my thesis, I explore more
specifically the nature of dictionary definitions and of the activity of
definition writing in terminology. The main research topic of my thesis
relates to the selection of defining information.

Typically, terminologists construct definitions using information in texts
written by domain experts. However, not all the pieces of information
found in these texts can be considered defining and, when they are,
not all of them are considered relevant enough to be included in a
definition. One of the most challenging tasks of definition writing is
therefore the selection of defining information. Thus, the two main
questions raised by definition writing, which ought to be addressed in
order to conceive and implement generic definition writing tools, are
the following:

- What determines or influences information selection?
- What types of information are relevant to defining?

Considering the different factors that are acknowledged to constrain
the selection of defining information, the one constraint that is, prima
facie, the most independent from any domain and language is the level
of reality. I therefore hypothesize that information selection is
partly a function of the type of entity defined. If this hypothesis is
verified, it is possible to propose defining models based on the
properties and relations characterizing each type of entity.

To test this hypothesis, I propose to adopt the categories of an existing
realist upper-level ontology, the Basic Formal Ontology (BFO), and
their specifications. This ontology aims to represent the types of
things that exist in the world, their properties and their relations to
other types of entities. In BFO, entity types are organized according to
philosophical distinctions and they are consistent with the scientific
knowledge of the world. I propose to adapt these categories to create
relational models, and to use these models to describe the internal
structure of existing definitions. The idea is that large-scale
multi-domain and multilingual corpus analyses can be used to test the
hypothesis and, if it is verified, to implement these models in a
(semi-)automatic definition writing tool.
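
As a rough illustration of the idea, the following minimal Python sketch
pairs a few entity types with a set of expected relations and measures
what share of the relations annotated in a sample of definitions falls
within the model for the defined entity type. The category-to-relation
pairings, the relation labels, the sample terms and all names
(RELATIONAL_MODELS, AnnotatedDefinition, model_coverage) are assumptions
made for illustration; they are not the models or data of the thesis.

```python
from dataclasses import dataclass

# Hypothetical relational models: one set of expected relations per entity
# type. The pairings are illustrative assumptions, not the thesis's models.
RELATIONAL_MODELS = {
    "material entity": {"is_a", "has_part", "part_of", "located_in", "has_function"},
    "process": {"is_a", "has_participant", "has_part", "occurs_in"},
    "quality": {"is_a", "inheres_in"},
}


@dataclass(frozen=True)
class AnnotatedDefinition:
    """A definition reduced to the relations an annotator found in it."""
    term: str
    entity_type: str
    relations: tuple[str, ...]


def model_coverage(definitions):
    """Return the fraction of annotated relations that belong to the
    relational model of the defined entity type (0.0 if none are annotated)."""
    total = 0
    covered = 0
    for definition in definitions:
        # Unknown entity types get an empty model, so nothing counts as covered.
        model = RELATIONAL_MODELS.get(definition.entity_type, set())
        for relation in definition.relations:
            total += 1
            if relation in model:
                covered += 1
    return covered / total if total else 0.0


# Invented sample annotations, for demonstration only.
sample = [
    AnnotatedDefinition("valve", "material entity", ("is_a", "part_of", "has_function")),
    AnnotatedDefinition("fermentation", "process", ("is_a", "has_participant", "caused_by")),
]

print(f"coverage: {model_coverage(sample):.2%}")
```

In this invented sample, five of the six annotated relations belong to
the model of the corresponding entity type; a relation such as
"caused_by", which lies outside the hypothetical process model, lowers
the coverage.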

A pilot experiment based on a corpus analysis of a sample of 240
terminological definitions extracted from 15 domains yielded
encouraging results, with almost 75% of the relations expressed in the
analyzed definitions pertaining to the models associated with each
entity type. This empirical study shows, moreover, which relations in
these generic models are most relevant in terminological definitions.
These results tend to confirm the tested hypothesis. The theoretical
considerations underlying this methodological proposition also
contribute to the foundations of an integrated theory of definitions in
