Academic Paper

Title: A comparison of different approaches to multi-word term acquisition
Paper URL:
Author: Friederike Yvonne Helene Mallchok
Linguistic Field: Computational Linguistics
Subject Language: English
Abstract: There are many different names in the literature for what is called a multi-word term (MWT) in this paper: "term", "n-gram", "multi-word lexical unit" or just "lexical unit", etc.

Although this number is almost as high as the number of linguists researching this subject, it is widely agreed among them that the acquisition of MWTs is increasingly important in natural language processing (NLP). MWTs express concepts or themes without ambiguity, and knowledge about them is essential for many tasks in NLP. Despite this obvious importance of collocational knowledge, it is not usually available in manually compiled dictionaries. Rapid changes in many specialized knowledge domains mean that new terms are emerging and changing all the time, which increases the importance of developing better automated tools for their retrieval. Most technical expressions consist of more than one word, and the overwhelming majority of these are noun phrases, probably in all domains.

Because of their non-compositional meaning, terms need to be recognized and treated differently from other phrases in NLP applications.

Many automatic term recognition (ATR) studies reveal various aspects of terms, but in most cases these aspects are only implied rather than explicitly stated. Evaluation and comparison of related studies is therefore rather difficult. Another problem for an appropriate comparison of different approaches is the many competing definitions of what is called a "multi-word term" in this paper.

Theoretical work concerning various characterizations of terms is badly needed. We can only come to conclusions about the validity and effectiveness of ATR results, their underlying theory and the methodology applied after examining and clarifying questions like: "What exactly is a term? How should variation like hyphenation and abbreviation be treated? To what extent does the context of a sequence of words influence the meaning of that sequence and therefore make it a common phrase or a non-compositional term?"

Another aspect that makes the development of good retrieval tools especially important is that Zipf's law also applies to multi-word terms. Acquisition of the many infrequent terms is a recognized but still unsolved problem.
Type: Individual Paper
Status: Completed