LINGUIST List 11.2363

Tue Oct 31 2000

TOC: International Journal of Corpus Linguistics

Editor for this issue: Naomi Ogasawara <naomi@linguistlist.org>


Directory

  1. Paul Peranteau, International Journal of Corpus Linguistics 5:1, 2000

Message 1: International Journal of Corpus Linguistics 5:1, 2000

Date: Tue, 31 Oct 2000 11:58:25 -0500
From: Paul Peranteau <paul@benjamins.com>
Subject: International Journal of Corpus Linguistics 5:1, 2000

International Journal of Corpus Linguistics 5:1 (2000)

© John Benjamins Publishing Company

Editorial (iii)

ARTICLES

Tamás Váradi (1)
Fishing for Translation Equivalents Using Grammatical Anchors

Lynne Bowker (17)
Towards a Methodology for Exploiting Specialized Target Language Corpora as
Translation Resources

Geoffrey Sampson (53)
A Proposal for Improving the Measurement of Parse Accuracy

S. Mostafa Assi and M. Haji Abdolhosseini (69)
Grammatical Tagging of a Persian Corpus

Petek Kurtböke and Liz Potter (83)
Co-occurrence Tendencies of Loanwords in Corpora

Reviews (101)

Stig Johansson & Signe Oksefjell (eds.): Corpora and Cross-linguistic Research:
 Theory, Method, and Case Studies (Gunter Lorenz)

Hans van Halteren: Excursions into Syntactic Databases (Simonetta Montemagni)

Hans Lindquist, Staffan Klintborg, Magnus Levin and Maria Estling (eds.):
The Major Varieties of English: Papers from MAVEN'97

Sidney Greenbaum (ed.): Comparing English Worldwide: The International
 Corpus of English (Susan M. Fitzmaurice)

Abstracts (117)

 ------------------------------------------------------------------------
ABSTRACTS:

Tamás Váradi
Fishing for Translation Equivalents Using Grammatical Anchors

Bilingual parallel corpora offer a treasure house of human translators'
knowledge of the correspondences between the two languages. Extracting by
automatic means the translation equivalents deemed accurate and
contextually appropriate by a human translator is of great practical
importance for various fields such as example-based machine translation,
computational lexicography, information retrieval, etc. The task of word or
phrase level identification is greatly reduced if suitable anchor points
can be found in the stream of texts. It is suggested that grammatical
morphemes provide very useful clues to finding translation equivalents.
They typically form a closed set, occur frequently enough in sentences,
have more or less fixed meanings, and, most important, will stand in a
one-to-one or at most one-to-few relationship with corresponding elements
in the other language. This paper will explore the viability of the idea
with reference to the Hungarian and English versions of Plato's Republic,
which are available in sentence-aligned form. Hungarian has a rich set of
suffixes which are typically deployed in a concatenated manner.
Corresponding to them in English are prepositions, auxiliary words, and
suffixes. The paper will show how, by starting from a well defined set of
correspondences between Hungarian grammatical morphemes and their
equivalents and using a combination of pattern matching and heuristics, one
can arrive at a mapping of phrases between the two texts.
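
As a rough illustration of the anchoring idea, the sketch below pairs
suffixed Hungarian words with English prepositional phrases inside one
aligned sentence pair. It is a toy under stated assumptions, not the
procedure used in the paper: the ANCHORS table, the function name
find_anchor_pairs, the fixed two-word window, and the example sentence
are all hypothetical, and real Hungarian suffixation (vowel harmony,
consonant assimilation, stacked suffixes) needs far more than suffix
matching.

```python
# Hypothetical anchor table: a few Hungarian case suffixes and a common
# English counterpart for each. Illustrative only, not taken from the paper.
ANCHORS = {
    "ban": "in", "ben": "in",
    "val": "with", "vel": "with",
    "ból": "from", "ből": "from",
}


def find_anchor_pairs(hu_sentence, en_sentence, window=2):
    """Pair suffixed Hungarian words with English preposition phrases.

    Both arguments are the two sides of one aligned sentence pair.
    Heuristic (an assumption): the English phrase is the preposition
    plus the next `window` words.
    """
    hu_words = hu_sentence.lower().split()
    en_words = en_sentence.lower().split()
    pairs = []
    for hu in hu_words:
        for suffix, prep in ANCHORS.items():
            has_suffix = hu.endswith(suffix) and len(hu) > len(suffix)
            if has_suffix and prep in en_words:
                i = en_words.index(prep)  # first occurrence only
                phrase = " ".join(en_words[i:i + 1 + window])
                pairs.append((hu, phrase))
                break  # one anchor per Hungarian word
    return pairs


pairs = find_anchor_pairs("A házban lakom", "I live in the house")
print(pairs)  # [('házban', 'in the house')]
```

The closed-set property mentioned in the abstract is what keeps the
anchor table small; the one-to-few relationship would show up here as
several English values per suffix rather than one.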

Lynne Bowker
Towards a Methodology for Exploiting Specialized Target Language Corpora as
Translation Resources

Specialized target language (TL) corpora constitute an extremely valuable
resource for translators, and although no specialized tools have been
developed for extracting translation data from such corpora, this paper
argues that translators would be remiss not to consult such resources. We
describe the advantages of using specialized TL corpora and outline a
number of techniques that translators can use in order to extract
translation data from such corpora with the aid of generic corpus analysis
tools. These advantages and techniques are demonstrated with reference to
two translations, one of which was done using only conventional resources
and the other with the help of a corpus.

Geoffrey Sampson
A Proposal for Improving the Measurement of Parse Accuracy

Widespread dissatisfaction has been expressed with the measure of parse
accuracy used in the Parseval programme, based on the location of
constituent boundaries. Scores on the Parseval metric are perceived as
poorly correlated with intuitive judgments of goodness of parse; the metric
applies only to a restricted range of grammar formalisms; and it is seen as
divorced from applications of NLP technology. The present paper defines an
alternative metric, which measures the accuracy with which successive words
are fitted into parse trees. (The original statement of this metric is
believed to have been the earliest published proposal about quantifying
parse accuracy.) The metric defined here gives overall scores that quantify
intuitive concepts of good and bad parsing relatively directly, and it
gives scores for individual words which enable the location of parsing
errors to be pinpointed. It applies to a wider range of grammar formalisms,
and is tunable for specific parsing applications.
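
For orientation, a measure based on the location of constituent
boundaries is commonly computed as bracket precision and recall. The
sketch below illustrates that style of scoring only; it is not the
alternative metric proposed in the paper, whose definition the abstract
does not give. The function name bracket_scores, the use of unlabeled
spans, and the example spans are assumptions made for illustration.

```python
def bracket_scores(gold, test):
    """Precision, recall and F1 over unlabeled constituent spans.

    Each span is a (start, end) pair of word offsets, end exclusive.
    """
    gold_set, test_set = set(gold), set(test)
    matched = len(gold_set & test_set)  # spans with identical boundaries
    precision = matched / len(test_set) if test_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up five-word sentence: the parser misplaces one boundary.
gold = [(0, 5), (0, 2), (2, 5), (3, 5)]
test = [(0, 5), (0, 2), (2, 5), (2, 4)]
print(bracket_scores(gold, test))  # (0.75, 0.75, 0.75)
```

Because the score is a single figure per sentence, it does not say which
word was attached wrongly, which is one of the limitations the abstract
sets against per-word scoring.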

S. Mostafa Assi and M. Haji Abdolhosseini
Grammatical Tagging of a Persian Corpus

The purpose of this article is to briefly introduce an interactive POS
tagging system developed as a project at the Institute for Humanities and
Cultural Studies in Tehran, Iran. The system is designed as part of the
annotation procedure for a Persian corpus called the Farsi Linguistic
Database (FLDB), which comprises a selection of contemporary Modern
Persian literature, formal and informal spoken varieties of the language,
and a series of dictionary entries and word lists [Assi 1997: 5]. The
system is the first attempt ever to tag a Persian corpus. In Section 1, the project
itself will be introduced; Section 2 presents an evaluation of the project,
and Section 3 is allocated to some suggestions for future work.

Petek Kurtböke and Liz Potter
Co-occurrence Tendencies of Loanwords in Corpora

This paper investigates some major approaches to the analysis of foreign
material in text, commonly known as loanwords. While the nature of data may
differ in various fields of linguistic research (e.g., bilingual vs.
monolingual corpora), perspectives on the analysis of such material have
not differed: loanwords have traditionally been analysed as singly
occurring items out of context. However, corpus research has shown
that words rarely occur in isolation. On the basis of a number of English
loans in a corpus of Turkish compiled in a multilingual setting, and a
number of Italian loans in a corpus of English compiled in a monolingual
setting, we conclude that collocational patterns growing around loanwords
are significant and should be included in the treatment of loanwords.




                John Benjamins Publishing Co.
Offices:    Philadelphia                 Amsterdam
Websites:   http://www.benjamins.com     http://www.benjamins.nl
E-mail:     service@benjamins.com        customer.services@benjamins.nl
Phone:      +215 836-1200                +31 20 6762325
Fax:        +215 836-1204                +31 20 6739773