Editor for this issue: Naomi Ogasawara <naomi@linguistlist.org>
International Journal of Corpus Linguistics 5:1 (2000)
© John Benjamins Publishing Company

Editorial (iii)

ARTICLES
Tamás Váradi (1): Fishing for Translation Equivalents Using Grammatical Anchors
Lynne Bowker (17): Towards a Methodology for Exploiting Specialized Target Language Corpora as Translation Resources
Geoffrey Sampson (53): A Proposal for Improving the Measurement of Parse Accuracy
S. Mostafa Assi and M. Haji Abdolhosseini (69): Grammatical Tagging of a Persian Corpus
Petek Kurtböke and Liz Potter (83): Co-occurrence Tendencies of Loanwords in Corpora

Reviews (101)
Stig Johansson & Signe Oksefjell (eds.): Corpora and Cross-linguistic Research: Theory, Method, and Case Studies (Gunter Lorenz)
Hans van Halteren: Excursions into Syntactic Databases (Simonetta Montemagni)
Hans Lindquist, Staffan Klintborg, Magnus Levin and Maria Estling (eds.): The Major Varieties of English: Papers from MAVEN'97
Sidney Greenbaum (ed.): Comparing English Worldwide: The International Corpus of English (Susan M. Fitzmaurice)

Abstracts (117)

------------------------------------------------------------------------

ABSTRACTS

Tamás Váradi
Fishing for Translation Equivalents Using Grammatical Anchors

Bilingual parallel corpora offer a treasure house of human translators' knowledge of the correspondences between two languages. Automatically extracting the translation equivalents that a human translator judged accurate and contextually appropriate is of great practical importance in fields such as example-based machine translation, computational lexicography, and information retrieval. The task of identifying equivalents at the word or phrase level is greatly simplified if suitable anchor points can be found in the text stream. It is suggested that grammatical morphemes provide very useful clues for finding translation equivalents.
They typically form a closed set, occur frequently enough in sentences, have more or less fixed meanings and, most importantly, stand in a one-to-one or at most one-to-few relationship with corresponding elements in the other language. This paper explores the viability of the idea with reference to the Hungarian and English versions of Plato's Republic, which are available in sentence-aligned form. Hungarian has a rich set of suffixes, which are typically concatenated; their English counterparts are prepositions, auxiliary words, and suffixes. The paper shows how, starting from a well-defined set of correspondences between Hungarian grammatical morphemes and their English equivalents and using a combination of pattern matching and heuristics, one can arrive at a mapping of phrases between the two texts.

Lynne Bowker
Towards a Methodology for Exploiting Specialized Target Language Corpora as Translation Resources

Specialized target language (TL) corpora constitute an extremely valuable resource for translators. Although no specialized tools have been developed for extracting translation data from such corpora, this paper argues that translators would be remiss not to consult them. We describe the advantages of using specialized TL corpora and outline a number of techniques that translators can use to extract translation data from them with the aid of generic corpus analysis tools. These advantages and techniques are demonstrated with reference to two translations, one done using only conventional resources and the other with the help of a corpus.

Geoffrey Sampson
A Proposal for Improving the Measurement of Parse Accuracy

Widespread dissatisfaction has been expressed with the measure of parse accuracy used in the Parseval programme, which is based on the location of constituent boundaries.
Scores on the Parseval metric are perceived as poorly correlated with intuitive judgments of parse quality; the metric applies only to a restricted range of grammar formalisms; and it is seen as divorced from applications of NLP technology. The present paper defines an alternative metric, which measures the accuracy with which successive words are fitted into parse trees. (The original statement of this metric is believed to have been the earliest published proposal for quantifying parse accuracy.) The metric defined here gives overall scores that quantify intuitive concepts of good and bad parsing relatively directly, and it gives scores for individual words that allow parsing errors to be pinpointed. It also applies to a wider range of grammar formalisms and can be tuned for specific parsing applications.

S. Mostafa Assi and M. Haji Abdolhosseini
Grammatical Tagging of a Persian Corpus

The purpose of this article is to briefly introduce an interactive POS tagging system developed as a project at the Institute for Humanities and Cultural Studies in Tehran, Iran. The system is designed as part of the annotation procedure for the Farsi Linguistic Database (FLDB), a Persian corpus also being compiled at the Institute, which comprises a selection of contemporary Modern Persian literature, formal and informal spoken varieties of the language, and a series of dictionary entries and word lists [Assi 1997: 5]. It is the first attempt ever to tag a Persian corpus. Section 1 introduces the project itself, Section 2 presents an evaluation of the project, and Section 3 offers some suggestions for future work.

Petek Kurtböke and Liz Potter
Co-occurrence Tendencies of Loanwords in Corpora

This paper investigates some major approaches to the analysis of foreign material in text, commonly known as loanwords.
While the nature of the data may differ across fields of linguistic research (e.g., bilingual vs. monolingual corpora), perspectives on the analysis of such material have not differed: loanwords have traditionally been analysed as singly-occurring items out of context. However, corpus research has shown that words rarely occur in isolation. On the basis of a number of English loans in a corpus of Turkish compiled in a multilingual setting, and a number of Italian loans in a corpus of English compiled in a monolingual setting, we conclude that the collocational patterns that grow up around loanwords are significant and should be taken into account in their treatment.

John Benjamins Publishing Co.
Offices: Philadelphia / Amsterdam
Websites: http://www.benjamins.com / http://www.benjamins.nl
E-mail: service@benjamins.com / customer.services@benjamins.nl
Phone: +215 836-1200 / +31 20 6762325
Fax: +215 836-1204 / +31 20 6739773