LINGUIST List 14.1461

Wed May 21 2003

Qs: Data Collection; Machine Translation Texts

Editor for this issue: Naomi Fox

We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate. In addition to posting a summary, we'd like to remind people that it is usually a good idea to personally thank those individuals who have taken the trouble to respond to the query. To post to LINGUIST, use our convenient web form at


  1. Hale Isik, Data Collection Assistance
  2. D Elliott, Parallel texts for machine translation evaluation

Message 1: Data Collection Assistance

Date: Tue, 20 May 2003 08:25:07 +0000
From: Hale Isik
Subject: Data Collection Assistance

Dear Members, 

I am a graduate student and also work as a research assistant at the
Dept of FLE, METU, Turkey. I am currently writing my M.A. thesis, which
aims at exploring, in the most general sense, the impact of
self-guiding principles and culture values on communication and how
these are operationalized in the language we use in situationally
defined contexts in Turkish and English.

To carry out a cross-cultural analysis (with Turkish data), I need to
use data compiled from native speakers of English who are university
students and citizens of the UK or USA.

I would be overwhelmingly grateful if you could direct such students
enrolled at your university/department/course to fill out my online
questionnaire, which can be accessed via:

Please accept my sincere thanks in advance for your anticipated
support.

Hale Isik

Research Assistant
Department of Foreign Language Education
Middle East Technical University, Ankara, Turkey

Subject-Language: English; Code: ENG 

Message 2: Parallel texts for machine translation evaluation

Date: Wed, 21 May 2003 13:33:07 +0100 (BST)
From: D Elliott
Subject: Parallel texts for machine translation evaluation

Dear all

I am collecting parallel texts for a corpus designed specifically for
MT evaluation (to be made available online for research) and would
appreciate any advice on where to find parallel texts of a particular

Source texts/extracts of approx. 400 words each in: French, Italian,
German, Spanish, Chinese (Simplified and/or Traditional), Japanese,
Russian and Portuguese.

The challenge is that these must have very good quality human English
translations which can be used as a 'gold standard' against which we
can compare MT output. (NB British English if possible) I am just
beginning to realise how difficult a task I have set myself! (Another
problem is that some multilingual sites are localised to such an
extent that parts have been rewritten rather than translated - doh!)

The kinds of texts in the corpus will represent current MT use. The
following (provisional) categories have been selected, following a
worldwide survey of MT users:

Technical documents (e.g. software user manuals, online help, telecoms,
automotive, aerospace)
Correspondence (letters/emails)
Academic papers
Tourist/travel information
Newspaper articles
Medical documents
Scientific documents
Financial documents (stock exchange reports, banking, insurance)
Legal documents (including patents)
Calls for tender
Internal company documents (e.g. minutes, training material, company

Any URLs or other sources (even on paper!) would be gratefully
received. Sources which do not require copyright permission would
also be a big time-saver. All sources will obviously be acknowledged
in the corpus.

I will post a summary of feedback as soon as the deluge stops (wishful
thinking!).

Debbie Elliott

For more information on the project so far, see: 

Elliott, Debbie; Hartley, Anthony; Atwell, Eric. Rationale for a
multilingual corpus for machine translation evaluation. In: Archer, D.,
Rayson, P., Wilson, A. & McEnery, T. (eds.), Proceedings of CL2003:
International Conference on Corpus Linguistics, pp. 191-200. Lancaster
University, 2003.

Debbie Elliott
Computer Vision and Language Research Group,
School of Computing,
University of Leeds,
Leeds LS2 9JT
United Kingdom.