LINGUIST List 15.3305
Thu Nov 25 2004
FYI: Software Localization; New BNC-related Corpus
Editor for this issue: Ann Sawyer <sawyer@linguistlist.org>
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
Directory
1. Donald Osborn, Issues in Software Localization
2. Mark Davies, New BNC-related Corpus: Register-based Queries
Message 1: Issues in Software Localization
Date: 25-Nov-2004
From: Donald Osborn <dzo@bisharat.net>
Subject: Issues in Software Localization
The localization of internet content and computer software to many
languages is an undeniable trend. With regard to software localization,
the translation of commands, interfaces, glossaries and documentation
into diverse languages raises interesting language questions as well as
the need for collaboration between linguists and software localizers.
At a recent event devoted to outlining and planning localization needs,
the Localisation Development Sprint (Warsaw, 20-22 November 2004),
language and linguistics were more often implicit than explicit
subjects. Nevertheless, the event may be of interest, not only for the
ground it covered but also for its links to other activities.
See http://localisationdev.org/
Don Osborn
Bisharat.net
Linguistic Field(s): Computational Linguistics; General Linguistics
Message 2: New BNC-related Corpus: Register-based Queries
Date: 22-Nov-2004
From: Mark Davies <mark_davies@byu.edu>
Subject: New BNC-related Corpus: Register-based Queries
I have placed on the web a freely-accessible resource that may be
of interest to some of you:
http://view.byu.edu
([V]ariation [I]n [E]nglish [W]ords and Phrases)
As with some other interfaces, this website allows you to quickly
and easily search the 100 million word British National Corpus.
Users can search by exact word or phrase, wildcard or part of speech,
or combinations of these (e.g. all nouns ending in -ness or all cases
of 'white' + [noun]).
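
To make the query types above concrete, here is a minimal sketch of
slot-by-slot matching over a part-of-speech tagged token list. It is
an illustration only and not the VIEW implementation: the toy
sentence, the simplified tag labels (NOUN, ADJ, ...) and the function
names are all assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Toy tagged sample; the tags are simplified labels, not the BNC tagset.
TOKENS = [
    ("the", "DET"), ("white", "ADJ"), ("house", "NOUN"),
    ("showed", "VERB"), ("great", "ADJ"), ("kindness", "NOUN"),
    ("and", "CONJ"), ("white", "ADJ"), ("lies", "NOUN"),
    ("cause", "VERB"), ("sadness", "NOUN"),
]

def slot_matches(token, slot):
    """A slot is (word_glob, pos); None in either position means 'anything'."""
    word, pos = token
    word_glob, want_pos = slot
    if word_glob is not None and not fnmatchcase(word.lower(), word_glob):
        return False
    return want_pos is None or pos == want_pos

def search(tokens, slots):
    """Return every run of adjacent tokens that fills the slots in order."""
    hits = []
    for i in range(len(tokens) - len(slots) + 1):
        window = tokens[i:i + len(slots)]
        if all(slot_matches(t, s) for t, s in zip(window, slots)):
            hits.append(" ".join(word for word, _ in window))
    return hits

# All nouns ending in -ness (wildcard plus part of speech in one slot).
NESS_NOUNS = search(TOKENS, [("*ness", "NOUN")])

# All cases of 'white' + [noun] (exact word, then a part-of-speech slot).
WHITE_NOUN = search(TOKENS, [("white", None), (None, "NOUN")])
```

On this toy sample the first query picks out "kindness" and "sadness",
and the second picks out "white house" and "white lies".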
Unlike some interfaces that are strictly 'slot-oriented', this interface
also allows you to use 'anchors' and 'targets' for fuzzy matches
(e.g. all nouns somewhere near 'break' (v), adjectives near 'woman',
verbs near 'way', and nouns near 'small'), and the size of the window
can be easily customized.
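
An anchor-and-target search of this kind could be sketched as
follows. This is a hypothetical illustration, not the site's own
code: the sample tokens, the tag labels and the `collocates` function
are assumptions. The anchor is a (word, tag) pair, the target is any
token with a given tag, and `window` is the number of tokens searched
on each side of the anchor.

```python
from collections import Counter

# Toy tagged sample; the tags are simplified labels, not the BNC tagset.
TOKENS = [
    ("they", "PRON"), ("break", "VERB"), ("the", "DET"), ("law", "NOUN"),
    ("and", "CONJ"), ("break", "VERB"), ("every", "DET"), ("rule", "NOUN"),
    ("in", "PREP"), ("the", "DET"), ("book", "NOUN"),
]

def collocates(tokens, anchor, target_pos, window=2):
    """Count target_pos words within `window` tokens of each anchor hit."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != anchor:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            # Skip the anchor itself; count any neighbour with the target tag.
            if j != i and tokens[j][1] == target_pos:
                counts[tokens[j][0]] += 1
    return counts

NEAR_BREAK = collocates(TOKENS, ("break", "VERB"), "NOUN", window=2)
WIDER = collocates(TOKENS, ("break", "VERB"), "NOUN", window=5)
```

A target is counted once per anchor occurrence it falls near, so in
this sample "law" is counted twice with a window of 2 (it sits within
two tokens of both instances of 'break'). Widening the window to 5
also brings in "book".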
Perhaps the most distinctive aspect of the corpus is the ability to find
the frequency of words and phrases in any combination of registers
that you define (spoken, academic, poetry, medical, etc.). In addition,
you can compare between registers -- for example, verbs that are
more common in legal or medical texts, phrases like [I * that] that
are more common in conversation than in non-fiction texts, nouns
near 'break' (v) that are found primarily in academic writings, etc.
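
One common way to compare registers of different sizes is to
normalize raw counts to a rate per million words and then take the
ratio of the two rates. The sketch below assumes that approach; it is
not a description of how the site itself ranks results. The counts
and register sizes are invented toy numbers, not BNC figures, and the
function names are hypothetical.

```python
import math

# Invented toy counts and register sizes, NOT figures from the BNC.
LEGAL = {"allege": 4, "say": 10, "convict": 2}
LEGAL_SIZE = 1_000
FICTION = {"allege": 1, "say": 40}
FICTION_SIZE = 2_000

def per_million(count, register_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / register_size

def more_common_in(counts_a, size_a, counts_b, size_b, min_ratio=2.0):
    """Words whose rate in register A is at least min_ratio times the rate in B."""
    rows = []
    for word, n_a in counts_a.items():
        rate_a = per_million(n_a, size_a)
        rate_b = per_million(counts_b.get(word, 0), size_b)
        # A word absent from B gets an infinite ratio rather than a crash.
        ratio = rate_a / rate_b if rate_b > 0 else math.inf
        if ratio >= min_ratio:
            rows.append((word, ratio))
    return sorted(rows, key=lambda row: row[1], reverse=True)

LEGAL_SKEWED = more_common_in(LEGAL, LEGAL_SIZE, FICTION, FICTION_SIZE)
```

With these toy numbers, "convict" (absent from the second register)
and "allege" (eight times the rate) are returned, while "say" is
dropped because it is relatively more frequent in the second register.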
Finally, it should be noted that the database architecture of this
corpus improves on some previous interfaces, in that it allows
users to find *all* of the matching strings from the BNC, rather
than just those n-grams that occur three times or more in the
corpus (which effectively cuts out about 75% of all 2-gram and
3-gram strings). It's also quite fast -- just a couple of seconds
or less for nearly all searches -- including queries with detailed
register information.
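
The effect of a minimum-frequency cutoff on n-gram lists can be seen
even on a tiny sample. The sketch below is illustrative only: the
sentence and function names are assumptions, and the proportion it
discards says nothing about the BNC itself.

```python
from collections import Counter

# Toy sample text, not drawn from the BNC.
WORDS = ("the cat sat on the mat and the cat sat on the rug "
         "while the cat sat still").split()

def ngram_counts(words, n):
    """Count every run of n adjacent words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def split_by_cutoff(counts, min_freq=3):
    """Separate n-grams that survive a frequency cutoff from those it discards."""
    kept = {gram: c for gram, c in counts.items() if c >= min_freq}
    dropped = {gram: c for gram, c in counts.items() if c < min_freq}
    return kept, dropped

BIGRAMS = ngram_counts(WORDS, 2)
KEPT, DROPPED = split_by_cutoff(BIGRAMS, min_freq=3)
```

Here only 2 of the 11 distinct 2-grams occur three times or more, so
a cutoff of 3 would hide the other 9 from any search that relied on
the filtered list. Keeping every n-gram avoids that loss at the cost
of a larger index.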
If you have any questions, please feel free to email me.
Mark Davies
Assoc. Prof., Linguistics
Brigham Young University
http://davies-linguistics.byu.edu
** Corpus design and use // Web-database scripting **
** Historical linguistics // Functional-typological grammar **
** Variation in Spanish, Portuguese, and English syntax **
Linguistic Field(s): Applied Linguistics; Computational Linguistics; Discourse
Analysis; Lexicography; Text/Corpus Linguistics