Summary Details

Query:   Sum: OCR software
Author:  David Beck
Linguistic Field(s):   Computational Linguistics

Summary:   A couple of weeks back I posted a query about OCR software for the Mac
that is trainable enough to be useful to a linguist scanning Latin- or
IPA-based non-English texts. Thanks to

Jakob Dempsey
Sarah Rilling
Michael Betsch
Andrew Arefiev
Marc Fryd
and Daniel Loehr

for their responses.

In the Mac world, it appears that the front-runner in this area is the
widely-available OmniPage programme from Caere Corporation
(http://www.caere.com for info). It is apparently trainable although
one respondent expressed some doubts about being able to train it to
handle more than a single special font. I should also mention that the
first sales rep I talked to about OmniPage seemed to think that it
might have trouble with the combinations of letters and diacritics
typical of IPA-based alphabets. However, the publicity literature on
the Web site seems to imply that it can be trained to recognize
combinations of separate characters, and the last sales rep I talked to
seemed to think that there was no doubt that OmniPage could do the job.

Jakob Dempsey also mentioned an "expensive Kurzweil product" for the
Mac, but I haven't heard anything further about this.

I also got two responses that mentioned Windows-based applications
that are highly trainable. One is a product called OPTOPUS, made by a
German company called Makrolog in Wiesbaden; it is "exclusively
trainable"--that is, it needs to be trained from scratch and so can be
configured to any alphabet you like. The other is by a Russian company
called Bit Software (www.bitsoft.ru); their programme is called
FineReader, and in addition to its wide range of set alphabets for
languages using Latin and Cyrillic, they report having successfully
trained it to recognize Icelandic and Tibetan fonts.

David Beck
Department of Linguistics
Sixth Floor, Robarts Library
130 St. George St.
University of Toronto
Toronto, Ontario M5S 3H1
e-mail: dbeck@chass.utoronto.ca
phone: (416) 978-4029
(416) 923-2394 (home)
FAX: (416) 971-2688

LL Issue: 8.1135
Date Posted: 04-Aug-1997