Publishing Partner: Cambridge University Press CUP Extra Publisher Login
amazon logo
More Info

New from Oxford University Press!


Oxford Handbook of Corpus Phonology

Edited by Jacques Durand, Ulrike Gut, and Gjert Kristoffersen

Offers the first detailed examination of corpus phonology and serves as a practical guide for researchers interested in compiling or using phonological corpora

New from Cambridge University Press!


The Languages of the Jews: A Sociolinguistic History

By Bernard Spolsky

A vivid commentary on Jewish survival and Jewish speech communities that will be enjoyed by the general reader, and is essential reading for students and researchers interested in the study of Middle Eastern languages, Jewish studies, and sociolinguistics.

New from Brill!


Indo-European Linguistics

New Open Access journal on Indo-European Linguistics is now available!

Summary Details

Query:   Sum: OCR software
Author:  David Beck
Submitter Email:  click here to access email
Linguistic LingField(s):   Computational Linguistics

Summary:   A couple of weeks back I posted a query about OCR software for the Mac
that is trainable enough to be useful to a linguist scanning Latin or
IPA-based non-English texts. Thanks to

Jakob Dempsey
Sarah Rilling
Michael Betsch
Andrew Arefiev
Marc Fryd
and Daniel Loehr

for their responses.

In the Mac world, it appears that the front-runner in this area is the
widely-available OmniPage programme from Caere Corporation
( for info). It is apparently trainable although
one respondent expressed some doubts about being able to train it to
handle more than a single special font. I should also mention that the
first sales rep I talked to previously about OmniPage seemed to think
that it might have trouble with the combinations of letters and
diacrits typical of IPA- based alphabets. However, the publicity
literature on the Web site seems to imply that it can be trained to
recognize combinations of separate characters and the last sales rep I
talked to seemed to think that there was no doubt that OmniPage could
do the job.

Jakob Dempsey also mentioned an "expensive Kurzweil product" for the
Mac, but I haven't heard anything further about this.

I also got two responses that mentioned Windows-based applications
that are highly trainable. One is a German product called OPTOPUS made
by a German company called Makrolog in Wiesbaden which is "exclusively
trainable"--that is, it needs to be trained from scratch and so can be
configured to any alphabet you like. The other is by a Russian company
called Bit Software (; their programme is called
FineReader and in addition to having a wide range of set alphabets for
langauges using both Latin and Cyrillic, they report having
sucessfully trained it to recognize Icelandic and Tibetan fonts).

David Beck

David Beck
Department of Linguistics
Sixth Floor, Robarts Library
130 St. George St.
University of Toronto
Toronto, Ontario M5S 3H1
phone: (416) 978-4029
(416) 923-2394 (home)
FAX: (416) 971-2688

LL Issue: 8.1135
Date Posted: 04-Aug-1997
Original Query: Read original query


Sums main page