LINGUIST List 8.1135

Mon Aug 4 1997

Sum: OCR Software

Editor for this issue: Martin Jacobsen <martylinguistlist.org>


Directory

  1. David Beck, Sum: OCR software

Message 1: Sum: OCR software

Date: Mon, 4 Aug 1997 16:15:02 -0400
From: David Beck <dbeckchass.utoronto.ca>
Subject: Sum: OCR software


A couple of weeks back I posted a query about OCR software for the Mac
that is trainable enough to be useful to a linguist scanning Latin or
IPA-based non-English texts. Thanks to

 Jakob Dempsey
 Sarah Rilling
 Michael Betsch
 Andrew Arefiev
 Marc Fryd
and Daniel Loehr

for their responses.

In the Mac world, it appears that the front-runner in this area is the
widely available OmniPage programme from Caere Corporation
(http://www.caere.com for info). It is apparently trainable although
one respondent expressed some doubts about being able to train it to
handle more than a single special font. I should also mention that the
first sales rep I talked to about OmniPage seemed to think
that it might have trouble with the combinations of letters and
diacritics typical of IPA-based alphabets. However, the publicity
literature on the Web site seems to imply that it can be trained to
recognize combinations of separate characters and the last sales rep I
talked to seemed to think that there was no doubt that OmniPage could
do the job.

Jakob Dempsey also mentioned an "expensive Kurzweil product" for the
Mac, but I haven't heard anything further about this.

I also got two responses that mentioned Windows-based applications
that are highly trainable. One is a German product called OPTOPUS, made
by Makrolog in Wiesbaden; it is "exclusively
trainable"--that is, it needs to be trained from scratch and so can be
configured to any alphabet you like. The other is by a Russian company
called Bit Software (www.bitsoft.ru); their programme is called
FineReader, and in addition to having a wide range of set alphabets for
languages using both Latin and Cyrillic, they report having
successfully trained it to recognize Icelandic and Tibetan fonts.

David Beck

======================================================================
David Beck
Department of Linguistics
Sixth Floor, Robarts Library
130 St. George St.
University of Toronto
Toronto, Ontario M5S 3H1
Canada
e-mail: dbeckchass.utoronto.ca
phone: (416) 978-4029
 (416) 923-2394 (home)
FAX: (416) 971-2688