LINGUIST List 6.630

Sun 30 Apr 1995

Sum: Optical Character Recognition


Directory

  1. Matthew Dryer, Optical Character Recognition

Message 1: Optical Character Recognition

Date: Tue, 25 Apr 1995 22:25:45
From: Matthew Dryer <LINDRYER@ubvms.cc.buffalo.edu>
Subject: Optical Character Recognition


A few weeks ago, I sent an inquiry to LINGUIST regarding optical character
recognition software for Macs that can be trained to identify nonstandard
characters. I received only one report of successful use of such software,
but can also report about two new pieces of OCR software that are supposed to
be coming out in the next few months.

I am aware of three different pieces of OCR software currently available,
though all of them are or will be available at two levels, a standard version
and a professional version. Training for nonstandard characters is either
unavailable or rather primitive in the standard versions, so in practice it
requires the professional versions (which are considerably more expensive).

The three pieces of software are:

1) OmniPage
2) TextBridge
3) Read-It

Only OmniPage currently has a professional version available, and I have one
positive report, from Malcolm Ross: "I have just 'trained' OmniPage
Professional for the Mac to read a number of phonetic symbols in order to scan
a set of wordlists for Papuan languages. It works reasonably well, although I
think the density of the photocopies that one puts into the scanner is fairly
critical." I believe that it costs over $600 (US).

Standard TextBridge is the least expensive OCR software that I am aware of
(about $100) and it does have some training ability. A professional version
(costing about $350) is scheduled to come out in late May. I myself have
tried two pieces of OCR software, the standard TextBridge and a more
sophisticated piece of software called Accu-Text that is no longer available
but that was made by the same company that makes TextBridge, so I am hoping
the new professional version will be at least as good as Accu-Text. My
experience with standard TextBridge is that it was able to learn some
nonstandard characters, but was not as good as Accu-Text and, crucially, one
cannot save what it has learned from one session to another. The professional
version of TextBridge will allow one to save what it has learned.

The latest version of Read-It deliberately dropped the trainability of the
previous version, but the company that makes it says they will be coming out
with a trainable version later in the summer. I have not heard
any reports from people who have used earlier trainable versions. The version
without trainability costs about $350, and I assume that the forthcoming
trainable version will be at least that expensive.

It is clear that trainable OCR software is expensive. My own reaction is that
I wish I knew whether the new professional TextBridge will be as good as
OmniPage Professional, since it will be a lot cheaper. On the other hand, if
OmniPage is better, then for anyone who can afford the professional
TextBridge, the extra cost of OmniPage is probably worth it.

Matthew Dryer