A few weeks ago, I sent an inquiry to LINGUIST regarding optical character recognition (OCR) software for Macs that can be trained to identify nonstandard characters. I received only one report of successful use of such software, but I can also report on two new pieces of OCR software that are supposed to be coming out in the next few months.

I am aware of three different pieces of OCR software currently available, though all of them are or will be available at two levels, a standard version and a professional version. Training for nonstandard characters is either unavailable or rather primitive in the standard versions, and really requires the professional versions (which are considerably more expensive). The three pieces of software are:

1) OmniPage
2) TextBridge
3) Read-It

Only OmniPage currently has a professional version available, and I have one positive report, from Malcolm Ross: "I have just 'trained' OmniPage Professional for the Mac to read a number of phonetic symbols in order to scan a set of wordlists for Papuan languages. It works reasonably well, although I think the density of the photocopies that one puts into the scanner is fairly critical." I believe that it costs over $600 (US).

Standard TextBridge is the least expensive OCR software that I am aware of (about $100), and it does have some training ability. A professional version (costing about $350) is scheduled to come out in late May. I myself have tried two pieces of OCR software: the standard TextBridge and a more sophisticated package called Accu-Text, which is no longer available but was made by the same company that makes TextBridge, so I am hoping the new professional version will be at least as good as Accu-Text. My experience with standard TextBridge is that it was able to learn some nonstandard characters, but it was not as good as Accu-Text and, crucially, one cannot save what it has learned from one session to another.
The professional version of TextBridge will allow one to save what it has learned.

The latest version of Read-It specifically removed the trainability of the previous version, but the company that makes it says it will be coming out with a trainable version later in the summer. I have not heard any reports from people who have used the earlier trainable versions. The version without trainability costs about $350, and I assume that the forthcoming trainable version will be at least that expensive.

It is clear that trainable OCR software is expensive. My own reaction is that I wish I knew whether the new professional TextBridge will be as good as OmniPage Professional, since it will be a lot cheaper. On the other hand, if OmniPage is better, then if one can afford the professional TextBridge, the extra price for OmniPage is probably worth it.

Matthew Dryer