Editor for this issue: <>
I got many responses to my request for string-manipulation programs for the Macintosh. I also discovered that a similar query was posted by Loren Billings on the CORPORA list. The following is a compilation of the responses to both requests (Loren will post the summary to CORPORA). I am posting this summary now, quoting liberally the basic info and evaluations of the respondents. Since I am less computer-literate than the respondents, please correct any errors that have crept into my summaries. More details can probably be had from the respondents I've named after each entry. Thanks to all of the following for their information!

--Bill Croft

Evan L. Antworth (evan.antworth@SIL.ORG)
Cathy Ball (CBALL@guvax.acc.georgetown.edu)
Michael Barlow (barlow@ruf.rice.edu)
Loren Allen Billings (BILLINGS@pucc.Princeton.EDU)
Chris Culy (cculy@uiowa.edu)
Andrew E. Dolbey (dolbey@uclink.berkeley.edu)
Sebastian Adorjan Dyhr (LINSAD@stud.hum.aau.dk)
George Fowler (GFOWLER@ucs.indiana.edu)
Larry Gorbet (lgorbet@mail.unm.edu)
John Henderson (jkh@uniwa.uwa.edu.au)
Ken Hughes (hughes@unixg.ubc.ca)
Dirk Janssen (U249009@VM.UCI.KUN.NL)
Michael Kelly (kelly@cattell.psych.upenn.edu)
John Kirk (J.Kirk@qub.ac.uk)
John E. Koontz (koontz@alpha.bldr.nist.gov)
Barbara Levergood (leveb@ruby.ils.unc.edu)
Hugh Nicoll (hnicoll@funatsuka.miyazaki-mu.ac.jp)
Alain Polguere (ellalain@leonis.nus.sg)
Malcolm Ross (mdr412@coombs.anu.edu.au)
Achim Stein (achim@chianti.philosophie.uni-stuttgart.de)
Theo Vosse (vosse@ruls41.LEIDENUNIV.NL)
Bill Westaley (westaley@OREGON.UOREGON.EDU)

0. TABLE OF CONTENTS

I. Tools associated with word processors
(1) Alpha
(2) BBEdit
(3) Nisus
(4) emacs

II. Tools associated with HyperCard
(5) MonoConc
(6) FreeText
(7) XFCN
(8) Folio Views

III. Tools associated with database programs
(9) 4th Dimension

IV. Stand-alone tools
(10) Conc
(11) Concorder
(12) MacGawk
(13) grep/agrep in MacMint
(14) Search Files 1.3
(15) MicroConcord (DOS)
(16) TACT (DOS)

V. Programming languages
(17) MacPerl
(18) MaxSPITBOL
(19) Icon and ProIcon

I. TOOLS ASSOCIATED WITH WORD PROCESSORS

1. Many modern word/text processors have grep (e.g. Nisus, BBEdit). (Chris Culy)

(1) Alpha

[adapted from Ken Hughes' discussion of Alpha, BBEdit and emacs--BC]

I would recommend one of three programmer's editors that seem to dominate the environment these days... Alpha (v5.81?) can be found in most Mac archives. All have implementations of grep (full 'regular expression' use) for search and replace functions. I don't believe that there is any choice for doing serious work that doesn't involve regular expressions since these allow work across linefeed, carriage return, whitespace, and other 'punctuation' boundaries and text variances. All allow for operations on multiple files. All three editors are highly configurable and permit sophisticated macros.

Alpha [and emacs] provide a command-line shell within a dandy window/buffer interface, but shell functions are limited and the built-in grep can't be piped to other machinery. Operations on files require programming skills. Ideally, an implementation of perl or awk which outputs to a file should satisfy pretty much any desire. Consequently, the only real 'solution' available so far may be MacPerl, using Alpha as a front end. (I haven't tried it yet. [see (17) below--BC]) Info can be found at "http://web.nexor.co.uk/mak/mak.html". (Ken Hughes)

(2) BBEdit

Bare Bones Software has just released BBEdit 3.0: an elegant little program well worth looking at.
The freeware BBEdit Lite 3.0 and the demo version of the full (commercial) program are available at info-mac mirror sites. The commercial version is $99. For more info contact <bbedit@world.std.com>. (Hugh Nicoll)

BBEdit is perhaps the most popular and easiest to use. BBEdit is widely loved. (Ken Hughes [see (1) Alpha for more description of BBEdit's capabilities--BC])

(3) Nisus

The word processor Nisus has an extensive GREP-type find/replace setup as well as a programming language which allows you to work up some fairly sophisticated tools. One of its strengths is that because you are operating on open text documents it is very easy to check each step as you go. The latest version, NisusWriter v4, has just been released. There is also an e-mail list for Nisus users (contact pfterry@msmail.kgs.ukans.edu) and an ftp site at syrinx.kgs.ukans.edu (/home/ftp/nisus). (John Henderson)

My main word processing tool is Nisus, and probably the main reason I use it is its grep-like string manipulation tools. Actually, there's a pretty standard grep facility (as part of its Find/Replace facility) and an "easy grep", a language that uses a much more transparent formalism to do about the same things. Actually, the latter is a nice tool for learning the former: you can type something in that, then click a button and have its grep translation appear. (Larry Gorbet)

Nisus 3 allows sophisticated wildcard searches. It can also be used to find all cases of the search object simultaneously: it selects all of them, so that they can all be copied to another file. Nisus Writer 4 has just been released, but I am still waiting for my copy. (Malcolm Ross)

(4) emacs

There is the Unix-like implementation of emacs (new to the Mac environment, v1.14?)... emacs is at "ftp.cs.cornell.edu" in the directory "pub/parmet". ... emacs provides a command-line shell within a dandy window/buffer interface, but shell functions are limited and the built-in grep can't be piped to other machinery. Operations on files require programming skills. Ideally, an implementation of perl or awk which outputs to a file should satisfy pretty much any desire. ...

I have a personal penchant for emacs, but I have a lot of experience with it under Unix. It is extremely (!) feature-rich but the learning curve is a little stiff. Installing all the files requires about 7 MB. I'm hoping that someone will soon write an implementation of awk and perl which will work within the emacs shell. (Ken Hughes [see (1) Alpha for more description of emacs' capabilities--BC])

II. TOOLS ASSOCIATED WITH HYPERCARD

(5) MonoConc

I have a HyperCard stack, MonoConc, which will give a KWIC concordance for a text. It will do left and right sorts and allow the search results to be saved as a file or printed. It is pretty basic.
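[For readers unfamiliar with the term, a KWIC (keyword-in-context) concordance with left and right sorts can be sketched in a few lines of Python. This is only an illustration of the general idea, not a description of how MonoConc works internally. The function names kwic and sort_hits, the crude tokenizer, and the sample sentence are all assumptions made up for this example.]

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, hit, right) triples for every word matching keyword.

    A trailing '*' in keyword matches any word ending, as in "say*".
    The tokenizer is deliberately crude: runs of letters and apostrophes.
    """
    words = re.findall(r"[A-Za-z']+", text)
    if keyword.endswith("*"):
        pattern = re.compile(re.escape(keyword[:-1]) + r"[A-Za-z']*", re.IGNORECASE)
    else:
        pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for i, word in enumerate(words):
        if pattern.fullmatch(word):
            left = words[max(0, i - width):i]
            right = words[i + 1:i + 1 + width]
            hits.append((left, word, right))
    return hits

def sort_hits(hits, side="right"):
    """Sort hits by the context nearest the keyword on the chosen side."""
    if side == "right":
        return sorted(hits, key=lambda h: [w.lower() for w in h[2]])
    # Left sort: compare the word just before the hit first, then work outwards.
    return sorted(hits, key=lambda h: [w.lower() for w in reversed(h[0])])

SAMPLE = "They say it is late. She says nothing. We say nothing at all."

# Print a right-sorted concordance with the keyword in a fixed column.
for left, hit, right in sort_hits(kwic(SAMPLE, "say*", width=2), side="right"):
    print(f"{' '.join(left):>12}  [{hit}]  {' '.join(right)}")
```

[Calling sort_hits with side="left" orders the same hits by the word immediately preceding the keyword instead.]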
This program is really just a modification of ParaConc [another program that Michael Barlow described in an earlier LINGUIST posting], which works with parallel texts. Again the program gives a KWIC concordance for a keyword (e.g., "say" or "say*") and allows sorting etc. In addition, the sentences from the second language containing the equivalent of the keyword (in the first language) are displayed.

The HyperCard programs are now at version 0.9x and I can email the program in binhex form to any interested linguists. (My email is barlow@ruf.rice.edu) In the future I will place these on an ftp site. If people need a disk and a manual, they should contact Athelstan. We can send a copy for $10. (Athelstan -- 800-598-3880 in the USA) (Michael Barlow)

(6) FreeText

If you want a simple fast concordance program that isn't too capable on non-IE languages you could try FreeText. It runs under HyperCard using externals and is very fast. The main limitation is how it handles characters: it turns everything into caps. If you can live with the limitation then it's great. You can also do some modification since some of it is in HyperCard. FreeText is free. (Bill Westaley)

There is a HyperCard program called AnyText from Linguist's Software that, I believe, does proximity searching. It is based on a freeware program called FreeText or something (sorry for being vague). (Evan Antworth)

Also FreeText Browser (a HyperCard stack) allows you to do Boolean searches, but it doesn't have any print capabilities. (Cathy Ball)

(7) XFCN

There is a grep search/replace XFCN for HyperCard. (It's free.) (Chris Culy)

(8) Folio Views

The Hypertext Programme may be `Folio Views' for Macintosh [this is in reference to Loren's request--BC]

Folio Corporation
2155 North Freedom Boulevard
Suite 150
Provo, Utah 84604

Distributor: e.g. GVPi (Global Village Publishing Inc.)
1101 King St., STE 190, Alexandria, VA 22314
Call 1-800-394-GVPi
(Achim Stein)

III. TOOLS ASSOCIATED WITH DATABASE PROGRAMS

(9) 4th Dimension

I use a database system called 4th Dimension (by ACI) which has very powerful string manipulation capabilities, if you don't mind writing a little bit of Pascal-like code. 4D's programming language is powerful, flexible and relatively easy to learn, and is thus a nice choice for amateurs like myself.
However, it does require that you break the text into "alpha-numeric" fields of a limited size rather than "text" fields (which can be much larger), because many of the commands operate only on the "alpha-numeric" fields, not on "text" fields. Another disadvantage is that 4D is extremely expensive -- as of last year, it listed at $600. But it's just about the best database system you'll get for the Mac, at least in my opinion. (Andrew Dolbey)

IV. STAND-ALONE PROGRAMS

(10) Conc

Attached is information on Conc. Conc is primarily a "keyword in context"-type concordancer. What you want is sometimes called proximity searching. It is possible to get Conc to do something close to what you want using a GREP search, but it's a bit clumsy.

Conc: a concordance generator for the Macintosh

Conc produces concordances of texts. A concordance consists of a list of the words in the text with a short section of the context that precedes and follows each word. Conc also produces an index, consisting of a list of the distinct words in the text, each with the number of times it occurs and a list of the places where it occurs. Conc displays the original text, the concordance, and the index each in its own window. Clicking on a word in any one of the three windows causes the other two windows to display the entries for the same word.

Conc permits the user to define the sorting order and to limit the concordance to words that match specified patterns (GREP expressions). Conc will do concordances both on ordinary flat text files and also on multiple-line interlinear texts. In the case of interlinear texts, the concordance can be limited to selected lines (fields).

In addition to word concording, Conc can also produce a concordance of each letter in a text or body of phonological data. Pattern-matching facilities are also available to letter concordances, so the user can specify search patterns that will have the effect of retrieving, say, words containing intervocalic obstruents.
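[The kind of index and pattern-limited word list described above can be sketched roughly as follows. This is an illustration in Python, not Conc's own implementation or its GREP syntax. The function name build_index, the sample text, and the simplified vowel and obstruent classes (plain English spelling rather than a phonological transcription) are all assumptions made up for the example.]

```python
import re
from collections import defaultdict

# Simplified, spelling-based classes; a real phonological study would
# define these over its own transcription system.
VOWELS = "aeiou"
OBSTRUENTS = "pbtdkgfvsz"
INTERVOCALIC = re.compile(f"[{VOWELS}][{OBSTRUENTS}][{VOWELS}]")

def build_index(text, pattern=None):
    """Map each distinct word to the line numbers where it occurs.

    If pattern is given, only words in which the pattern is found are kept.
    """
    index = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[a-z]+", line.lower()):
            if pattern is None or pattern.search(word):
                index[word].append(lineno)
    return dict(index)

SAMPLE = "The whale saw the ship.\nThe ship never saw the whale again."

# Full index: each word, its number of occurrences, and where it occurs.
for word, lines in sorted(build_index(SAMPLE).items()):
    print(f"{word:<8}{len(lines):>3}  lines {lines}")

# Index limited to words containing an intervocalic obstruent.
print(build_index(SAMPLE, INTERVOCALIC))
```

[The pattern argument plays the role of the search pattern that limits the listing; swapping in a different regular expression changes which words are retrieved.]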
Concordances can be both printed and exported to plain text files. As for performance, on a Mac IIci Conc can produce a concordance of Moby Dick (1,177KB) in about 13 minutes and requires about 2,500KB of memory.

Conc version 1.76 is a beta test version offered as 'freeware'. If you use it, we only ask that you send us your comments, complaints, and wishlist. You can affect the shape of the final product! Documentation is included on-disk in a Microsoft Word file.

Conc 1.76 is available in any of three ways:

1. Conc can be downloaded by anonymous FTP from ftp.sil.org [198.213.4.1]. Do these commands:

    cd [.software.mac]
    get conc176.sea_hqx

You will need a Binhex program to decode it.

2. Conc can be retrieved via e-mail. Send a message to mailserv@sil.org consisting of this single line only:

    send [ftp.software.mac]conc176.sea_hqx

You will need a Binhex program to decode it.

3. Conc can be ordered on disk from:

International Academic Bookstore
7500 W. Camp Wisdom Road
Dallas, TX 75236 U.S.A.
phone: 214/709-2404
fax: 214/709-2433
e-mail: Academic.Books@sil.org

Cost is $5 plus postage. (Checks *must* be drawn on a U.S. bank. They do not accept credit cards, but will bill by invoice.)

(Evan Antworth; thanks also to Bill Westaley and Theo Vosse)

(11) Concorder

Concorder is a simple concordance program for the Mac which does not have sophisticated searching (so if you want to specify the distance between search items it will not do) but for extraction of lines which contain two items it should work.

"Concorder - Concordance software for the Macintosh available from:

Les publications CRM
Universite de Montreal
C.P. 6128-A
Montreal, Quebec H3C3J7
Canada

Cost CAN$100 + $3 shipping
one of the authors: David W. Rand rand@ere.umontreal.ca"
(Laura Proctor)

(12) MacGawk

There is a version of awk (GNU awk or gawk, actually) for the Mac, called, of course, MacGawk. (John Koontz)

[John Koontz also sent the README file for MacGawk patch 4, which I have excerpted here:]

About GNU awk for the Macintosh...

This is GNU awk, gawk, for the Macintosh. For those who don't know, GNU stands for GNU's Not UNIX, an as-yet unfinished operating system, and is the primary goal of the Free Software Foundation. The FSF has publicly condemned Apple Computer for its litigation in defense of perceived copyrights. The FSF, therefore, has no knowledge of the existence of this gawk version, and would not support it if it did. Do not report bugs or make any other contact with FSF concerning Macintosh gawk.

Why Macintosh gawk exists

gawk for the Macintosh exists for a number of reasons. First, I use gawk extensively as part of my day to day work activities and wanted to have it at home. Second, I was looking for a project in C to work on at home to learn Mac programming. And third, it was a challenge. I have every intention of following the GNU copyleft, meaning that I can not sell gawk itself (I could conceivably charge for support) for profit and must also make full source available.

Macintosh gawk is Free Software

I do not charge for gawk. It is free software, not shareware or public domain. I encourage you to read the documents that describe the GNU Public License, or GPL, so that you understand what this means.

Differences from UNIX gawk

Macintosh gawk lacks some features that UNIX-like systems provide. These features include pipes and multiple processes. Mac gawk will quit when source programs invoke these functions. I caution against redirecting input and output in getline and print/printf calls. All other features should work the same. Read the Macintosh Supplement.mw document for details.

Macintosh caveats

Multifinder

Mac gawk will run under Multifinder, but is not particularly MF adapted.
It is set to use a partition size of 768K but large input files may require more, much more. Operation under Finder should be fine.

Command Line

Macintosh gawk uses the THINK C ccommand interface. This provides a dialog box that allows the user to enter UNIX shell-like command lines. Redirection of input and output is done with radio buttons.

TEXT Files

Mac gawk reads and writes standard Macintosh TEXT files. To use word processor files, it will be necessary to save them as TEXT first.

Behind the scenes

Compilation

Mac gawk was compiled using THINK C 4.0.2 on a 4M Mac+ running System 6.0.7. gawk requires bison to generate the awk.tab.c file. This is generally only required when making changes in the actual awk language. The source files were converted to comply with the ANSI standard (as THINK defines it) and make full use of function prototypes.

The author

I'm not really the author, I just did the porting. My name is Tom Maszerowski, I work as a software engineer for Moscom, Inc. in Pittsford, NY. Moscom is nice enough to allow me email and UUCP access and I thank them, but there are no guarantees. Thanks to my wife as well, for allowing me the time at home to do this.

Bugs and updates

Please do not contact the FSF concerning this version of gawk. I expect to be the sole point of contact for bugs and source code updates. I monitor the GNU groups on NETNEWS and will try to incorporate them as needed. If you make changes to the gawk source you feel will benefit others send them to me.

Addresses

I can be reached at the following email addresses:

tcm@moscom.com
{rit,tropix,ur-valhalla}!moscom!tcm

Mail delivery is usually quite good and I try to respond in a timely fashion (although timely is a subjective term).

Manual

The manual with a supplement is found in the "gawk Manual.mw" and "Macintosh Supplement.mw" files. The manual was made by converting the original texinfo file to nroff and then to text. The text was then converted to MacWrite5.0 format for the release. I tried to keep pagination correct but this may change based on the device you print to.

Directions

The source should be stripped of non-Mac code, since it seems unlikely that someone without a Mac would grab this source code. Memory allocation problems should be fixed, possibly with replacement of the THINK malloc() calls with something else. Real Mac interface would be nice, possibly using Prototyper (this presents a problem with source distribution since Prototyper-produced code requires libraries that cannot be given away).

(John Koontz)

[The awk textbook is:] The AWK Programming Language, by A. Aho, B. Kernighan and P. Weinberger, Addison-Wesley 1988, ISBN 0-201-07981-X.

The source is on ftp.funet.fi, in the directory /mac/utils. I want to stress again that Awk is very versatile but also very limited: Your strings have to be formatted on 'hard' lines, i.e. at the end of a line there is a CR (CRLF) sign. There may be more strings on a line, but your line may not be longer than about 250 characters. This is a very non-maccy restriction :-) (Dirk Janssen)

There have been standalone implementations of grep and awk for the Mac but I haven't found any of these to be usefully standard or reliable. (Ken Hughes)

(13) grep/agrep in MacMint

There is a free Unix-clone available, and all the tools you'd expect are either already ported or can be recompiled using the GNU C compiler. One tool in particular is agrep, which is faster than any of the other greps, and which allows approximate matches.
The clone is called MacMint, and it is a port of Mint for the Atari. The starter kit is available from Info-Mac mirrors (no compiler necessary to get set up). It takes a little work to get it set up, but I've found it to be very stable, and it's what I use for a lot of stuff. (Chris Culy)

(14) Search Files 1.3

I recommend Search Files 1.3 by Robert Morris, a shareware program that you can download from anywhere. It's some sort of sophisticated grep for the Mac. Gives you an output that looks like simple concordances. Very basic but very good. I am sure you'll like it. (Alain Polguere)

(15) MicroConcord (DOS)

Don't forget that you can run all the IBM PC stuff under SoftPC. A simple concordancer that will look for collocations (to the left or the right, within a specified 'horizon' of N words) is MicroConcord, published by Oxford University Press. It's for the PC, but I run it under SoftPC. (Cathy Ball)

I am also the source in the U.S. (as Athelstan -- 800-598-3880 in the USA) for Oxford University Press's DOS program, MicroConcord, which is pretty much the standard commercial concordance program. (Michael Barlow)

(16) TACT (DOS)

TACT (PC, from University of Toronto) is quite a popular 'research' concordancer, too. (Cathy Ball)

One possibility would be to use DOS packages on a PowerPC and copy the output - use Micro-OCP or TACT for instance. I'm just experimenting myself with this very process. (John Kirk)

V. PROGRAMMING LANGUAGES

(17) MacPerl

A Macintosh version of the UNIX perl language is available in the public domain. It's called "MacPerl." It's not as powerful as the Unix rendition (it doesn't allow for file expansions using ? or *). Nor does it run in the background. But it has an extremely flexible set of regular expressions and is a full programming language. There's also a good introductory book available called "Learning Perl" published by O'Reilly. Although based on the UNIX version, it applies by and large to MacPerl as well.
Here are some ftp locations for MacPerl:

ftp://ftp.cis.ufl.edu/pub/perl/src/macperl
ftp://ftp.eunet.ch/software/mac/perl
ftp://ftp.funet.fi/pub/languages/perl/ports/perl4/mac
ftp://src.doc.ic.ac.uk/packages/mac/umich/development/languages/macperl4.13.sit.hqx.gz
America OnLine Mac Development Forum (keyword: mdv)

(Michael Kelly)

Operations on files require programming skills. Ideally, an implementation of perl or awk which outputs to a file should satisfy pretty much any desire. Consequently, the only real 'solution' available so far may be MacPerl, using Alpha as a front end. (I haven't tried it yet.) Info can be found at "http://web.nexor.co.uk/mak/mak.html". (Ken Hughes)

(18) MaxSPITBOL

I can strongly recommend MaxSPITBOL, an implementation of SNOBOL4 for the Mac. It is (was?) available from:

Catspaw, Inc.
P.O. Box 1123
Salida, CO 81201
719-539-3884

When I have called before, I have talked to the programmer, Mark Emmer, who has had a real commitment to SNOBOL. He cheerfully answered all my questions before I invested in the software and helped me debug a couple of times when I ran into big dead ends.

MaxSPITBOL works on text files, but the nice thing (for me anyway) was that I could use any font I want. This way, I could keep my database in the linguistics font, and then run a SPITBOL program on it, and get output in that same font. SPITBOL includes a very very powerful string manipulation language and pattern matching language, much more powerful than grep.

I have programmed in SNOBOL/SPITBOL for many years, so I can't comment on the learning curve for a new user. I imagine it would be like learning any new programming language, except that the pattern matching syntax and semantics are fairly complex to learn in depth. Of course, the simpler tasks are simpler to learn and program. I don't remember what I paid for the program, but I imagine it was $150-$200. It was worth every penny to me.
(Barbara Levergood; also recommended by George Fowler)

(19) Icon and ProIcon

There is the language ICON, which is like a better, more modern & object-oriented language, which has a public domain Macintosh implementation. I don't know where I got it from, and I don't use it, but you could try doing an Archie search for "ICON". (George Fowler)

If you can do your own simple programming, then the programming language ProIcon for the Mac does a good job of various kinds of search. I think it is now public domain, like most versions of Icon... You would need to obtain a manual (it is a published book). (Malcolm Ross)

ProIcon is useful if you want to make complex changes throughout a large file which would take too long with a Nisus macro. (Malcolm Ross)

Dept of Linguistics, U Manchester, Oxford Rd, Manchester M13 9PL, UK
w.croft@manchester.ac.uk
FAX: +44-61-275 3187 Phone: 275 3188