Review of Programming for Linguists


Reviewer: Anne Mahoney
Book Title: Programming for Linguists
Book Author: Michael Hammond
Publisher: Wiley-Blackwell
Linguistic Field(s): Computational Linguistics
Book Announcement: 14.1536

Review:


Date: Wed, 28 May 2003 12:59:21 -0400
From: Anne Mahoney <amahoney@perseus.tufts.edu>
Subject: Review: Hammond (2003) Programming for Linguists: Perl for Language Researchers

Hammond, Michael (2003) Programming for Linguists: Perl for
Language Researchers. Blackwell Publishing.

Reviewer: Anne Mahoney, Tufts University

Programming for Linguists is an introduction to computer
programming using the Perl language, aimed at people who
work with language. Although Hammond seems to envision it
as a self-study guide, it would probably work better as a
course textbook. It is a generally sound introduction to
the language and to the notion of programming a computer.

Perl is a particularly nice language for text processing
because of its wealth of pattern-matching and
string-handling constructs. It is easy in Perl to say, for
example, "find all words that end in a vowel" or "replace
every occurrence of the word 'cat' with the word 'feline.'"
In addition, Perl is easy for beginners because it is
interpreted rather than compiled: one simply writes a
program and runs it, without explicitly having to turn it
into machine code. As Hammond points out (p. 2), Perl is
moreover available, and free, for every type of computer
system in current use. I therefore agree that Perl is a
good starting point for a linguist with a computational
problem.
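
To make the two tasks just quoted concrete, here is a minimal
sketch. It is written in Python rather than Perl, purely for
illustration; the function names and the sample sentence are
invented for this example and do not come from Hammond's
book. The regular-expression syntax used here is closely
similar in Perl.

```python
import re


def words_ending_in_vowel(text):
    """Return every word in text whose last letter is a, e, i, o, or u."""
    # \b marks a word boundary; \w* matches the start of the word, if any.
    return re.findall(r"\b\w*[aeiou]\b", text, flags=re.IGNORECASE)


def replace_word(text, old, new):
    """Replace each whole-word occurrence of old with new."""
    # The boundaries keep 'cat' from matching inside 'catalog'.
    pattern = rf"\b{re.escape(old)}\b"
    return re.sub(pattern, lambda match: new, text)


sample = "The cat sat on a sofa near the catalog"
print(words_ending_in_vowel(sample))
print(replace_word(sample, "cat", "feline"))
```

Each task reduces to one pattern and one library call, which
is the kind of economy the paragraph above has in mind.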

Hammond's intended audience is "a naive reader who may know
nothing about programming" (p. ix). The reader who already
knows another programming language and wants to pick up Perl
will be better served by Wall et al. (2000). Hammond's naive
reader, however, is expected to understand how to install
software, how to use a text editor as distinct from a word
processor, and how files and directories work. Although
Hammond gives basic instructions on how to invoke an editor,
how to invoke the Perl interpreter, and how to display the
text of a Perl program, he leaves the reader helpless if
anything goes wrong. While the details of using a text
editor really are beyond the scope of the book, especially
if the reader could be using any of several computing
platforms, it is often the case that someone who has never
thought about programming before has also never had occasion
to use a text editor, change the path (set of directories
from which executable programs can automatically be found),
or install anything that requires configuration or
compilation. Although Hammond sensibly suggests that some
of these are "delicate tasks" and "you should seek
assistance before attempting them on your own if you've
never done this before" (p. 7), it would be useful to
provide more concrete information about where such
assistance might be available. A college- or
university-affiliated linguist may be able to ask the
school's "academic technology" group. If no such resource
is available, the reader will want a good book on the
relevant operating system, perhaps one introducing system
administration or development.

The first two chapters introduce Perl and how to create and
run a program. Chapters 3-7 cover the core features of the
language. Not every bit of Perl syntax is included, only
what beginners need to write basic programs. Each chapter
includes examples, which are also available from the
author's home page, http://www.u.arizona.edu/~hammond/
(p. x), and ends with a group of exercises, many of which
are variations on the example programs. The exercises are
all relatively easy, a few minutes' to half an hour's work;
there are no term projects or research questions here. They
provide practice on the language features introduced in the
text, and may help the reader figure out what kinds of
problems a computer program might help solve. The core
chapters introduce, in order, control statements, scalar
variables, and arrays; input and output, both at the user's
screen and to files; organizing programs into subroutines;
regular expressions; substitutions, sorting, and
tokenization. Examples grow increasingly elaborate,
including an English-to-Pig Latin translator.

Chapter 8, on HTML, talks about using Perl to generate or
parse HTML files. Chapter 9 is about CGI, the "Common
Gateway Interface" for web programming. Oddly, it does not
mention the commonly used CGI module, available from CPAN
(the Comprehensive Perl Archive Network,
http://www.cpan.org, discussed in appendix D), which
includes functions to do several of the things Hammond has
the reader do laboriously by hand, notably retrieving the
input to a CGI routine.

Four appendices round out the book. Appendix A mentions
object-oriented programming as it is done in Perl. While it
is appropriate to explain the odd syntax that object-style
modules may use (all those double colons and extra
pointers), this topic is otherwise rather more advanced than
the rest of the book. Appendix B discusses the Perl
implementation of the Tk toolkit for building graphical
interfaces. Finally, appendix C lists the basic "special
variables" built into Perl, and appendix D gives a few
pointers to further information.

Any introductory programming textbook is necessarily its
readers' first initiation not only into the mechanics of
programming, but also into style. Here Hammond's
recommendations and examples are sometimes inappropriate,
and often unidiomatic for Perl. For example, on p. 49 he
suggests that programmers should avoid "command
condensation," by which he means using the output of one
routine as an argument to another, or more generally doing
more than one operation in a single step. He notes that
this technique produces shorter programs, but "it results in
far less clarity and should be avoided." (p. 50) The
alternative, however, is generally to introduce new
variables to hold intermediate results. This is also
confusing, as another programmer reading or working on the
code some time later must determine what happens to each of
those variables, and whether they are still relevant in some
later part of the code. In programming languages as in
natural languages, greater fluency makes it possible to read
longer "sentences" without getting confused. A first-year
Latin student might be thoroughly confounded by the sentence
of 60-odd words that begins Cicero's speech for Archias the
poet, but the experienced Latinist understands its sense,
its structure, and its sound effects. Similarly,
experienced programmers learn to use increasingly
complicated statements. (While in natural language
acquisition students can generally read more complex
sentences than they can accurately write, in programming
language acquisition the sequence is often the reverse,
because students rarely get practice in reading existing
code. This is unfortunate, however, as working programmers
spend much more time reading, documenting, and modifying
existing code than they do writing new code from scratch.)
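
The trade-off can be seen in a small example. The sketch
below is in Python rather than Perl and is not taken from
the book; both function names are hypothetical. The first
version stores each intermediate result in its own variable,
and the second passes the output of one call directly to
the next.

```python
def longest_word_stepwise(line):
    """Find the longest word, naming every intermediate result."""
    words = line.split()
    words_by_length = sorted(words, key=len)
    longest = words_by_length[-1]
    return longest


def longest_word_condensed(line):
    """The same computation, with the calls nested in one expression."""
    return sorted(line.split(), key=len)[-1]


line = "Perl makes text processing pleasant"
print(longest_word_stepwise(line))
print(longest_word_condensed(line))
```

The two functions return the same result. The stepwise
version leaves two extra names for a later reader to track,
while the condensed version asks the reader to parse one
longer expression.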

Hammond points out that code should have comments (p. 48),
but the examples rarely do. He also notes that variable
names should give some information about the use of the
variable (p. 49); although most of the examples follow this
precept, there are occasional one-letter or otherwise
neutral names. He characterizes the ubiquitous Perl
"anonymous variables" as "one major threat to writing
easy-to-read programs" (p. 50), yet anyone who will be
working with Perl will run into them almost at once.
Anonymous (or "implicit") variables in Perl are supplied
from the context when a function requires an argument
which it is not given. They include the current main
input filehandle, the current record from a file being
read, and the current element of an array within a loop.

Real programs must be prepared for errors, especially if
they expect to receive any data from outside. Hammond
notes (p. 36) that it's always necessary to check whether
a program has successfully opened a file it intends to use,
and gives the standard idiom for doing so. The example
programs, however, merely complain that there has been
an error, without saying what error or on which file;
the information necessary to construct an informative
error message is relegated to a footnote (p. 45). Once
we get to regular expressions, in chapter 6, a series of
example programs allow the user to enter a regular expression
as input to the program. These expressions are then used
without any check on their validity (examples on pp. 80, 81,
85, etc.).
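
Both checks are short to write. The following is a minimal
sketch in Python rather than Perl; the function names are
hypothetical and the code is not from the book. It reports
which file could not be opened and why, and it rejects a
user-supplied pattern that is not a valid regular expression
before the pattern is ever used.

```python
import re
import sys


def open_or_report(path):
    """Open path for reading; on failure, name the file and the reason."""
    try:
        return open(path, encoding="utf-8")
    except OSError as err:
        # err.strerror holds the operating system's explanation.
        sys.stderr.write(f"cannot open {path!r}: {err.strerror}\n")
        return None


def compile_user_pattern(pattern):
    """Compile a user-supplied pattern, or return None if it is invalid."""
    try:
        return re.compile(pattern)
    except re.error as err:
        sys.stderr.write(f"invalid pattern {pattern!r}: {err}\n")
        return None


# An unclosed parenthesis is caught here, not in the middle of a search.
if compile_user_pattern("(unclosed") is None:
    print("pattern rejected")
```

A caller tests the returned value for None and can then stop,
or ask the user again, with a message that says what went
wrong.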

Although some aspects of programming have changed in the
last fifty years or so, the basic principles of good style
are much the same as ever. The style manual Kernighan and
Plauger (1978) has really not been superseded; its key
style rules (including "Avoid temporary variables"; "Use the
good features of a language; avoid the bad ones"; "Make sure
all variables are initialized before use"; and so on) are as
relevant to object-oriented Perl as they were to Fortran and
PL/1. Hammond is an experienced enough programmer to know
this, as is clear from the programs he makes available on
his home page. Students may as well learn good habits from
the beginning, rather than being encouraged by the textbook
to be sloppy.

The book has little to say about either design or debugging.
Any non-trivial program should be sketched out first, before
the programmer starts writing code, to be sure nothing major
will be overlooked. Simply starting in to write without
first thinking about the structure of the program can lead
to using the wrong structure. How large a program is
non-trivial depends on experience; for the intended readers
of this book, the solutions to the exercises are not yet
trivial. Moreover, few programs are correct when first
written, and Hammond gives no suggestions about how to
determine why a program does not do what you think it
should. Perl does include several tools: the "use
strict" pragma to enforce variable declarations, the "-w"
command line switch to enable warnings, and a debugger. New
programmers need to be reminded that everyone makes
mistakes, that programming mistakes are rarely disastrous
unless the program modifies a file or something else beyond
its own borders, and that there are systematic ways of
finding and fixing the mistakes that will happen.

The book is in general accurate and well-edited, but I found
a few errors or inaccuracies which might lead to a bit of
confusion. For example, on page 9, the Perl escape "\n" is
described as "an explicit return -- or newline"; they are
not the same thing. In a footnote on p. 29, the definition
of "prime number" is correct, but the example includes 1,
which is incorrect. On p. 57, the scope of a variable
defined as a loop index in a "for" or "foreach" statement is
the loop itself, not the block or routine that encloses the
loop. In the discussion of regular expressions, pp. 82-83,
the pattern that is intended to contain a backslashed
vertical bar is twice printed with a space between the
backslash and the bar: for "\|" we have "\ |" instead. The
example on p. 87 misses the first match: the pattern /o.*s/
applied to "John loves Mary" will match "ohn loves" rather
than merely "oves". In the discussion of sorts, pp. 105-106,
the text says you specify "an explicit sorting function" as
an argument to the standard sort routine. In fact, what you
specify is only a comparison function, which tells how to
determine if one item comes before another, not an entire
sort function. In the discussion of HTML, correct
terminology seems to be deliberately avoided: "escape
sequence" on p. 129 instead of the standard term "entity,"
"parameter" on p. 131 instead of the standard term
"attribute."

The code sample on pp. 136-138 does not handle URLs with
directories, and will also fail on a relative link from a
page whose URL includes a filename. Although I did not test
all of the code, this is the only program in the book which
had visible errors: a commendable success rate.
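
Three of the technical points in the last two paragraphs can
be checked in a few lines. The sketch below is in Python
rather than Perl, and none of it comes from the book: the
URLs, the word list, and the function name are invented for
illustration. It shows the leftmost match of the pattern
o.*s, a comparison function handed to a standard sort
routine, and relative links resolved against a page URL that
includes directories and a filename.

```python
import re
from functools import cmp_to_key
from urllib.parse import urljoin

# 1. A regular expression matches at the leftmost possible position,
#    so the match starts at the "o" in "John", not the one in "loves".
leftmost_match = re.search(r"o.*s", "John loves Mary").group()


# 2. A comparison function only says how two items are ordered;
#    the library's sort routine does the actual sorting.
def by_length_then_alpha(a, b):
    """Negative if a comes before b, positive if after, zero if tied."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)


ordered = sorted(["pear", "fig", "apple", "kiwi"],
                 key=cmp_to_key(by_length_then_alpha))

# 3. Relative links are resolved against the page's full URL,
#    including its directories and its filename.
page = "http://www.example.org/docs/intro/index.html"
sibling = urljoin(page, "chapter2.html")
parent = urljoin(page, "../images/fig1.png")

print(leftmost_match)
print(ordered)
print(sibling)
print(parent)
```

In the third part the filename index.html is dropped and the
directory path is kept when the links are resolved, which
are the two cases the paragraph above says the book's sample
does not handle.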

The preface suggests that the audience for the book includes
not only linguists but "literary theorists" (p. ix, repeated
on p. 1). I assume Hammond means "literary scholars" here;
while literary theorists are unlikely to need computational
tools, many of us who work on literature -- applying
theories rather than creating them -- do have occasion to
program. Literary scholars might want to write programs for
stylometrics, collation and textual criticism, metrical
analyses, concordancing, and so on. In addition, knowledge
of programming greatly facilitates marking up a text for
other uses, for example turning a plain typed or scanned
text into TEI XML.

Overall, this is a sound book, with only a few questionable
recommendations and very few errors. It would make a good
foundation text for an introductory course on computational
linguistics or humanities computing, perhaps coupled with
something like Hockey (2001) to give the students some ideas
about what this new skill will allow them to do.


References

Hockey, Susan M. (2001). Electronic Texts in the Humanities:
Principles and Practice. Oxford.

Kernighan, Brian W., and P. J. Plauger (1978). The
Elements of Programming Style, second edition. New York:
McGraw-Hill.

Wall, Larry, Tom Christiansen, and Jon Orwant (2000).
Programming Perl, third edition. Sebastopol, CA: O'Reilly
and Associates.




 
ABOUT THE REVIEWER:
Anne Mahoney teaches in the department of classics at Tufts University and is the lead programmer at the Perseus Project there. Her research interests include Greek and Latin meter and poetics, ancient drama, and vocabulary.

Versions:
Format: Paperback
ISBN: 0631234349
ISBN-13: N/A
Pages: 232
Prices: U.K. £ 24.99
U.S. $ 39.95
 

Format: Hardback
ISBN: 0631234330
ISBN-13: N/A
Pages: 232
Prices: U.K. £ 60.00
U.S. $ 74.95