LINGUIST List 16.968

Wed Mar 30 2005

Disc: New: Re: 16.961: Grammar Checker

Editor for this issue: Naomi Fox <foxlinguistlist.org>


Directory

        1.    Mike Maxwell, A Word to the unwise -- program's grammar checker


Message 1: A Word to the unwise -- program's grammar checker

Date: 30-Mar-2005
From: Mike Maxwell <maxwellldc.upenn.edu>
Subject: A Word to the unwise -- program's grammar checker


In LL 16.961 (http://linguistlist.org/issues/16/16-961.html), John Lawler
discusses a recent Seattle Post-Intelligencer article concerning the
reportedly bad performance of Microsoft's grammar checker. (I'd suggest
that before replying on this topic, you read the original article at
http://seattlepi.nwsource.com/business/217802_grammar28.asp).

As most of us know (but the business prof who started the crusade may
not know), grammar checking is a tradeoff between recall (allowing all
grammatical sentences through, i.e. not flagging them) and precision
(allowing only grammatical sentences through, i.e. flagging all the
ungrammatical ones). And of course in the case of grammar judgments,
there are issues of inter-annotator agreement which make it impossible
even to agree on what is grammatical and what is not.
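To make the tradeoff concrete, here is a minimal Python sketch. It assumes each test sentence carries a single yes/no human judgment and that the checker either flags a sentence or lets it through; following the framing above, an unflagged sentence counts as "accepted". The function name `acceptance_scores` and the toy data are invented for illustration.

```python
from typing import Sequence


def acceptance_scores(grammatical: Sequence[bool],
                      flagged: Sequence[bool]) -> tuple[float, float]:
    """Return (recall, precision) for a checker that accepts every unflagged sentence.

    grammatical[i] is the human judgment for sentence i;
    flagged[i] is True if the checker flagged sentence i.
    """
    if len(grammatical) != len(flagged):
        raise ValueError("need one judgment and one checker decision per sentence")
    accepted = [not f for f in flagged]
    # Grammatical sentences that the checker correctly let through.
    true_accepts = sum(1 for g, a in zip(grammatical, accepted) if g and a)
    n_grammatical = sum(grammatical)
    n_accepted = sum(accepted)
    # Convention for this sketch: an empty denominator scores 1.0.
    recall = true_accepts / n_grammatical if n_grammatical else 1.0
    precision = true_accepts / n_accepted if n_accepted else 1.0
    return recall, precision


gold = [True, True, True, False, False]  # three good sentences, two bad ones
# Flags nothing: recall 1.0, precision 0.6.
print(acceptance_scores(gold, [False] * 5))
# Flags the last three: catches both bad sentences but also one good one,
# so recall 2/3, precision 1.0.
print(acceptance_scores(gold, [False, False, True, True, True]))
```

A checker that flags nothing gets perfect recall but precision equal to the share of grammatical sentences in the test set; flagging more aggressively can raise precision, but every grammatical sentence it flags lowers recall.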

Having said this, it occurs to me that it would be great fun--and might
even advance the state of the art--to have a web site where sentences both
grammatical and un- could be posted, and the output of grammar checkers
displayed in an interlinear format. [BTW, take that last sentence and
parse it...] I suppose the way it would work is that you could download
the sentences, pass them through your favorite checker or parser, and send
the results back in some agreed-on format (perhaps XML) to the owners of
the site, who could post the results.
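The post leaves the exchange format open ("perhaps XML"). Purely as a hypothetical illustration, a submission might list each test sentence by ID along with whatever spans the checker flagged. The element and attribute names below (`results`, `sentence`, `flag`, `start`, `end`, `message`) are invented for this sketch, and the offsets could just as well be character or token positions; the parsing uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical submission format: all element and attribute names are invented.
SUBMISSION = """\
<results checker="SomeChecker 1.0">
  <sentence id="s1"/>
  <sentence id="s2">
    <flag start="4" end="7" message="possible agreement error"/>
  </sentence>
</results>
"""


def read_submission(xml_text: str) -> dict[str, list[tuple[int, int, str]]]:
    """Map each sentence id to the (start, end, message) flags the checker reported.

    Assumes every <flag> carries start and end attributes; message is optional.
    """
    root = ET.fromstring(xml_text)
    results = {}
    for sent in root.iter("sentence"):
        flags = [
            (int(f.get("start")), int(f.get("end")), f.get("message", ""))
            for f in sent.iter("flag")
        ]
        results[sent.get("id")] = flags  # an unflagged sentence maps to []
    return results


print(read_submission(SUBMISSION))
```

Keeping unflagged sentences in the file (like `s1`) lets the site distinguish "checked and accepted" from "not submitted".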

Hopefully no one would be tempted to cheat by adjusting their parser's
results. If need be, there could be safeguards against that.

I think such a site should use individual sentences, not whole paragraphs
or texts, because there is probably no grammar checker or parser around
that could flag errors at the paragraph level. Of course, it would be nice
to be proved wrong!

There should also be an annotation line giving a human-produced indication
of errors, possibly with levels of acceptability and/or inter-annotator
agreement indicated. This would serve as the standard against which the
machine-produced results would be judged.
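Judging a checker against the human line could start as simply as lining up marks. The following Python sketch assumes errors are marked per token (the post does not specify a granularity) and ignores the graded acceptability levels mentioned above; `interlinear` and `compare` are hypothetical helpers, and the example sentence is invented.

```python
# Hypothetical representation: a sentence is a list of tokens, and each annotation
# line (human or checker) is the set of token indices marked as errors.


def interlinear(tokens: list[str], human: set[int], checker: set[int]) -> str:
    """Place '*' marks from the human and checker lines under the marked tokens."""
    widths = [len(t) for t in tokens]

    def row(marked: set[int]) -> str:
        # Pad each cell to its token's width so the columns line up with the text line.
        cells = [("*" if i in marked else "").ljust(w) for i, w in enumerate(widths)]
        return " ".join(cells)

    lines = [
        "text:    " + " ".join(tokens),
        "human:   " + row(human),
        "checker: " + row(checker),
    ]
    return "\n".join(line.rstrip() for line in lines)


def compare(human: set[int], checker: set[int]) -> dict[str, list[int]]:
    """Judge the checker line against the human line, token by token."""
    return {
        "agreed": sorted(human & checker),    # marked by both
        "missed": sorted(human - checker),    # marked by the human only
        "spurious": sorted(checker - human),  # marked by the checker only
    }


tokens = "The results was posted".split()
print(interlinear(tokens, human={2}, checker={1}))
print(compare({2}, {1}))  # {'agreed': [], 'missed': [2], 'spurious': [1]}
```

Graded judgments could replace the single `*` with marks such as `?` and `*`, at the cost of a less straightforward comparison.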

As for where one would get the sentences to be tested, I'm sure any teacher
could provide lots of bad examples. And good sentences could be lifted
from lots of places.


Linguistic Field(s): Computational Linguistics
General Linguistics