Five weeks ago, I asked the LINGUIST audience about grammar/parsing
components for the English language. This is a brief summary of
the answers that I've received.
There have been six responses, five by email and one by
conventional letter post, covering a total of four different
tools. One tool collection ("The Alvey Natural Language Tools")
was mentioned in three responses.
Thanks a lot to all of those who helped.
In what follows, I summarize the features of each tool.
The tools are described in alphabetical order.
(1) Alvey Tools
TOOL NAME
The Alvey Natural Language Tools
DISTRIBUTOR/DEVELOPERS
Lynxvale WCIU Programs
20 Trumpington St.
Cambridge, CB2 1QA, UK
Fax: +223 332797
developed by: the Universities of Cambridge, Edinburgh and Lancaster
DESCRIPTION OF COMPONENTS
Lexicon: 63000 entries (40000 homonyms) for English.
Morphological analyzer for English (adaptable "for
most European languages and many others").
Grammar: "wide coverage syntactic and semantic grammar of
English, written in a metagrammatical formalism
derived from GPSG." The semantic rules are given
by formulas of the lambda calculus.
Grammar Development Environment: including editor, browser,
debugger for grammar development.
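To illustrate what it means for semantic rules to be given as lambda-calculus formulas, here is a minimal sketch in Python. It is not taken from the Alvey grammar: the two-entity domain, the predicates, and the quantifier definitions are all hypothetical and serve only to show how word meanings written as functions combine by function application.

```python
# Hypothetical toy model; none of these names come from the Alvey Tools.
ENTITIES = {"fido", "felix"}

# One-place predicates: lambda x. dog(x), lambda x. barks(x), ...
dog = lambda x: x in {"fido"}
cat = lambda x: x in {"felix"}
barks = lambda x: x in {"fido"}

# Determiners as higher-order functions: lambda P. lambda Q. ...
every = lambda P: lambda Q: all(Q(x) for x in ENTITIES if P(x))
some = lambda P: lambda Q: any(Q(x) for x in ENTITIES if P(x))

# "every dog barks" is built by applying the determiner meaning to the
# noun meaning, and the result to the verb meaning.
every_dog_barks = every(dog)(barks)
some_cat_barks = some(cat)(barks)

print("every dog barks:", every_dog_barks)
print("some cat barks:", some_cat_barks)
```

The sentence meaning is obtained purely by applying one function to another, which is the role lambda-calculus formulas play in a compositional semantic rule.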
COVERAGE
(no precise statements available)
IMPLEMENTATION FEATURES
Language: Common LISP.
Space Requirements: 13 Megabyte disc space.
Available for: Sun 3, Sun 4 / SPARC, HP 9000/3nn running HP-UX,
DECStation 3000 or 5000 running Ultrix, and "any other type
of UNIX machine".
Available by: anonymous ftp from ftp.cl.cam.ac.uk
(decryption key upon payment of licence fee).
Documentation available free of charge.
Source code: available (?).
LICENCE CONDITIONS
Licence fee: 500 ECU, 100 ECU upgrade
Restrictions: usage for research purposes only.
(2) English Constraint Grammar (ENGCG)
TOOL NAME
English Constraint Grammar (ENGCG)
DISTRIBUTOR/DEVELOPERS
Dept. of General Linguistics
University of Helsinki
Hallituskatu 11-13
SF-00100 Helsinki
Finland
Fred Karlsson <karlsson
ling.helsinki.fi>
Atro Voutilainen <voutilainen
ling.helsinki.fi>
Juha Heikkilä <jheikkila
ling.helsinki.fi>
Arto Anttila <anttila
csli.stanford.edu>
Krister Linden <klinden
ling.helsinki.fi>
Bart Jongejan, CRI A/S, Denmark.
DESCRIPTION OF COMPONENTS
Main modules of the ENGCG:
Preprocessor:
- input normalisation
- sentence bounds determination
Morphological analyser:
- ENGTWOL lexicon: 56000 entries
(600 multi word units (idioms), 5500 compounds)
- the analyser program
Morphological heuristics module:
- for assigning morphological descriptions to
words not recognized by the morphological analyser
English Constraint Grammar:
- grammar for morphological disambiguation
-- 1100 constraints
-- 200 heuristic constraints for
resolving remaining ambiguities
- grammar for determining syntactic functions
-- 250 syntactic constraints for
syntactic ambiguity resolution
Constraint Grammar Parser: interprets the input according
to the morphological and syntactic constraints. Essentially
a finite automaton that computes a "shallow dependency structure".
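The general idea of constraint-based disambiguation described above can be sketched as follows. This is only a toy illustration, not ENGCG's rule formalism or its parser: the tag names, the two constraints, and the data layout are hypothetical. Each word starts with all readings proposed by morphological analysis, and constraints discard readings that are impossible in context, never removing a word's last reading.

```python
# Toy sketch of reading elimination by context constraints.
# Tags and constraints are hypothetical, not actual ENGCG rules.

def no_verb_after_determiner(cohorts, i):
    """Readings to discard for word i if the previous word is surely a determiner."""
    if i > 0 and cohorts[i - 1]["readings"] == {"DET"}:
        return {"V"}
    return set()

def no_noun_after_pronoun(cohorts, i):
    """Readings to discard for word i if the previous word is surely a pronoun."""
    if i > 0 and cohorts[i - 1]["readings"] == {"PRON"}:
        return {"N"}
    return set()

CONSTRAINTS = [no_verb_after_determiner, no_noun_after_pronoun]

def disambiguate(cohorts):
    """Apply all constraints repeatedly until no reading can be removed."""
    changed = True
    while changed:
        changed = False
        for i, cohort in enumerate(cohorts):
            for constraint in CONSTRAINTS:
                doomed = constraint(cohorts, i) & cohort["readings"]
                # Skip the removal if it would leave the word with no reading.
                if doomed and doomed != cohort["readings"]:
                    cohort["readings"] -= doomed
                    changed = True
    return cohorts

sentence = [
    {"word": "the", "readings": {"DET"}},
    {"word": "run", "readings": {"N", "V"}},
]
for cohort in disambiguate(sentence):
    print(cohort["word"], sorted(cohort["readings"]))
```

Because constraints only ever remove readings, a word that no constraint applies to simply stays ambiguous, which is consistent with the coverage figures reporting a residue of ambiguous words.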
COVERAGE
Measured against four texts from different domains:
Lexicon: > 90 %
Morphological disambiguation:
- precision: > 95.5 % average (i.e. less than 4.5 %
of all words remain ambiguous)
- recall: > 99.7 % average (i.e. less than 0.3%
of all appropriate morphological readings are
not found)
- heuristics resolve approx. 50% of all remaining
ambiguities
Mapping of words to grammatical functions: > 75 % unambiguous
< 4.5 % errors
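As a rough illustration of how figures of this kind can be computed, the sketch below follows the parenthetical glosses given above: the share of words left with exactly one reading, and the share of words whose correct reading survives disambiguation. The summary does not say how the ENGCG figures were actually measured, so the function, the toy data, and the simplifying assumption of one correct reading per word are all hypothetical.

```python
def disambiguation_scores(output, gold):
    """Score disambiguator output against a hand-checked reference.

    output: one set of remaining readings per word
    gold:   the single correct reading per word (an assumption of this sketch)
    """
    total = len(gold)
    unambiguous = sum(1 for readings in output if len(readings) == 1)
    retained = sum(1 for readings, correct in zip(output, gold)
                   if correct in readings)
    return {
        "unambiguous_share": unambiguous / total,
        "correct_reading_retained_share": retained / total,
    }

# Hypothetical four-word example: one word stays ambiguous,
# and one word has lost its correct reading.
output = [{"DET"}, {"N", "ADJ"}, {"N"}, {"V"}]
gold = ["DET", "ADJ", "N", "N"]

scores = disambiguation_scores(output, gold)
print(scores)
```

The two numbers pull in opposite directions: leaving more words ambiguous makes it easier to retain the correct reading, which is why both are reported together.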
IMPLEMENTATION FEATURES
Common Lisp implementations for Sun Sparcstations 2/20,
(3-4 words/second on SPARC 2)
C implementations
- academic version, written by Bart Jongejan,
CRI A/S, Denmark (15-25 words/second on SPARC 2)
- production version, written by Pasi Tapanainen
(20-25 times faster)
Source code: not available (?)
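To put the speed figures above into perspective, the following small calculation estimates processing times for a text of a given length. Two things are assumptions here, not statements from the responses: that "20-25 times faster" is meant relative to the academic C version, and the text length of 100,000 words, which is purely hypothetical.

```python
def scaled_range(low, high, factor_low, factor_high):
    """Scale a (low, high) speed range by a (low, high) speed-up factor."""
    return low * factor_low, high * factor_high

def seconds_needed(word_count, words_per_second):
    """Time in seconds to process word_count words at a constant speed."""
    return word_count / words_per_second

LISP_SPEED = (3, 4)            # words/second, from the summary
ACADEMIC_C_SPEED = (15, 25)    # words/second, from the summary
# Assumption: the speed-up factor applies to the academic C version.
PRODUCTION_SPEED = scaled_range(*ACADEMIC_C_SPEED, 20, 25)

TEXT_LENGTH = 100_000          # hypothetical text length in words

for name, (slow, fast) in [("Common Lisp", LISP_SPEED),
                           ("academic C", ACADEMIC_C_SPEED),
                           ("production C", PRODUCTION_SPEED)]:
    worst = seconds_needed(TEXT_LENGTH, slow) / 60
    best = seconds_needed(TEXT_LENGTH, fast) / 60
    print(f"{name}: {best:.1f} to {worst:.1f} minutes")
```

If the factor were instead meant relative to the Common Lisp version, the production range would be correspondingly lower.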
LICENCE CONDITIONS
Sublicence for non-commercial academic research purposes:
1500 US $. Contact Atro Voutilainen or Fred Karlsson.
For non-academic use: contact Krister Linden.
(3) English Slot Grammar
TOOL NAME
English Slot Grammar
DISTRIBUTOR/DEVELOPERS
IBM T.J. Watson Research Center,
Michael McCord <mccord
watson.ibm.com>
DESCRIPTION OF COMPONENTS
Lexicon: 60000 English stems.
Grammar: ESG. Head oriented grammar (for English Language)
that delivers parse structures showing both surface structure
and deep structure in single trees.
Includes treatment of: coordination, unbounded dependencies,
passive unwinding.
Syntactic analyzer: bottom-up chart parser.
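For readers unfamiliar with the term, a bottom-up chart parser can be sketched in a few lines. The code below is a generic CKY-style recognizer over a tiny context-free grammar in binary form. It says nothing about how ESG itself is implemented, and slot grammars are not context-free grammars of this kind; the lexicon, rules, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy lexicon and binary rules; not part of ESG.
LEXICON = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "sees": {"V"},
}
BINARY_RULES = {
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}

def build_chart(words):
    """Fill the chart bottom-up; chart[(i, j)] holds categories spanning words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    # Width-1 spans come straight from the lexicon.
    for i, word in enumerate(words):
        chart[(i, i + 1)] |= LEXICON.get(word, set())
    # Wider spans are built from two adjacent, already completed spans.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        chart[(start, end)] |= BINARY_RULES.get((left, right), set())
    return chart

def recognizes(words, goal="S"):
    """True if the whole word sequence can be analysed as the goal category."""
    return goal in build_chart(words)[(0, len(words))]

print(recognizes("the dog sees the cat".split()))
print(recognizes("dog the sees".split()))
```

The chart stores every constituent found for every span, so partial analyses are computed once and reused, which is the main point of chart parsing.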
COVERAGE
(no precise statements available)
IMPLEMENTATION FEATURES
The development of slot grammars for German, Spanish,
Danish, Norwegian, Hebrew is in progress.
LICENCE CONDITIONS
contact Michael McCord
(4) PUNDIT
TOOL NAME
PUNDIT
DISTRIBUTOR/DEVELOPERS
Debbie Dahl
Paramax Systems Corporation
P.O. Box 517
Paoli, PA 19301
USA
(215) 648-2027
dahl
vfl.paramax.com
DESCRIPTION OF COMPONENTS
Based on string grammar formalism
Covers syntax, semantics and pragmatics
Lexicon of 2000 words (extensible)
COVERAGE
(no precise statements available)
IMPLEMENTATION FEATURES
Implemented in Quintus Prolog 3.2.
Source code available.
LICENCE CONDITIONS
Licence for research purposes: free (restrictions apply)
Licence for commercial use: to be negotiated.