LINGUIST List 4.363

Tue 11 May 1993

Sum: Grammars/Parsers for the English Language

Editor for this issue: <>


Directory

  1. Roland Stuckardt, SUMMARY: Grammars/Parsers for English Language

Message 1: SUMMARY: Grammars/Parsers for English Language

Date: Tue, 11 May 93 12:20:29 +0
From: Roland Stuckardt <stuckard@darmstadt.gmd.de>
Subject: SUMMARY: Grammars/Parsers for English Language


Five weeks ago, I asked the LINGUIST audience about grammar/parsing
components for the English language. This is a brief summary of
the answers that I received.

There have been six responses, five by email and one by
conventional letter post, covering a total of four different
tools. One tool collection ("The Alvey Natural Language Tools")
was mentioned in three responses.

Thanks a lot to all of those who helped.

Below, I summarize the features of each tool.
The tools are described in alphabetical order.

(1) Alvey Tools

TOOL NAME

 The Alvey Natural Language Tools

DISTRIBUTOR/DEVELOPERS

 Lynxvale WCIU Programs
 20 Trumpington St.
 Cambridge, CB2 1QA, UK
 Fax: +223 332797

 developed by: the Universities of Cambridge, Edinburgh and Lancaster

DESCRIPTION OF COMPONENTS

 Lexicon: 63000 entries (40000 homonyms) for English.

 Morphological analyzer for English (adaptable "for
 most European languages and many others").

 Grammar: "wide coverage syntactic and semantic grammar of
 English, written in a metagrammatical formalism
 derived from HPSG." The semantic rules are given
 by formulas of the lambda calculus.

 Grammar Development Environment: includes an editor, a browser
 and a debugger for grammar development.
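The Alvey grammar's semantic rules are described above as formulas of the lambda calculus. As a rough illustration of that idea only (this is not the Alvey notation, lexicon or rule set), the following Python sketch treats word meanings as lambda terms and composes them by function application, evaluated against a small toy model. The entities, predicates and the two rules NP -> Det N and S -> NP VP are hypothetical.

```python
# Toy model: entities and predicate extensions (all hypothetical).
ENTITIES = {"kim", "lee", "sandy"}
LINGUIST = {"kim", "lee"}
SLEEPS = {"kim", "lee", "sandy"}
SNORES = {"lee"}

# Word meanings as lambda terms; predicates are characteristic functions.
linguist = lambda x: x in LINGUIST  # λx.linguist(x)
sleeps = lambda x: x in SLEEPS      # λx.sleep(x)
snores = lambda x: x in SNORES      # λx.snore(x)
every = lambda p: lambda q: all(q(x) for x in ENTITIES if p(x))  # λP.λQ.∀x(P(x) → Q(x))
some = lambda p: lambda q: any(p(x) and q(x) for x in ENTITIES)  # λP.λQ.∃x(P(x) ∧ Q(x))


def np(det, noun):
    """NP -> Det N: the NP meaning is the Det meaning applied to the N meaning."""
    return det(noun)


def s(subject, vp):
    """S -> NP VP: the sentence meaning is the NP meaning applied to the VP meaning."""
    return subject(vp)


print(s(np(every, linguist), sleeps))  # True
print(s(np(every, linguist), snores))  # False
print(s(np(some, linguist), snores))   # True
```

"Every linguist snores" comes out false because only one of the two linguists in the toy model snores, while "some linguist snores" comes out true for the same reason.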

COVERAGE

 (no precise statements available)

IMPLEMENTATION FEATURES

 Language: Common LISP.

 Space Requirements: 13 megabytes of disc space.

 Available for: Sun 3, Sun 4 / SPARC, HP 9000/3nn running HP-UX,
 DECstation 3000 or 5000 running Ultrix, and "any other type
 of UNIX machine".

 Available by: anonymous ftp from ftp.cl.cam.ac.uk
 (decryption key upon payment of the licence fee).
 Documentation available free of charge.

 Source code: available (?).

LICENCE CONDITIONS

 Licence fee: 500 ECU (upgrade: 100 ECU)

 Restrictions: usage for research purposes only.


(2) English Constraint Grammar (ENGCG)

TOOL NAME

 English Constraint Grammar (ENGCG)

DISTRIBUTOR/DEVELOPERS

 Dept. of General Linguistics
 University of Helsinki
 Hallituskatu 11-13
 SF-00100 Helsinki
 Finland

 Fred Karlsson <karlsson@ling.helsinki.fi>
 Atro Voutilainen <voutilainen@ling.helsinki.fi>
 Juha Heikkilä <jheikkila@ling.helsinki.fi>
 Arto Anttila <anttila@csli.stanford.edu>
 Krister Linden <klinden@ling.helsinki.fi>

 C implementation (academic version): Bart Jongejan, CRI A/S, Denmark.

DESCRIPTION OF COMPONENTS

 Main modules of the ENGCG:

 Preprocessor:
 - input normalisation
 - sentence boundary detection

 Morphological analyser:
 - ENGTWOL lexicon: 56000 entries
 (600 multi-word units (idioms), 5500 compounds)
 - the analyser program

 Morphological heuristics module:
 - for assigning morphological descriptions to
 words not recognized by the morphological analyser

 English Constraint Grammar:
 - grammar for morphological disambiguation
 -- 1100 constraints
 -- 200 heuristic constraints for
 resolving remaining ambiguities
 - grammar for determining syntactic functions
 -- 250 syntactic constraints for
 syntactic ambiguity resolution

 Constraint Grammar Parser: interprets the input according
 to the morphological and syntactic constraints. Essentially
 a finite automaton that computes a "shallow dependency structure".
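Constraint Grammar works by elimination: the morphological analyser proposes every possible reading for each word, and constraints discard readings that are impossible in context, with the principle that a word's last remaining reading is never removed. The sketch below illustrates only that elimination idea in Python; it is not the ENGCG rule formalism or its finite-state parser, and the tag names, the helper prev_is and the three toy constraints are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Remove:
    """Discard the `target` reading wherever `context(sentence, i)` is true."""
    target: str
    context: Callable[[list, int], bool]


def prev_is(tag: str) -> Callable[[list, int], bool]:
    """Context test (hypothetical): the previous word is unambiguously `tag`."""
    return lambda sent, i: i > 0 and sent[i - 1][1] == {tag}


def disambiguate(sentence: list, constraints: list[Remove]) -> list:
    """Apply constraints repeatedly until no further reading can be removed.

    `sentence` is a list of (word form, set of candidate readings).
    """
    changed = True
    while changed:
        changed = False
        for i, (_word, readings) in enumerate(sentence):
            for c in constraints:
                # Never discard the last remaining reading of a word.
                if c.target in readings and len(readings) > 1 and c.context(sentence, i):
                    readings.discard(c.target)
                    changed = True
    return sentence


# Hypothetical toy constraints.
CONSTRAINTS = [
    Remove("V", prev_is("DET")),    # no verb right after a determiner
    Remove("AUX", prev_is("DET")),  # no auxiliary right after a determiner
    Remove("N", prev_is("N")),      # crude: no noun right after a noun
]

sentence = [("the", {"DET"}), ("can", {"N", "V", "AUX"}), ("rusts", {"N", "V"})]
for word, readings in disambiguate(sentence, CONSTRAINTS):
    print(word, sorted(readings))
# the ['DET']
# can ['N']
# rusts ['V']
```

The two determiner constraints first reduce "can" to a noun; only then does the (deliberately crude) third constraint apply to "rusts", which is why the constraints are re-applied until nothing changes.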

COVERAGE

 Measured against four texts from different domains:

 Lexicon: > 90 %

 Morphological disambiguation:
 - precision: > 95.5 % average (i.e. less than 4.5 %
 of all words remain ambiguous)
 - recall: > 99.7 % average (i.e. less than 0.3 %
 of all appropriate morphological readings are
 not found)
 - heuristics resolve approx. 50 % of all remaining
 ambiguities

 Mapping of words to grammatical functions: > 75 % unambiguous,
 < 4.5 % errors

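The precision and recall figures above are glossed in terms of words: precision as the share of words left with a single reading, recall as the share of words whose correct reading survives. A minimal sketch that computes both figures in that sense from disambiguator output and a hand-tagged reference might look as follows; the function name cg_scores and the data layout are assumptions, and the published ENGCG evaluations may define the measures differently.

```python
def cg_scores(output, gold):
    """Return (precision, recall) in the word-based sense glossed above.

    output: one set of surviving readings per word
    gold:   the correct reading for each word (hand-tagged reference)
    """
    if len(output) != len(gold) or not gold:
        raise ValueError("output and gold must be non-empty and of equal length")
    n = len(gold)
    # "Precision": share of words left with exactly one reading.
    unambiguous = sum(1 for readings in output if len(readings) == 1)
    # "Recall": share of words whose correct reading was not discarded.
    retained = sum(1 for readings, g in zip(output, gold) if g in readings)
    return unambiguous / n, retained / n


output = [{"DET"}, {"N"}, {"V", "N"}, {"ADV"}]
gold = ["DET", "N", "V", "ADV"]
print(cg_scores(output, gold))  # (0.75, 1.0)
```

Leaving a word ambiguous lowers precision but not recall, which is why a cautious disambiguator can score very high recall while still leaving a few percent of words ambiguous.
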
IMPLEMENTATION FEATURES

 Common Lisp implementations for Sun SPARCstations 2/20
 (3-4 words/second on a SPARCstation 2)

 C implementations
 - academic version, written by Bart Jongejan,
 CRI A/S, Denmark (15-25 words/second on SPARC 2)
 - production version, written by Pasi Tapanainen
 (20-25 times faster)

 Source code: not available (?)

LICENCE CONDITIONS

 Sublicence for non-commercial academic research purposes:
 1500 US $. Contact Atro Voutilainen or Fred Karlsson.

 For non-academic use: contact Krister Linden.


(3) English Slot Grammar

TOOL NAME

 English Slot Grammar

DISTRIBUTOR/DEVELOPERS

 IBM T.J. Watson Research Center,
 Michael McCord <mccord@watson.ibm.com>

DESCRIPTION OF COMPONENTS

 Lexicon: 60000 English stems.

 Grammar: ESG, a head-oriented grammar of English that delivers
 parse structures showing both surface structure and deep
 structure in a single tree. Includes treatment of coordination,
 unbounded dependencies and passive unwinding.

 Syntactic analyzer: bottom-up chart parser.
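ESG's syntactic analyser is described as a bottom-up chart parser. The general technique can be illustrated with a CKY-style recognizer, which fills a chart of categories over increasingly long spans of the input, starting from single words. This Python sketch is a generic textbook algorithm over a hypothetical toy grammar in Chomsky normal form (the LEXICON and RULES tables are invented for illustration); it is not McCord's head-oriented slot-grammar parser.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy grammar in Chomsky normal form.
LEXICON = {"the": {"Det"}, "parser": {"N"}, "grammar": {"N"}, "reads": {"V"}}
RULES = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}


def cky_recognize(words, lexicon=LEXICON, rules=RULES, start="S"):
    """Bottom-up (CKY) recognition: True if `words` can be analysed as `start`."""
    n = len(words)
    chart = defaultdict(set)  # chart[i, j]: categories spanning words[i:j]
    for i, w in enumerate(words):
        chart[i, i + 1] |= lexicon.get(w, set())
    for width in range(2, n + 1):      # build longer spans from shorter ones
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point of words[i:j]
                for left, right in product(chart[i, k], chart[k, j]):
                    chart[i, j] |= rules.get((left, right), set())
    return start in chart[0, n]


print(cky_recognize("the parser reads the grammar".split()))  # True
print(cky_recognize("parser the reads".split()))              # False
```

Filling the chart from the shortest spans upward is what makes the method bottom-up; each cell is computed once and reused by every larger constituent that contains it, which is the point of keeping a chart.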

COVERAGE

 (no precise statements available)

IMPLEMENTATION FEATURES

 Slot grammars for German, Spanish, Danish, Norwegian and
 Hebrew are under development.

LICENCE CONDITIONS

 Contact Michael McCord.


(4) PUNDIT

TOOL NAME

 PUNDIT

DISTRIBUTOR/DEVELOPERS

 Debbie Dahl
 Paramax Systems Corporation
 P.O. Box 517
 Paoli, PA 19301
 USA
 (215) 648-2027
 dahl@vfl.paramax.com

DESCRIPTION OF COMPONENTS

 Based on the String Grammar formalism

 Covers syntax, semantics and pragmatics

 Lexicon of 2000 words (extensible)

COVERAGE

 (no precise statements available)

IMPLEMENTATION FEATURES

 Implemented in Quintus Prolog 3.2.

 Source code available.

LICENCE CONDITIONS

 Licence for research purposes: free (restrictions apply)

 Licence for commercial use: to be negotiated.
