LINGUIST List 13.1699

Sat Jun 15 2002

Software: Natural Language Toolkit, Version 0.7

Editor for this issue: James Yuells <james@linguistlist.org>


Directory

  1. Edward Loper, Natural Language Toolkit, Version 0.7

Message 1: Natural Language Toolkit, Version 0.7

Date: 08 Jun 2002 21:24:48 -0400
From: Edward Loper <edloper@seas.upenn.edu>
Subject: Natural Language Toolkit, Version 0.7

Version 0.7 of the Natural Language Toolkit is now available for
download. The Natural Language Toolkit is a Python package that
simplifies the construction of programs that process natural language.
In particular:

 - It provides basic tools for manipulating data and performing tasks
 related to NLP.
 - It defines standard interfaces between the different components of
 an NLP system.
 - It provides an infrastructure for building new NLP systems.

The toolkit's primary aim is to serve as a pedagogical tool, but it is
also useful as a framework for implementing research-related programs.
NLTK runs on most platforms, including Windows, OS X, Linux, and UNIX.
For more information, or to download a copy, please visit our web page:

 http://nltk.sf.net
________________________________________________________________________
Toolkit Contents

 - [Python Modules] implement the basic data types, tools, and
 interfaces that make up the toolkit.

 - [Tutorials] teach students how to use the toolkit, in the context
 of performing specific tasks.

 - [Exercises and Problem Sets] help students learn more about
 various aspects of natural language processing.

 - [Reference Documentation] provides precise definitions of the
 behavior of each module, interface, class, method, function, and
 variable defined by the toolkit.

 - [Technical Documentation] explains and justifies the toolkit's
 design and implementation.
________________________________________________________________________
Contributing

NLTK is an open source project, and we welcome any contributions. We
deliberately structured NLTK to facilitate parallel development. If
you are interested in contributing to NLTK, or have any ideas for
improvements, please talk to us, or send us email at
edloper@gradient.cis.upenn.edu and sb@unagi.cis.upenn.edu.
________________________________________________________________________
NLTK 0.7 includes the following modules:

 Basics
   [token] Basic classes for encoding and processing individual
     elements of text, such as words or sentences.
   [tree] Classes for representing hierarchical structures over
     text, such as syntax trees.
   [probability] Classes that encode frequency distributions and
     probability distributions.

 Tagging
   [tagger] A standard interface to tag each token of a text with
     supplementary information, such as its part of speech; and
     several implementations of that interface.

 Parsing
   [cfg] Basic data types for encoding context free grammars.
   [parser] A standard interface to produce trees representing the
     structure of texts; and two simple implementations of that
     interface.
   [parser.chart] A flexible parser implementation that uses a
     chart to record hypotheses about syntactic constituents.
   [parser.chunk] A standard interface for robust parsers used to
     identify non-overlapping linguistic groups (such as base
     noun phrases) in unrestricted text; and a regular-expression
     based implementation of that interface.
   [parser.probabilistic] A standard interface for probabilistic
     parsers; and two implementations of that interface.

 Text Classification
   [classifier] A standard interface for classifying texts into
     categories.
   [classifier.feature] A standard way of encoding the information
     used to make classification decisions.
   [classifier.naivebayes] A text classifier implementation based
     on the Naive Bayes assumption.
   [classifier.maxent] An implementation of the maximum entropy
     model for text classification; and implementations of the
     GIS and IIS algorithms for training the classifier.
   [classifier.featureselection] A standard interface for choosing
     which features are relevant for a given classification
     decision.

 Finite State Automata
   [fsa] Classes for representing finite state automata and regular
     expressions.

 Visualization and Interactive Tools
   [draw] A Tk-based framework for building graphical tools.
   [draw.tree] A graphical representation for hierarchical
     structures.
   [draw.fsa] A graphical representation for finite state automata.
   [draw.plot] A tool for graphing arbitrary functions.
   [draw.chart] An interactive graphical tool for experimenting
     with the chart parser.
   [draw.rdparser] An interactive graphical tool for exploring the
     recursive descent parser.
   [draw.srparser] An interactive graphical tool for learning about
     the shift/reduce parser.
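
As a rough illustration of the kind of data the [probability] module
is described as encoding, the sketch below builds a frequency
distribution over a list of word tokens and turns it into a
relative-frequency (maximum likelihood) probability distribution. It
uses only the Python standard library and is NOT the NLTK 0.7 API: the
function names frequency_distribution and mle_probabilities, and the
sample sentence, are assumptions made up for this example.

```python
from collections import Counter


def frequency_distribution(tokens):
    """Count how many times each token occurs (hypothetical helper)."""
    return Counter(tokens)


def mle_probabilities(freq_dist):
    """Convert counts to relative frequencies (hypothetical helper).

    Each sample's probability is its count divided by the total number
    of observations; an empty distribution yields an empty dict.
    """
    total = sum(freq_dist.values())
    if total == 0:
        return {}
    return {sample: count / total for sample, count in freq_dist.items()}


# Sample data invented for this illustration.
tokens = "the dog saw the cat".split()
fd = frequency_distribution(tokens)
probs = mle_probabilities(fd)

for word in sorted(probs):
    print(word, fd[word], probs[word])
```

The toolkit's own classes may be organized quite differently; consult
the reference documentation on the web page for the actual interfaces.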