Software Details

Title: Natural Language Toolkit, Version 0.7
Submitter: Edward Loper
Description: Version 0.7 of the Natural Language Toolkit is now available for download. The Natural Language Toolkit is a Python package that simplifies the construction of programs that process natural language.
In particular:

- It provides basic tools for manipulating data and performing tasks related to NLP.
- It defines standard interfaces between the different components of an NLP system.
- It provides an infrastructure for building new NLP systems.

The toolkit's primary aim is to serve as a pedagogical tool, but it is also useful as a framework for implementing research-related programs. NLTK runs on most platforms, including Windows, OS X, Linux, and UNIX. For more information, or to download a copy, please visit our web page.

Toolkit Contents

- [Python Modules] implement the basic data types, tools, and interfaces that make up the toolkit.

- [Tutorials] teach students how to use the toolkit in the context of performing specific tasks.

- [Exercises and Problem Sets] help students learn more about various aspects of natural language processing.

- [Reference Documentation] provides precise definitions of the behavior of each module, interface, class, method, function, and variable defined by the toolkit.

- [Technical Documentation] explains and justifies the toolkit's design and implementation.

NLTK is an open source project, and we welcome any contributions. We deliberately structured NLTK to facilitate parallel development. If you are interested in contributing to NLTK, or have any ideas for improvements, please talk to us or send us email.

NLTK 0.7 includes the following modules:

[token] Basic classes for encoding and processing individual
elements of text, such as words or sentences.
[tree] Classes for representing hierarchical structures over
text, such as syntax trees.
[probability] Classes that encode frequency distributions and
probability distributions.
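
As an illustration of what a frequency distribution records, here is a minimal sketch in plain Python. It is not the [probability] module's actual API: the class name `SimpleFreqDist` and its methods are hypothetical and chosen only for this example.

```python
from collections import Counter


class SimpleFreqDist:
    """Hypothetical stand-in for a frequency distribution (not NLTK's class)."""

    def __init__(self, samples=()):
        # Count how many times each sample (e.g. a word) was observed.
        self._counts = Counter(samples)

    def inc(self, sample):
        """Record one more observation of `sample`."""
        self._counts[sample] += 1

    def count(self, sample):
        """Number of times `sample` was observed (0 if never seen)."""
        return self._counts[sample]

    def total(self):
        """Total number of observations recorded so far."""
        return sum(self._counts.values())

    def freq(self, sample):
        """Relative frequency of `sample`; doubles as a maximum
        likelihood estimate of its probability."""
        n = self.total()
        return self._counts[sample] / n if n else 0.0


fd = SimpleFreqDist("the cat saw the dog".split())
print(fd.count("the"), fd.freq("the"))
```

Dividing each count by the total number of observations turns the frequency distribution into the simplest kind of probability distribution.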

[tagger] A standard interface to tag each token of a text with
supplementary information, such as its part of speech; and
several implementations of that interface.
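
The idea of one tagging interface with several implementations can be sketched as follows. This is an assumption-laden illustration in plain Python, not the [tagger] module's real classes: `Tagger`, `DefaultTagger`, and `UnigramTagger` are hypothetical names used only here.

```python
from abc import ABC, abstractmethod
from collections import Counter, defaultdict


class Tagger(ABC):
    """Hypothetical interface: anything that tags a list of tokens."""

    @abstractmethod
    def tag(self, tokens):
        """Return a list of (token, tag) pairs."""


class DefaultTagger(Tagger):
    """Assigns the same tag to every token."""

    def __init__(self, default_tag):
        self._default_tag = default_tag

    def tag(self, tokens):
        return [(token, self._default_tag) for token in tokens]


class UnigramTagger(Tagger):
    """Assigns each token the tag it carried most often in training data."""

    def __init__(self, tagged_tokens, backoff_tag="NN"):
        counts = defaultdict(Counter)
        for word, tag in tagged_tokens:
            counts[word][tag] += 1
        # Keep only the most frequent tag for each known word.
        self._best = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        self._backoff_tag = backoff_tag

    def tag(self, tokens):
        # Unknown words fall back to the backoff tag.
        return [(t, self._best.get(t, self._backoff_tag)) for t in tokens]


train = [("the", "DT"), ("dog", "NN"), ("barks", "VB"), ("the", "DT")]
print(UnigramTagger(train).tag(["the", "cat", "barks"]))
```

Because both classes share the `tag` method, code that consumes tagged text does not need to know which implementation produced it.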

[cfg] Basic data types for encoding context free grammars.
[parser] A standard interface to produce trees representing the
structure of texts; and two simple implementations of that
interface.
[parser.chart] A flexible parser implementation that uses a
chart to record hypotheses about syntactic constituents.
[parser.chunk] A standard interface for robust parsers used to
identify non-overlapping linguistic groups (such as base
noun phrases) in unrestricted text; and a regular-expression
based implementation of that interface.
[parser.probabilistic] A standard interface for probabilistic
parsers; and two implementations of that interface.
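
Regular-expression based chunking of base noun phrases, as described for [parser.chunk], can be sketched in plain Python. The function name `chunk_noun_phrases`, the tag names, and the pattern are assumptions made for this example; they are not taken from the toolkit.

```python
import re

# Assumed pattern: optional determiner, any adjectives, one or more nouns.
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*(?:<NN>)+")


def chunk_noun_phrases(tagged_tokens):
    """Return non-overlapping base noun phrases as lists of words.

    `tagged_tokens` is a list of (word, tag) pairs. The tags are joined
    into one string such as "<DT><JJ><NN>" so that an ordinary regular
    expression can be matched over the tag sequence.
    """
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_tokens)
    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        # Convert character offsets back to token indices by counting
        # how many tags begin before each offset.
        start = tag_string.count("<", 0, match.start())
        end = tag_string.count("<", 0, match.end())
        chunks.append([word for word, _ in tagged_tokens[start:end]])
    return chunks


sentence = [("the", "DT"), ("little", "JJ"), ("dog", "NN"),
            ("chased", "VBD"), ("a", "DT"), ("cat", "NN")]
print(chunk_noun_phrases(sentence))
```

Matching over tags rather than words is what makes this approach usable on unrestricted text: the pattern never needs to know the vocabulary.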

Text Classification
[classifier] A standard interface for classifying texts into
categories.
[classifier.feature] A standard way of encoding the information
used to make classification decisions.
[classifier.naivebayes] A text classifier implementation based
on the Naive Bayes assumption.
[classifier.maxent] An implementation of the maximum entropy
model for text classification; and implementations of the
GIS and IIS algorithms for training the classifier.
[classifier.featureselection] A standard interface for choosing
which features are relevant for a given classification
task.
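
To make the Naive Bayes assumption concrete, here is a minimal sketch of a text classifier in plain Python. It treats each word as an independent feature and uses add-one smoothing, which is a choice made for this example. The class name `NaiveBayesTextClassifier` is hypothetical and does not reflect the [classifier.naivebayes] API.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Hypothetical bag-of-words Naive Bayes classifier."""

    def __init__(self, labeled_texts):
        # labeled_texts: iterable of (list_of_words, label) pairs.
        self._label_counts = Counter()
        self._word_counts = defaultdict(Counter)
        self._vocab = set()
        for words, label in labeled_texts:
            self._label_counts[label] += 1
            for word in words:
                self._word_counts[label][word] += 1
                self._vocab.add(word)

    def classify(self, words):
        """Return the label with the highest posterior log-probability."""
        total_docs = sum(self._label_counts.values())
        vocab_size = len(self._vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self._label_counts.items():
            # log P(label) + sum of log P(word | label), assuming the
            # words are independent given the label.
            score = math.log(n_docs / total_docs)
            total_words = sum(self._word_counts[label].values())
            for word in words:
                count = self._word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = [("good great fun".split(), "pos"),
         ("great movie".split(), "pos"),
         ("bad boring".split(), "neg"),
         ("boring awful movie".split(), "neg")]
classifier = NaiveBayesTextClassifier(train)
print(classifier.classify(["great", "fun"]))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.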

Finite State Automata
[fsa] Classes for representing finite state automata and regular
expressions.
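
A deterministic finite state automaton can be represented with very little code. The sketch below is plain Python and only illustrates the data structure: the class name `DFA` and its constructor arguments are hypothetical, not the [fsa] module's interface.

```python
class DFA:
    """Hypothetical deterministic finite state automaton."""

    def __init__(self, start, finals, transitions):
        self.start = start              # initial state
        self.finals = set(finals)       # accepting states
        # transitions maps (state, symbol) -> next state
        self.transitions = dict(transitions)

    def accepts(self, symbols):
        """Return True if the automaton ends in a final state."""
        state = self.start
        for symbol in symbols:
            state = self.transitions.get((state, symbol))
            if state is None:
                # No transition defined: the input is rejected.
                return False
        return state in self.finals


# Automaton for the regular expression (ab)*
ab_star = DFA(start=0, finals={0}, transitions={(0, "a"): 1, (1, "b"): 0})
print(ab_star.accepts("abab"), ab_star.accepts("aba"))
```

The example automaton recognizes the same language as the regular expression `(ab)*`, which is the kind of correspondence between automata and regular expressions that the module description refers to.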

Visualization and Interactive Tools
[draw] A Tk-based framework for building graphical tools.
[draw.tree] A graphical representation for hierarchical
structures.
[draw.fsa] A graphical representation for finite state automata.
[draw.plot] A tool for graphing arbitrary functions.
[draw.chart] An interactive graphical tool for experimenting
with the chart parser.
[draw.rdparser] An interactive graphical tool for exploring the
recursive descent parser.
[draw.srparser] An interactive graphical tool for learning about
the shift/reduce parser.
Linguistic Field(s): Computational Linguistics

LL Issue: 13.1699
Date Posted: 15-Jun-2002
