Software Details

Title: Natural Language Toolkit: NLTK-Lite version 0.6.5
Submitter: Steven Bird
Description: NLTK, the Natural Language Toolkit, is a suite of Python libraries and
programs for natural language processing. Version 0.6.5 has been
released, and can be downloaded from http://nltk.sourceforge.net/


Software Modules: corpus readers, tokenizers & stemmers, taggers (regexp,
n-gram, backoff, Brill, HMM), parsers (recursive descent, shift-reduce,
chart, probabilistic, ...), clusterers (EM, k-means, ...), probability
distributions, chatbots, demonstrations, ...
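To illustrate what "backoff" means in the tagger list above, here is a minimal sketch in plain Python. It does not use NLTK-Lite or reproduce its API; the function names `train_unigram` and `tag`, the tiny training set, and the default tag "NN" are all assumptions made for illustration. The idea is that a unigram tagger assigns each known word its most frequent tag, and unknown words fall back to a default tagger.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_sentences):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag_label in sentence:
            counts[word][tag_label] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(tokens, unigram_model, default_tag="NN"):
    """Tag tokens with the unigram model, backing off to a default tag."""
    return [(tok, unigram_model.get(tok, default_tag)) for tok in tokens]

# Hypothetical toy training data (not drawn from any NLTK corpus).
training = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
model = train_unigram(training)
tagged = tag(["the", "dog", "sleeps", "loudly"], model)
print(tagged)
```

Here "loudly" never appears in the training data, so it receives the default tag. A real backoff chain would typically put an n-gram tagger in front of the unigram tagger in the same way.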

Corpora and Corpus Samples: Brown Corpus, CMU Pronunciation Dictionary,
CoNLL-2000, Genesis, Gutenberg, IEER, Presidential Addresses, Names,
PP-Attachment, Senseval 2, TIMIT, Treebank, Words

Documentation: Tutorials and exercises (190pp), API documentation for all
software modules, installation instructions for Windows, Mac, Unix.

ChangeLog for Version 0.6.5 (2006-07-09)

* Code:
- improvements to shoebox module (Stuart Robinson, Greg Aumann)
- incorporated feature-based parsing into core NLTK-Lite
- corpus reader for Sinica treebank sample
- new stemmer package
* Contrib:
- hole semantics implementation (Peter Wang)
- incorporated yaml
- new work on feature structures, unification, lambda calculus
- new work on shoebox package (Stuart Robinson, Greg Aumann)
* Corpora:
- Sinica treebank sample
* Tutorials:
- expanded discussion throughout, incl: left-recursion, trees, grammars,
feature-based grammar, agreement, unification, PCFGs,
baseline performance, exercises, improved display of trees
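The changelog mentions feature structures, unification, and agreement. As a rough sketch of the underlying idea, and not of the NLTK-Lite implementation, the following plain-Python function unifies two feature structures represented as nested dictionaries. The name `unify` and the dictionary representation are assumptions for illustration; the sketch omits variables and reentrancy (shared substructures), which a full implementation would need.

```python
def unify(fs1, fs2):
    """Return the unification of two feature structures, or None on a clash.

    Feature structures are nested dicts; atomic values are compared with ==.
    """
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature not in result:
            # Feature only present in fs2: copy it over.
            result[feature] = value
        elif isinstance(result[feature], dict) and isinstance(value, dict):
            # Both values are complex: unify them recursively.
            sub = unify(result[feature], value)
            if sub is None:
                return None
            result[feature] = sub
        elif result[feature] != value:
            # Conflicting values: unification fails.
            return None
    return result

# Hypothetical example: number agreement between a noun and a verb's subject slot.
noun = {"cat": "N", "agr": {"num": "sg"}}
verb_subject = {"agr": {"num": "sg", "per": 3}}
agree = unify(noun, verb_subject)
clash = unify(noun, {"agr": {"num": "pl"}})
print(agree)
print(clash)
```

Agreement succeeds when the two structures carry compatible values and merges the information from both; it fails (returns None) when the number features conflict.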

-Steven Bird
Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics; Cognitive Science

LL Issue: 17.2021
Date Posted: 11-Jul-2006
