LINGUIST List 6.1621

Thu Nov 16 1995

Sum: Teaching material on statistical computational linguistics

Editor for this issue: Ann Dizdar <dizdar@tam2000.tamu.edu>


Directory

  1. Joakim Nivre, Sum: Teaching material on statistical computational linguistics

Message 1: Sum: Teaching material on statistical computational linguistics

Date: Thu, 16 Nov 1995 13:25:17
From: Joakim Nivre <joakim@ling.gu.se>
Subject: Sum: Teaching material on statistical computational linguistics


Earlier this year I posted a query about teaching material for a
course on statistical NLP, based on Eugene Charniak's book
"Statistical Language Learning". Here is a selective summary of the
information I received, together with some personal reflections on the
material I ended up using in the course. Hopefully, this can be useful
for people who find themselves in the same situation.

First of all, I would like to thank the following people
who responded to the original query:

Francois Aumont
Kirk Belnap
Lars Martin Fosse
Wim de Groote
James Hearne
Marti Hearst
Chris Hogan
Reinhard Koehler
Becky Passonneau
Andrew Salway
Christer Samuelsson
Arian J.C. Verheij
Andy Way

As mentioned above, the main textbook I used for the course
(and the only one currently available) was

Charniak, E. (1993) Statistical Language Learning. MIT Press.

Several people directed me to David M. Magerman's review of Charniak's
book (published in Computational Linguistics 21, 103-111). Although
the review is rather negative and I tend to agree with many of the
reviewer's points, I think Charniak's book is useful for an
introductory course on statistical NLP, provided it is supplemented
with background reading in probability theory, statistics and
information theory.

I got many recommendations for textbooks in probability theory,
statistics and information theory. Here is a selection of references
that may be useful:

Probability and Statistics:

DeGroot, M. H. (1986) Probability and Statistics. Second
Edition. Addison-Wesley.
Lindgren, B. W. (1993) Statistical Theory. Fourth Edition.
Chapman and Hall.
Ross, S. (1994) A First Course in Probability. Fourth Edition.
Macmillan.

Information Theory:

Ash, R. (1965) Information Theory. New York: John Wiley.
Cover, T. M. & Thomas, J. A. (1991) Elements of Information
Theory. John Wiley and Sons.

One problem that I struggled with initially was what to give to the
students as background reading, since each of the books listed above
contains much too much material for an introductory course. This
problem was solved beautifully by Brigitte Krenn and Christer
Samuelsson, who let me use their compendium "The Linguist's Guide to
Statistics", written for a course on statistical NLP in
Saarbruecken. Besides a short but thorough introduction to probability
theory, statistics, information theory and Markov models, the
compendium also contains a very useful bibliography of statistical NLP
work. I can strongly recommend this text. People who would like to use
it may contact Christer Samuelsson (christer@CoLi.Uni-SB.DE) or
Brigitte Krenn (krenn@CoLi.Uni-SB.DE).

For the different applications of statistical NLP (tagging, parsing,
disambiguation, etc.), I used Charniak's book together with original
articles (most of which can be found in the bibliography of Krenn and
Samuelsson). One useful source of articles is the special issue of
Computational Linguistics on "computational linguistics using large
corpora", originally published in Volume 19 of Computational
Linguistics (1-2) and later published as a book:

Armstrong, S. (ed) (1994) Using Large Corpora. MIT Press.

Somebody actually recommended using this book as the main text instead
of Charniak. For a course at a more advanced level, I think this would
work fine, but you will still need the background material on
probability theory, etc. (if the students do not already have this
background, of course).

Besides literature, I didn't find very much material. Two exceptions
are worth mentioning:

Chris Brew (at the University of Edinburgh) has a WWW page with
teaching materials for statistical NLP, including some simple programs
for calculating bigram frequencies, etc. and notes on some of the
chapters of Charniak's book. The address is:
http://www.cogsci.ed.ac.uk/~chrisbr/charniak.html
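
To give an idea of what a simple bigram frequency program of this kind
might look like, here is a minimal sketch in Python. It is not taken
from Chris Brew's page; the function names and the toy sentence are
invented for illustration, and the probability estimate is the plain
relative-frequency (maximum likelihood) estimate with no smoothing.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count each pair of adjacent tokens (hypothetical helper)."""
    return Counter(zip(tokens, tokens[1:]))


def bigram_probability(tokens, first, second):
    """Relative-frequency estimate of P(second | first).

    Divides the count of the bigram (first, second) by the number of
    bigrams that start with `first`. Returns 0.0 if `first` never
    starts a bigram. No smoothing is applied.
    """
    pairs = bigram_counts(tokens)
    left_total = sum(c for (w1, _), c in pairs.items() if w1 == first)
    return pairs[(first, second)] / left_total if left_total else 0.0


# Toy corpus, assumed to be already tokenized on whitespace.
tokens = "the dog saw the cat and the dog barked".split()
counts = bigram_counts(tokens)
print(counts[("the", "dog")])
print(bigram_probability(tokens, "the", "dog"))
```

On a real corpus the unsmoothed estimate assigns zero probability to
every unseen bigram, which is one reason the background reading on
statistical inference matters for the later lectures.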

Chris Manning (at Carnegie-Mellon) has a WWW page containing an
"Annotated list of resources on statistical natural language
processing and corpus-based computational linguistics". The
address is:
http://kinks.phil.cmu.edu/manning/statnlp.html

Finally, I include the syllabus (plan of lectures + reading list) for
the course I ended up teaching. It was an advanced undergraduate
course for students with a solid background in (traditional)
computational linguistics but little or no background in probability
theory and statistics.

Statistical Models and Methods in Computational Linguistics

Lecture Plan

1. Introduction
2. Elementary Probability Theory
3. Stochastic Variables
4. Statistical Inference
5. Elementary Information Theory
6. Markov Models
7. Probabilistic Language Models
8. Part-of-speech Tagging
9. Probabilistic Grammars
10. Probabilistic Parsing
11. Syntactic Disambiguation
12. Semantic Disambiguation
13. Machine Translation
14. Conclusion

Reading List

Brown, P., Cocke, J., Della Pietra, S., Della Pietra, V. J., Jelinek,
 F., Lafferty, J. D., Mercer, R. L. & Roossin, P. S. (1990) A Statistical
 Approach to Machine Translation. Computational Linguistics 16, 79-85.
Church, K. W. & Mercer, R. L. (1993) Introduction to the Special Issue
 on Computational Linguistics Using Large Corpora. Computational
 Linguistics 19, 1-24.
Charniak, E. (1993) Statistical Language Learning. MIT Press.
Hindle, D. & Rooth, M. (1993) Structural Ambiguity and Lexical Relations.
 Computational Linguistics 19, 103-121.
Krenn, B. & Samuelsson, C. (1995) The Linguist's Guide to Statistics.
 Universität des Saarlandes: Computerlinguistik.
Merialdo, B. (1994) Tagging English Text with a Probabilistic Model.
 Computational Linguistics 20, 155-171.
Stolcke, A. (1995) An Efficient Probabilistic Context-Free Parsing
 Algorithm that Computes Prefix Probabilities. Computational
 Linguistics 21, 165-201.
Yarowsky, D. (1992) Word Sense Disambiguation Using Statistical Models
 of Roget's Categories Trained on Large Corpora. In Proceedings of
 the 14th International Conference on Computational Linguistics,
 Nantes, France, 454-460.


Joakim Nivre
Department of Linguistics
Göteborg University
S-412 98 Göteborg
Sweden
Email: joakim@ling.gu.se