2007 North American Computational Linguistics Olympiad

Some of the material on this page consists of modified Wikipedia content, provided here under the terms of the GNU Free Documentation License.



Information about Language Technology

Language technology, often called Human Language Technology (HLT), has Computational Linguistics (CL) and Speech Technology at its core and also includes many of their application-oriented aspects. It is closely connected to Computer Science and Linguistics.


Language Technology Areas

Here are the general language technology areas:


Machine Translation

Machine Translation (MT) is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. It is considered a very challenging problem, in part because the structures of the world's roughly 6,000 languages vary so widely. At its most basic level, MT substitutes individual words in one natural language with words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms. Subcategories of MT include dictionary-based MT, statistical MT, example-based MT, and interlingual MT.
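The most basic form of MT described above, word-for-word substitution, can be sketched in a few lines. This is only an illustration: the tiny English-to-Spanish lexicon and the function name are assumptions made for the example, and real systems must also handle word order, agreement, and idioms.

```python
# Toy word-for-word substitution. The lexicon is a hypothetical example,
# not taken from any real MT system.
LEXICON = {"the": "el", "cat": "gato", "drinks": "bebe", "milk": "leche"}


def translate_word_by_word(sentence, lexicon):
    """Replace each known word with its lexicon entry; keep unknown words."""
    words = sentence.lower().split()
    return " ".join(lexicon.get(word, word) for word in words)


print(translate_word_by_word("The cat drinks milk", LEXICON))
```

Words missing from the lexicon pass through unchanged, which is one reason pure substitution breaks down quickly on real text.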

Some Groups and Researchers in the Area:



Information Retrieval and Extraction

Information Retrieval (IR) is the science of searching for information in documents, for documents themselves, for metadata that describes documents, or within databases (whether relational stand-alone databases or hypertext networked databases such as the Internet, the World Wide Web, or intranets) for text, sound, images, or data. Data retrieval, document retrieval, information retrieval, and text retrieval are commonly confused, yet each has its own body of literature, theory, praxis, and technologies. Like most nascent fields, IR is interdisciplinary, drawing on computer science, mathematics, library science, information science, cognitive psychology, linguistics, statistics, and physics.
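One common way to search for documents is an inverted index, which maps each term to the documents that contain it. The sketch below is a minimal illustration; the three sample documents and the function names are assumptions, and it does no ranking, stemming, or stop-word handling.

```python
from collections import defaultdict

# Hypothetical mini-collection used only for illustration.
DOCS = {
    "d1": "speech recognition converts a speech signal to words",
    "d2": "machine translation converts text between languages",
    "d3": "speech synthesis produces artificial speech",
}


def build_index(docs):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


INDEX = build_index(DOCS)
print(search(INDEX, "speech"))
print(search(INDEX, "converts text"))
```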

Information Extraction (IE) is a type of information retrieval whose goal is to automatically extract structured or semistructured information from unstructured machine-readable documents. It is a sub-discipline of language engineering, a branch of computer science.
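As a small illustration of turning unstructured text into structured records, the sketch below pulls a name, role, and affiliation out of a line written as "Name (Role), Affiliation". The pattern and field names are assumptions chosen for this example; practical IE systems rely on far richer linguistic analysis than a single regular expression.

```python
import re

# Hypothetical pattern for lines shaped like "Name (Role), Affiliation".
PATTERN = re.compile(
    r"(?P<name>[^()]+?)\s*\((?P<role>[^()]+)\),\s*(?P<affiliation>.+)"
)


def extract_record(line):
    """Return a dict of fields if the line fits the pattern, else None."""
    match = PATTERN.fullmatch(line.strip())
    return match.groupdict() if match else None


print(extract_record("Lori Levin (General Chair), Carnegie Mellon University"))
print(extract_record("Online participation"))
```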

Some Groups and Researchers in the Area:



Natural Language Processing

Natural Language Processing (NLP) is a subfield of artificial intelligence and linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural language generation systems convert information from computer databases into normal-sounding human language, and natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.

Some Groups and Researchers in the Area:



Question Answering

Question Answering (QA) is a type of information retrieval. Given a collection of documents (such as the World Wide Web or a local collection), the system should be able to retrieve answers to questions posed in natural language. QA is regarded as requiring more complex natural language processing (NLP) techniques than other types of information retrieval such as document retrieval, and it is sometimes regarded as the next step beyond search engines. QA research attempts to deal with a wide range of question types, including fact, list, definition, how, why, hypothetical, semantically constrained, and cross-lingual questions. Search collections vary from small local document collections, to internal organization documents, to compiled newswire reports, to the World Wide Web. Closed-domain question answering deals with questions within a specific domain (for example, medicine or automotive maintenance) and can be seen as an easier task because NLP systems can exploit domain-specific knowledge, which is frequently formalized in ontologies. Open-domain question answering deals with questions about nearly everything and can rely only on general ontologies and world knowledge. On the other hand, open-domain systems usually have much more data available from which to extract the answer.

Some Groups and Researchers in the Area:



Computational Biology

Computational Biology is an interdisciplinary field that applies the techniques of computer science and applied mathematics to problems inspired by biology. Major fields that use computational biology techniques include:

Bioinformatics, which applies algorithms and statistical techniques to biological datasets that typically consist of large numbers of DNA, RNA, or protein sequences. Examples of specific techniques include sequence alignment, which is used for both sequence database searching and for comparison of homologous sequences; gene finding; and prediction of gene expression. (The term computational biology is sometimes used as a synonym for bioinformatics.)

Computational genomics, a field within genomics which studies the genomes of cells and organisms by high-throughput genome sequencing that requires extensive post-processing known as genome assembly, and which uses DNA microarray technologies to perform statistical analyses on the genes expressed in individual cell types.

Systems biology, which aims to model large-scale biological interaction networks (also known as the interactome), often using differential equations.

Protein structure prediction and structural genomics, which attempt to systematically produce accurate structural models for three-dimensional protein structures that have not been solved experimentally.

Computational biochemistry and biophysics, which make extensive use of structural modeling and simulation methods such as molecular dynamics and Monte Carlo-inspired Boltzmann sampling methods in an attempt to elucidate the kinetics and thermodynamics of protein functions.

Some Groups and Researchers in the Area:



Speech Recognition

Speech Recognition (in many contexts also known as automatic speech recognition or computer speech recognition, and erroneously as voice recognition) is the process of converting a speech signal to a sequence of words by means of an algorithm implemented as a computer program. Speech recognition applications that have emerged in recent years include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), simple data entry (e.g., entering a credit card number), and preparation of structured documents (e.g., a radiology report).

Some Groups and Researchers in the Area:



Speech Synthesis

Speech Synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. Synthesized speech can also be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output. The quality of a speech synthesizer is judged by its similarity to the human voice, and by its ability to be understood. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer. Many computer operating systems have included speech synthesizers since the early 1980s.

Some Groups and Researchers in the Area:



Speaker Identification and Verification

Speaker Verification, or Voice Authentication, is a type of speaker recognition. It is the problem of verifying a person's claimed identity solely by their voice. It can be used in security applications that replace typed passwords and PINs with a voice print, which is then used to authenticate the user. Speaker Identification is also a type of speaker recognition. It is the problem of identifying a person solely by their voice, and it can be used for purposes such as police investigations.
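The difference between the two tasks can be sketched as follows: verification compares a voice sample against one claimed identity and accepts or rejects it, while identification picks the best match among all enrolled speakers. Everything here is a simplifying assumption for illustration: the three-number "voice prints", the cosine similarity measure, the threshold of 0.9, and the function names are hypothetical and do not describe any particular system.

```python
import math

# Hypothetical enrolled voice prints (real systems use much richer features).
ENROLLED = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.1, 0.8, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(claimed_name, sample, threshold=0.9):
    """Verification: does the sample match the ONE claimed identity?"""
    return cosine(ENROLLED[claimed_name], sample) >= threshold


def identify(sample):
    """Identification: which enrolled speaker matches the sample best?"""
    return max(ENROLLED, key=lambda name: cosine(ENROLLED[name], sample))


sample = [0.85, 0.15, 0.35]
print(verify("alice", sample), verify("bob", sample), identify(sample))
```

Verification is a yes/no decision against a threshold, so it can reject everyone; identification as sketched here always returns some enrolled speaker, even for a voice that belongs to none of them.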

Some Groups and Researchers in the Area:



Dialogue System

A Dialogue System is a computer system intended to converse with a human. Dialogue systems have employed text, speech, graphics, haptics, gestures, face configurations, body positions, emotions, and other modes for communicative intent on both the input and output channel.

Some Groups and Researchers in the Area:


Olympiad Locations

Pittsburgh area (hosted by Carnegie Mellon University)
contact: Lori Levin, lslcs.cmu.edu

Philadelphia area (hosted by U. of Pennsylvania)
contact: Mitch Marcus, mitchcis.upenn.edu

Boston area (hosted by Brandeis University, Cambridge)
contact: James Pustejovsky, boston.olympiadgmail.com

Ithaca area (hosted by Cornell University)
contact: Claire Cardie, cardiecs.cornell.edu

Online participation
contact: Dragomir R. Radev, radevumich.edu

Organizing Committee

Lori Levin (General Chair), Carnegie Mellon University
Thomas Payne (General Chair), University of Oregon
Dragomir R. Radev (Program Chair), University of Michigan
William Lewis (Outreach Chair), University of Washington
James Pustejovsky (Sponsorship Chair), Brandeis University
Barbara Di Eugenio (Follow-up Chair), University of Illinois at Chicago

Supported by NSF
Website Developed by The LINGUIST List
The Association for Computational Linguistics
Google
NAACL