
Software





Software: Computer Aided Translation

·Computer Assisted Translation: STAR Transit XV is a CAT tool that uses a translation memory to speed up and improve the quality of the translation process. Other products of the STAR family are TermStar and WebTerm. TermStar is a highly scalable solution that can be used on a single-user desktop, on an enterprise database server such as Oracle, Sybase or MS-SQL, or on a web server. WebTerm allows users to create and update enterprise terminology over the corporate intranet or the Web using a web browser.
·Electronic dictionaries for Windows Polyglossum v. 3.52: More than 200 dictionary databases for the Polyglossum dictionary program: general lexical dictionaries and specialized dictionaries in several language pairs, with multilanguage and illustrated dictionaries now being issued as well. Subjects include business and economics, polytechnics, specialized technical fields, medicine, biology, mathematics and computing, with dictionaries for professional interpreters and for education. Bidirectional pairs: English-Russian, German-Russian, French-Russian, Finnish-Russian, Spanish-Russian and Swedish-Russian. Explanatory dictionaries of the Russian language: Vladimir Dahl's dictionary (in pre-reform Russian orthography), the dictionary edited by D. Ushakov, Proverbs of the Russian Folk, famous quotations (in Russian) and more.
·English Spanish Dictionary: English Spanish Dictionary for translations, synonyms, antonyms, verb conjugations, thesaurus and idioms builder.
·Enhanced Ottoman Turkish Keyboard: Enhanced Ottoman Turkish Keyboard converts text between Latin and Arabic scripts via Internet Explorer, handling transcription as it goes. You can use the program to transfer the text to word processors such as Word for further editing. Additionally, you can copy and paste Latin text into the text field of the program to get an instant conversion.
·EXTRAKT: Linguistic engine for morphological analysis (lemmatization), generation, translation (of terms, for cross-lingual search) and language identification. Most European languages are covered.
·Hinditech Dictionary: Online English-Hindi Dictionary of NLP.
·IntelliWebSearch: IntelliWebSearch is a freeware tool designed to save translators and terminologists time when searching the web.
·Kataku - Machine Translation: Kataku is the world's first commercial machine translation (MT) product for the Indonesian-English pair, developed by ToggleText Pty. Ltd. It provides free online machine translation at http://www.toggletext.com. Note: the free trial interface will only translate the first 300 words of text or web content.
·Keyboard of Modern Turkic Languages: Keyboard of Modern Turkic Languages converts any text between Latin and Cyrillic scripts, in either direction, via a browser such as Internet Explorer. You can use the program to transfer the text to word processors such as Word for further editing. Additionally, you can copy and paste Latin or Cyrillic text into the text field of the program to get an instant conversion.
·Lingua::Translit: A tool that converts text between various writing systems. Wherever possible the transliteration is based on national or international standards (e.g. ISO 9, DIN 31634); otherwise common national transliteration rules are applied. Lingua::Translit is provided as an online service as well as an open source Perl library. The module provides a simple-to-use object-oriented API and can easily be extended by writing intuitive character mappings in a predefined XML language.
·Linguist's Assistant: Linguist's Assistant (LA) is a multilingual natural language generator based on linguistic universals, typologies, and primitives. LA enables linguists to build lexicons and grammars for a wide variety of languages, particularly minority and endangered languages. LA then uses that information to produce initial draft translations of numerous community development articles in that language. These articles teach people how to prevent the spread of various diseases such as AIDS and Avian Influenza. These texts are intended to improve the quality of people's lives, and enable the speakers of these languages to participate in the larger world. The initial draft translations produced by LA are always easily understandable, grammatically correct, and at approximately a sixth grade reading level. When experienced mother-tongue translators use the drafts generated by LA, their productivity is typically quadrupled without any loss of quality.
·Moses for Mere Mortals: This site offers a set of Bash scripts and Windows executable add-ins that, together, create a basic translation chain prototype capable of processing very large corpora. It uses Moses, a widely known statistical machine translation system. The idea is to help build a translation chain for the real world, but it should also enable a quick evaluation of Moses for actual translation work and guide users in their first steps with Moses. A Help/Short Tutorial (http://moses-for-mere-mortals.googlecode.com/files/Help-Short-Tutorial.doc) and a demonstration corpus (too small to do justice to the qualitative results that can be achieved with Moses, but capable of giving a realistic view of the relative duration of the steps involved) are available. Two Windows add-ins allow the creation of Moses input files from *.TMX translation memories (Extract_TMX_Corpus.exe), as well as the creation of *.TMX files from Moses output files (Moses2TMX.exe), creating a synergy between machine translation and translation memories. The scripts were tested on Ubuntu 9.04 (64-bit version). Documents used for corpus training should be perfectly aligned and saved in UTF-8 character encoding; documents to be translated should also be in UTF-8 format. Users, perhaps after having tried the provided demonstration corpus, should be able to immediately use and get results with the real corpora they are interested in. Though already tested and used in actual work, this should be considered a work in progress.
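The TMX-to-parallel-corpus conversion that Extract_TMX_Corpus.exe performs can be sketched in Python. This is an illustrative stand-in, not the add-in's actual code: TMX is an XML format in which each translation unit (`tu`) holds one `seg` per language.

```python
# Sketch: extract aligned (source, target) segments from a TMX
# translation memory -- the kind of file the add-ins convert.
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def extract_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None and seg.text:
                segs[lang.lower()] = seg.text.strip()
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs

tmx = """<tmx version="1.4"><body>
<tu><tuv xml:lang="en"><seg>Good morning</seg></tuv>
    <tuv xml:lang="pt"><seg>Bom dia</seg></tuv></tu>
</body></tmx>"""
print(extract_pairs(tmx, "en", "pt"))  # [('Good morning', 'Bom dia')]
```

Each language's segments would then be written to its own plain-text file, one segment per line, as Moses training input.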
·Natural Language Processing software: Natural Language (text) Processing software for parsing, spell-checking, machine translation, thesauri, question answering and text attribution for English, German, French, Italian.
·NTT Machine Translation Group Resources: Japanese-English linguistic term list.
·QuickCount 1.0: QuickCount is a word- and line-counting tool for freelance translators, translation and localization agencies, transcription agencies, writers, project managers and other professionals who base their quotations and invoices on document text counts (word count, line count, gross line count, character count, page count). QuickCount provides an easy-to-use client and invoice module that allows users to export invoices to PDF and Word documents.
·Similis, the Second Generation Translation Memory: Similis is a full-featured computer-aided translation tool designed for project managers and translators faced with growing demands for both productivity and quality. Similis analyzes previous translations, generates a translation memory (TM) and applies it to all new projects in order to deliver optimal results in two ways: * Translators save time when translating recurrent segments, terms, and word groups. * Translations are more consistent across different documents. Similis is a second-generation translation memory (TM). Much more powerful than first-generation TMs, it includes a linguistic analysis engine, uses chunk technology to break down segments into intelligent terminological groups, and automatically generates specific glossaries. Available in both server and standalone versions, Similis™ meets the needs of large corporations and institutions wanting to better manage both their in-house and outsourced translation projects, as well as those of translation professionals seeking customer loyalty.
·ThetaCircle: ThetaCircle is an easy to use typing and scripting tool. Use it to type, copy and then pipe or paste your script to your favorite word processor. The major use of ThetaCircle is for translation and transcription. Type in any Unicode font on your system, and if you're not satisfied with that, build any typing structure that you want, such as using the A key to type a special character, [VP] or a full sentence. ThetaCircle has subversions suited for specific font systems such as Hebrew, Arabic, Katakana, Hiragana, Thai, Greek, Latin, Cyrillic, and the International and American Phonetic Alphabets (IPA, APA).
·xlit: xlit is a program for transliterating text. It allows the user to define a transliteration simply by typing the input strings in one window and the strings to which they are to be mapped in another. It understands Unicode and provides a number of character entry tools. xlit also provides some advanced facilities not found in typical transliteration programs. It is often necessary to restrict transliteration to particular parts of the text. xlit understands a variety of delimiters and if so instructed will transliterate only the regions enclosed by the specified delimiters or only their complements.
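The string-mapping idea behind such user-defined transliterations can be sketched in Python (a toy mapping, not xlit's own engine). The key point is that longer input strings must be tried before their prefixes:

```python
# Sketch: apply a user-defined string-to-string transliteration,
# matching the longest input string first.
def transliterate(text, mapping):
    keys = sorted(mapping, key=len, reverse=True)  # longest match wins
    out, i = [], 0
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(mapping[k])
                i += len(k)
                break
        else:
            out.append(text[i])  # no rule applies: copy through
            i += 1
    return "".join(out)

# Toy Latin-to-Cyrillic table: "sh" must be tried before "s".
table = {"sh": "ш", "s": "с", "h": "х", "a": "а"}
print(transliterate("sasha", table))  # саша
```

A delimiter-restricted mode, as xlit offers, would first split the text on the delimiters and apply this function only inside (or only outside) the delimited regions.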
·XlitHindi: English to Hindi Transliteration Extension for OpenOffice Writer: XlitHindi is an English to Hindi transliteration extension for OpenOffice Writer. This extension transliterates words from English to Hindi [ex: converts 'bharat' to 'भारत', 'school' to 'स्कूल', etc.] and offers more Hindi options for each English word on right click. XlitHindi uses Xlit, a statistical transliteration engine that converts words between English and Indian languages without losing their phonetic characteristics. Xlit can be used as an input method and in machine translation systems, e-governance applications and other applications that need text entry in any Indian language and English. Xlit and XlitHindi have been developed by the KBCS Team, C-DAC Mumbai (erstwhile NCST), India.

Software: Diagram Display

·Augmented Syntax Diagram (ASD) Editor and Parser: Augmented Syntax Diagrams (ASDs) represent grammars as networks of nodes and links. They are equivalent to, but simpler than, ATN grammars. This site contains a description of ASDs, free software written in Java for editing and parsing with ASDs, and example grammars, with semantic augmentations, for parts of English.
·Bracket Notation to Tree Converter: This is a small web application that converts labeled bracket notation into a syntax tree. Use of the application is free, and you may save the generated images (.png files) to your hard drive for use in other programs. The application is not limited to English, though the page itself is in English.
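The parsing step behind such a converter can be sketched in Python (an illustrative parser, not the application's own code): labeled bracket notation like "[S [NP Mary] [VP left]]" maps naturally onto nested lists.

```python
# Sketch: parse labeled bracket notation into nested lists,
# the structure a tree-drawing tool would then render.
import re

def parse(s):
    tokens = re.findall(r"\[|\]|[^\[\]\s]+", s)
    pos = 0
    def node():
        nonlocal pos
        pos += 1                      # consume '['
        label = tokens[pos]; pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # terminal (leaf word)
                pos += 1
        pos += 1                      # consume ']'
        return [label] + children
    return node()

print(parse("[S [NP Mary] [VP left]]"))
# ['S', ['NP', 'Mary'], ['VP', 'left']]
```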
·DiaTech: DiaTech is a web tool for analyzing and visualizing linguistic variation.
·forest: a tree-drawing package for LaTeX: Package 'forest' provides a PGF/TikZ-based mechanism for drawing linguistic (and other kinds of) trees in LaTeX. The package is free (licensed under the LaTeX Project Public License 1.3). It is available from CTAN (http://www.ctan.org/pkg/forest) and included in TeX Live and MiKTeX.
·Linguistic Tree Constructor: LTC is a free tool for drawing linguistic syntactic trees, running on Win32 platforms.
·NooJ: NooJ is both a corpus processing tool and a linguistic development environment: it allows linguists to formalize several levels of linguistic phenomena: orthography and spelling, lexicons for simple words, multiword units and frozen expressions, inflectional, derivational and productive morphology, local, structural syntax and transformational syntax. For each of these levels, NooJ provides linguists with one or more formal tools specifically designed to facilitate the description of each phenomenon, as well as parsing tools designed to be as computationally efficient as possible. This approach distinguishes NooJ from most computational linguistic tools, which provide a single formalism that should describe everything. As a corpus processing tool, NooJ allows users to apply sophisticated linguistic queries to large corpora in order to build indices and concordances, annotate texts automatically, perform statistical analyses, etc. NooJ is freely available and linguistic modules can already be downloaded for Acadian, Arabic, Armenian, Bulgarian, Catalan, Chinese, Croatian, French, English, German, Hebrew, Greek, Hungarian, Italian, Polish, Portuguese, Spanish and Turkish.
·Simple Syntax Tree Generator: This is a simple, browser based syntax tree generator that uses bracketed notation as input and displays the tree as an image file you can save to your computer. It's designed to be easy to use, draws as you type, and offers basic support to draw movement lines. Unicode characters are supported.
·Syntactica Software: Syntactica is a software application designed to let you study natural language structure in a fun, interactive way. It is designed to be used in conjunction with the text 'Grammar as Science'. The program provides a simple interface for: • Creating grammars (consisting of phrase-structure rules and lexicons) • Viewing the structures they assign to natural language expressions • Transforming those structures by syntactic operations such as movement, deletion and copying Syntactica permits many aspects of syntactic theory to be explored. The rule and lexicon windows allow you to assign and control the percolation of syntactic features. The TreeViewer window lets you perform a variety of formal operations on trees by simply pointing, clicking and using the Transforms panel. Syntactica also allows you to control various constraints on operations, including an elementary version of Subjacency. Originally developed under NeXTSTEP, Syntactica has been ported to Java, where it runs under Mac OS X and Windows.
·TiGer Search: Tools for linguistic text exploration; also for Mac OS X
·TikZ-dependency: TikZ-dependency allows you to draw dependency graphs in LaTeX documents with little or no effort. The package has a very easy to learn, high level interface that can be used to draw simple dependency trees, complex non projective graphs, bubble parses, and in general any kind of graph which is based on a sequence of nodes and edges among these. It is based on PGF/TikZ and it can be used either with latex or pdflatex. It comes with a very comprehensive documentation that will get you started in 10 minutes, even without any prior knowledge of TikZ. It also provides a lot of styling facilities, to let you personalize the look and feel of the graphs at your liking.
·TreeBuilder: TreeBuilder is a program offering an easy way to manually build linguistic syntax trees. It supports many useful features (automatic alignment, indices, various link types, etc.). The tree can be saved in its own format (*.tree) or as an image (PNG, JPEG or bitmap).
·TreeForm Syntax Tree Drawing Software: TreeForm Syntax tree drawing software is an open source linguistic syntax and semantics tree drawing editor. Designed for WYSIWYG n-ary tree drawing, reorganizing, saving and printing, this tool greatly speeds up the process of producing syntax trees. TreeForm also lets you export trees as .pdf (with Acrobat Professional or on a Mac), .jpg and .png files. This Java program works on Mac, Windows and Linux machines.
·Trees 2: Trees 2 is a Macintosh program for displaying and manipulating syntactic trees and derivations. An updated version of the program, Trees 3, runs on Windows.
·TWSI Sense Substituter Software: TWSI is software which produces lexical substitutions in context for over 1000 frequent nouns. The software processes English text. This functionality is realized by a supervised word sense disambiguation system, which is trained by sense-labeled occurrences of target words. A classification model is trained for each word, and used to decide which sense an unseen occurrence most likely belongs to. Associated with senses are lists of substitutions, which are injected into the text using inline annotation.
·Txtkit: A visual text mining tool for Mac OS X

Software: Fieldwork

·Alchemist: The original purpose of Alchemist is to allow you to read in raw text files and create morphological gold standards in XML format. Using Alchemist, you can identify morphemes, along with a number of important characteristics of the morphemes, such as whether they are roots or affixes, the degree of analyst certainty, and allomorphs of the morpheme. Alchemist is also a good general tool for sorting and filtering lists of words, because it lets the user easily apply regular expressions to words.
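The regular-expression filtering described above can be illustrated with a short Python sketch (the word list and pattern here are invented, not Alchemist's own):

```python
# Sketch: filter a word list with a regular expression, e.g. keep
# words ending in "-ed" with at least two letters before the suffix.
import re

words = ["walked", "walking", "talked", "red", "bed", "jumped"]
suffix_ed = [w for w in words if re.search(r"[a-z]{2,}ed$", w)]
print(suffix_ed)  # ['walked', 'talked', 'jumped']
```

Note that the pattern excludes "red" and "bed", where "-ed" is not a suffix, which is exactly the kind of distinction an analyst refines interactively in a tool like Alchemist.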
·Audiamus: Audiamus builds a corpus of linked text and media. It is a cross-platform tool that allows presentation of textual material linked to unsegmented media files, using QuickTime to instantiate links. It was developed as a means of working interactively with field recordings and of presenting texts and example sentences as playable media with a dissertation.
·EXMARaLDA: A system and toolset for creating, managing and analysing corpora of transcriptions of spoken language. Consists of an editor for transcriptions in musical score notation, a corpus manager and a search tool. All file formats are XML-based, which maximizes exchangeability and archivability. Many import and export functionalities (Praat, ELAN, AGTK, RTF, HTML, SVG etc.).
·Kura: Kura is a complete system for the handling of linguistic data, especially fieldwork data from small-corpus languages. It allows users to enter texts in any language, analyze those texts and bring the analyzed linguistic facts into relation with each other. Kura includes both a desktop application for easy handling of interlinear texts, lexica and other linguistic data, and a special-purpose webserver for the online presentation of the analyzed data.
·Linguist's Assistant: Linguist's Assistant (LA) is a multilingual natural language generator based on linguistic universals, typologies, and primitives. LA enables linguists to build lexicons and grammars for a wide variety of languages, particularly minority and endangered languages. LA then uses that information to produce initial draft translations of numerous community development articles in that language. These articles teach people how to prevent the spread of various diseases such as AIDS and Avian Influenza. These texts are intended to improve the quality of people's lives, and enable the speakers of these languages to participate in the larger world. The initial draft translations produced by LA are always easily understandable, grammatically correct, and at approximately a sixth grade reading level. When experienced mother-tongue translators use the drafts generated by LA, their productivity is typically quadrupled without any loss of quality.
·Pacx: Platform for Annotated Corpora in XML. An integrated tool for corpus linguistics built on Eclipse, Vex, Subversive, etc., for creating and editing transcriptions and annotations, querying, managing version-controlled data, and building a shippable corpus.
·SIL FieldWorks Language Explorer (FLEx): FLEx is a data management and analysis tool for linguists and lexicographers. It is designed for managing and editing lexical data, and for interlinearizing texts. Other tools in the program include concordance, discourse chart, morphological grammar sketch, bulk editing.
·Toney: Software for Phonetic Classification: Toney is a free software tool that supports classification of spoken forms into phonetic categories. Use it to manually sort linguistic forms into clusters, then listen to all items in a given cluster in order to hear any outliers. It is ideal for use in early elicitation tasks, in which the linguistically salient phonetic categories are not yet clearly established. Toney uses specially formatted Praat TextGrid files and corresponding audio files. Unix, Mac, and Windows distributions are available, along with sample datasets.
·Toolbox: Toolbox is a data management and analysis tool for field linguists. It is especially useful for maintaining lexical data, and for parsing and interlinearizing text, but it can be used to manage virtually any kind of data.
·WeSay: WeSay helps non-linguists build a dictionary in their own language. It has various ways to help native speakers to think of words in their language and enter some basic data about them (no backslash codes, just forms to fill in). The program is customizable and task-oriented, giving the advisor the ability to turn on/off tasks as needed and as the user receives training for those tasks. WeSay uses a standard xml format, so data can be exchanged with linguist-oriented tools like FieldWorks.

Software: Historical Reconstruction

·ALingua: A Java application that simulates the evolution of a two-language system in a finite population. In particular, ALingua allows one to examine the spatial dynamics of such a system given a set of initial conditions: a distribution of agents, a network defining connections between them, and a language learning algorithm with associated parameter settings.
·LingPy: LingPy is a suite of open-source Python modules for sequence comparison, distance analyses, data operations and visualization methods in quantitative historical linguistics. The main idea of LingPy is to provide a software package which, on the one hand, integrates different methods for data analysis in quantitative historical linguistics within a single framework, and, on the other hand, serves as an interface for the preparation and analysis of linguistic data using biological software packages.
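As a rough illustration of the sequence comparison such packages build on, here is the classic Levenshtein edit distance (this is not LingPy's own API, just the textbook algorithm that alignment-based methods in quantitative historical linguistics start from):

```python
# Sketch: Levenshtein distance between two words, computed row by
# row with dynamic programming.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

print(edit_distance("vater", "father"))  # 2
```

Real cognate-detection methods replace the uniform costs here with phonetically informed scoring (e.g. sound classes), but the alignment machinery is the same.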
·Phono: Version 4.1: Phono is a software tool for developing and testing models of regular historical sound change. If you wish to test a sound-change model for which you have an ordered set of rules and a set of ancestor words, or if you teach about the operation of regular sound change, Phono may be useful to you.
·Wordcorr: A tool to assist the linguist in comparative phonology. Data entered by keyboard (full IPA) or imported. The linguist decides what forms are comparable, annotates them as such and aligns their segments, then tabulates the resulting correspondence sets into a results structure organized by presumed protosegment and environment. The entire results structure can be reorganized as needed to express an analysis.

Software: Lexicons

·Alchemist: The original purpose of Alchemist is to allow you to read in raw text files and create morphological gold-standards in XML format. Using Alchemist, you can identify morphemes, along with a number of important characteristics of the morphemes, such as whether they are roots or affixes, the degree of analyst certainty, and allomorphs of the morpheme. Alchemist is also a good general tool for sorting and filtering lists of words, because it allows the user to easily use regular expressions applied to words.
·An English Dictionary and Thesaurus in Flash: A comprehensive lexical reference system with more than 145,000 related terms and 110,000 meanings. A lookup is followed by a trail of related terms. The software supports synonyms, hypernyms, hyponyms, antonyms, related verbs and more. Written in Flash, it makes a journey through the English language a rich multimedia experience.
·CLaRK - an XML-based System for Corpora Development: CLaRK is an XML-based software system for corpora development. The main aim behind the design of the system is the minimization of human intervention during the creation of language resources.
·Classics Technology Center: Etymological Dictionary of Greek & Latin Roots of English words.
·Hinditech Dictionary: Online English-Hindi Dictionary of NLP.
·IBM LanguageWare: LanguageWare is a software component that provides linguistic processing for a variety of products and solutions in more than 20 languages. It comprises a Java library with a set of language resources. The library encodes the language models, and the resources (dictionaries) encode the lexical entries for each language and contain language-specific processing logic, such as logic for handling decomposition, spelling correction, morphology, hyphenation, language identification, etc.
·LexChecker: LexChecker is an English web-based corpus query service developed by David Wible and Nai-Lung Tsao at National Central University in Taiwan. It takes an English word as query input and returns a list of chunks or multiword strings in which that word is conventionally used. Each chunk is in turn linked to example sentences from the BNC. The listed chunks are not only word strings but can also contain slots indicating limited substitutability within a part of speech, so the results show not just strings but patterns of the target word's use. To try the service and see further description, visit http://www.lexchecker.org .
·Lexique: Lexique 3, available at www.lexique.org, is an open-source database for French. Together, Lexique 2 and 3 describe 55,000 lexical roots and more than 135,000 lexical entries.
·Lexique Pro: Lexique Pro is an interactive lexicon viewer and editor, with hyperlinks between entries, category views, dictionary reversal, search, and export tools. It's designed to display your data in a user-friendly format so you can distribute it to others.
·Linguist's Assistant: Linguist's Assistant (LA) is a multilingual natural language generator based on linguistic universals, typologies, and primitives. LA enables linguists to build lexicons and grammars for a wide variety of languages, particularly minority and endangered languages. LA then uses that information to produce initial draft translations of numerous community development articles in that language. These articles teach people how to prevent the spread of various diseases such as AIDS and Avian Influenza. These texts are intended to improve the quality of people's lives, and enable the speakers of these languages to participate in the larger world. The initial draft translations produced by LA are always easily understandable, grammatically correct, and at approximately a sixth grade reading level. When experienced mother-tongue translators use the drafts generated by LA, their productivity is typically quadrupled without any loss of quality.
·LIWC - Linguistic Inquiry and Word Count: LIWC calculates the percentage of words within each file along 72+ dimensions. Categories include negative emotions (including anger, anxiety, sadness), positive emotions, cognitive processing, standard linguistic dimensions (pronouns, prepositions, articles), and common content categories (death, sex, occupation, etc.). This is a sound program from a psychometric perspective, both in the creation of categories and the validation of the dictionaries. Dictionaries in English, Spanish, German, Dutch, Italian, and Norwegian are available; partial dictionaries exist in Korean, Hungarian, and French.
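The percentage computation LIWC performs can be sketched in a few lines (the categories and word lists here are invented, not LIWC's actual dictionaries):

```python
# Sketch: percentage of words in a text falling into each
# dictionary-defined category, LIWC-style.
def category_percentages(text, categories):
    words = text.lower().split()
    return {name: 100.0 * sum(w in wordset for w in words) / len(words)
            for name, wordset in categories.items()}

cats = {"posemo": {"happy", "good"}, "pronoun": {"i", "we", "you"}}
print(category_percentages("we are happy today", cats))
# {'posemo': 25.0, 'pronoun': 25.0}
```

The real program additionally handles word stems, hierarchical categories, and punctuation; this sketch shows only the core per-category percentage.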
·Marcion - Coptic software: Coptic-English and Coptic-Czech dictionary based on Crum's Coptic dictionary, written in C++ with an embedded MySQL server and a Qt GUI. Contains Coptic texts, grammars, Greek texts, the LSJ Greek-English lexicon and others.
·Matapuna Dictionary Writing System: The Matapuna Dictionary Writing System is a Free, easy to use, web-based, multiuser, multilingual lexicography software system. It assists with many tasks, including dictionary creation and editing, data management, team collaboration, error checking, corpus, publishing, and progress monitoring.
·MorDebe: MorDebe is a free, large-scale, lexicographically controlled lexicon for European Portuguese, concentrated around inflectional morphology. The lexicon provides inflectional paradigms, word-class, and orthographic information for over 125,000 Portuguese words, with a total of around 1.5 million word forms. On top of this, the database provides information about derivational morphology and orthographic variation for a large number of lexical items. The database also contains words of other national variants of Portuguese (Brazil, Angola, Cabo Verde, etc.); all words belonging to these variants are explicitly marked as such.
·msort: msort is a sophisticated sort utility. It differs from typical sort utilities in providing greater flexibility in parsing the input into records and identifying key fields and greater control over the sort order. Records need not be single lines of text but may be delimited in a number of ways. Key fields may be selected by position in the record, by character ranges, or by matching a regular expression to a tag. For each key an arbitrary sort order may be specified together with multigraphs, exclusions, and regular expression substitutions. In addition to the usual lexicographic and numerical orderings, msort supports sorting by date, time, and string length. Lexicographic keys may be reversed, allowing the construction of reverse dictionaries. Any or all keys may be optional. For optional keys, the user may specify how records missing the key field should compare to records in which the key field is present. msort fully supports Unicode.
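The custom sort order with multigraphs can be illustrated in Python (a toy alphabet, not msort's own syntax), based on traditional Spanish collation, where the digraph "ch" sorts as a single unit after "c":

```python
# Sketch: sort words by a user-defined alphabet in which "ch"
# is a single collation unit ordered after "c".
ORDER = ["a", "b", "c", "ch", "d", "e", "u"]
UNITS = sorted(ORDER, key=len, reverse=True)  # try multigraphs first

def custom_key(word):
    key, i = [], 0
    while i < len(word):
        for unit in UNITS:
            if word.startswith(unit, i):
                key.append(ORDER.index(unit))
                i += len(unit)
                break
        else:
            i += 1  # character outside the alphabet: ignore it
    return key

words = ["cudo", "chado", "cado"]
print(sorted(words, key=custom_key))  # ['cado', 'cudo', 'chado']
```

A plain `sorted(words)` would give ['cado', 'chado', 'cudo']; the custom key moves every "ch" word after all other "c" words, as the traditional alphabet requires.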
·Nonobox 0.2: A Python script for performing regex searches on databases written and interlinearized with SIL Toolbox. The script is licensed with the GPL and is offered as-is.
·NooJ: NooJ is both a corpus processing tool and a linguistic development environment: it allows linguists to formalize several levels of linguistic phenomena: orthography and spelling, lexicons for simple words, multiword units and frozen expressions, inflectional, derivational and productive morphology, local, structural syntax and transformational syntax. For each of these levels, NooJ provides linguists with one or more formal tools specifically designed to facilitate the description of each phenomenon, as well as parsing tools designed to be as computationally efficient as possible. This approach distinguishes NooJ from most computational linguistic tools, which provide a single formalism that should describe everything. As a corpus processing tool, NooJ allows users to apply sophisticated linguistic queries to large corpora in order to build indices and concordances, annotate texts automatically, perform statistical analyses, etc. NooJ is freely available and linguistic modules can already be downloaded for Acadian, Arabic, Armenian, Bulgarian, Catalan, Chinese, Croatian, French, English, German, Hebrew, Greek, Hungarian, Italian, Polish, Portuguese, Spanish and Turkish.
·Online English Spanish Dictionary: Free online English Spanish Dictionary with translations, synonyms, definitions and usage examples.
·Online Terminology Management Software: Create and manage your terminology in multiple languages. This terminology tool is conceptual — terms in different languages are connected with one concept. You can access your termbank from any computer connected to the internet, and you can publicise your work to the rest of the world with a mouse click.
·TAMS: Text Analysis Markup System; for Linux and Mac OS X
·Texai Lexicon: The Texai lexicon is a merging of WordNet 2.1, the CMU Pronouncing Dictionary, Wiktionary, and the OpenCyc lexicon. The format is RDF, N3 or TriG. Included are entries for lemmas, word forms, word senses, sample phrases and ARPABET pronunciations. A documentation file is available as a separate download. Only the TriG version contains context.
·TshwaneLex Lexicography Software: TshwaneLex is a professional software application for the compilation of monolingual, bilingual or semi-bilingual dictionaries. TshwaneLex contains various innovative features designed to optimise the process of producing dictionaries, and to improve consistency and quality of the final dictionary product. TshwaneLex supports Unicode throughout, allowing it to handle virtually all of the world's languages, and includes features such as immediate article preview, customisable fields, automatic cross-reference tracking, automated lemma reversal, online and electronic dictionary modules, export to MS Word format, and teamwork (network) support.
·WordNet: A WordNet client for Mac OS X
·WordSmith Tools: A suite of PC software for lexical analysis of corpora in a very wide variety of languages. Offers concordancing, wordlisting, key words analysis and a number of other utilities. WordSmith 3.0 (OUP, 1999) runs on Windows 3.1 and later and is restricted to ASCII/ANSI text; WS 4.0 (2002) requires Windows 98B or later and handles Unicode as well as ASCII/ANSI text. Version 4.0 was issued in 2004. This is a completely new edition with many limitations removed and numerous additional features, such as sound concordancing, use of Unicode, tools for obtaining text from the Internet, etc.

Software: Morphological Analysis

·AGTK: An annotation graph toolkit. Also available for Mac OS X.
·Alchemist: The original purpose of Alchemist is to allow you to read in raw text files and create morphological gold-standards in XML format. Using Alchemist, you can identify morphemes, along with a number of important characteristics of the morphemes, such as whether they are roots or affixes, the degree of analyst certainty, and allomorphs of the morpheme. Alchemist is also a good general tool for sorting and filtering lists of words, because it allows the user to easily apply regular expressions to words.
·Child Phonology Analyzer: This user-friendly and easy-to-use tool provides phonological and lexical analyses of child speech (but can be adjusted for other types of corpora). It is intended to be used with corpora stored in Microsoft Excel files. The tool offers a detailed phonological analysis, allowing you to count instances of different segments, articulatory features, syllabic structures and strings (words and parts of words) within a given age range. It also offers an account of lexical development, portraying stages of development (by cumulated attempted target words).
·Emdros text database engine for analyzed or annotated text: Emdros is an Open Source text database engine specializing in linguistic analyses of text. Emdros comes with a powerful query language for asking linguistically relevant questions of the data.
·EXTRAKT: Linguistic engine for morphological analysis (lemmatization), generation, translation (of terms for a cross-lingual search), and identification of language. Most European languages are covered.
·Helpful add-in to MS Word: repetition counter and approximate matching search tools: Fore Words is a plug-in (add-in) for Microsoft Word that provides helpful tools for text analysis. The add-in currently contains two items: Repetyler and K-Diff Search. Repetyler counts all repetitions (words or phrases) in a text. This can help to improve (or simply examine) the writing style of business documents, literary texts, correspondence, etc. Excessively frequent constructions and so-called filler words can be invisible at first glance, yet strongly affect the reader's impression. Repetition analysis can also help to build a true portrait of a person or to find implicit messages in formal language, and webmasters may find Repetyler useful for analyzing word density and choosing keywords for search engines. The professional version, Fore Words Pro, additionally counts repetitions of word parts, so that repeatedly used words can be found in all their forms (in particular, with different suffixes). Both the length of the word part being searched and its position (beginning of the word, i.e. prefix, or middle part) are configurable. K-Diff Search performs approximate-matching search.
·HFST - Helsinki Finite-State Transducer Technology: The Helsinki Finite-State Transducer software is intended for the implementation of morphological analysers and other tools which are based on weighted and unweighted finite-state transducer technology. This work is licensed under a GNU Lesser General Public License v3.0. The feasibility of the HFST toolkit is demonstrated by a full-fledged open source implementation of a Finnish morphological analyzer as well as analyzers and generators for a number of other languages of varying morphological complexity, e.g. English, French, German, Italian, Northern Sámi, Swedish, Turkish, etc. Many more languages are also available as spellers and hyphenators.
·IBM LanguageWare: LanguageWare is a software component that provides linguistic processing for a variety of products and solutions in more than 20 languages. It comprises a Java library with a set of language resources. The library encodes the language models, and the resources (dictionaries) encode the lexical entries for each language and contain language-specific processing logic, such as logic for handling decomposition, spelling correction, morphology, hyphenation, language identification, etc.
·Linguistica: Linguistica is an ongoing research project developing software for the unsupervised learning of natural language morphology. It takes an untagged text corpus as its input, and attempts to determine the stems, affixes, and morphological structure of the words with no prior knowledge of the language.
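Linguistica's actual method (MDL-based signatures) is far more sophisticated, but the core intuition of unsupervised suffix discovery can be sketched crudely in Python (the threshold, cutoffs, and data below are illustrative only):

```python
# Crude sketch of one idea behind unsupervised morphology learning:
# propose stem+suffix splits and keep suffixes that recur across
# several distinct stems. Not Linguistica's actual algorithm.
def frequent_suffixes(words, max_len=3, min_stems=2):
    stems_per_suffix = {}
    for w in words:
        for k in range(1, max_len + 1):
            if len(w) > k + 1:  # require a non-trivial stem
                stems_per_suffix.setdefault(w[-k:], set()).add(w[:-k])
    return {s for s, stems in stems_per_suffix.items()
            if len(stems) >= min_stems}

words = ["walked", "walking", "talked", "talking", "jump"]
print(sorted(frequent_suffixes(words)))
```

On this toy input, "ed" and "ing" survive because they each attach to both "walk" and "talk", while splits of the isolated word "jump" do not recur.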
·minpair: Generates a complete list of minimal pairs from a wordlist. Minpair accepts input in Unicode and optionally finds pairs differing in a single transposition or insertion/deletion. Multigraphs (sequences of characters treated as a single segment) may be defined. The basic program is a command-line program. It may be driven by an optional GUI.
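As a rough illustration of what minpair computes (ignoring its Unicode, multigraph, transposition, and insertion/deletion handling), here is a naive Python sketch that finds pairs of equal-length words differing in exactly one position:

```python
from itertools import combinations

# Minimal sketch of minimal-pair extraction: words of equal length
# differing in exactly one segment. Real minpair also supports
# multigraphs, transpositions, and insertions/deletions.
def minimal_pairs(words):
    pairs = []
    for a, b in combinations(words, 2):
        if len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1:
            pairs.append((a, b))
    return pairs

print(minimal_pairs(["pat", "bat", "pit", "spat"]))
# [('pat', 'bat'), ('pat', 'pit')]
```

This pairwise comparison is O(n²); for large wordlists, hashing each word with one position wildcarded is the usual faster approach.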
·Morfix-Meister: Ein Werkzeug zum Erkennen von Wortstrukturen durch das Hantieren mit häufigen Wortbausteinen gruppiert nach Rechtschreibmustern. A German dictionary-like tool, sorted by morphemes.
·Onoma: New and existing Spanish verbs conjugator: Onoma is a conjugator recently released by the company Molino de Ideas. It can conjugate not only existing Spanish verbs but also newly coined ones. This software: 1- conjugates from the infinitive form, 2- analyzes any verb form, 3- allows the user to create a new verb and conjugates it, 4- provides the user with a manual for learning the Spanish verb paradigm in an easy rule-based method. It also supplies guidelines for determining whether a Spanish verb is regular or irregular. The manual is available at: http://www.molinodeideas.es/descargas/el_verbo_espaniol.pdf Onoma is available in: Arabic, Basque, Catalan, Galician, Chinese, English, Esperanto, French, German, Italian, Japanese, Korean, Polish, Portuguese, Romanian, Russian and Spanish.
·TAMS: Text Analysis Markup System; for Linux and Mac OS X
·TiGer Search: Tools for linguistic text exploration; also for Mac OS X
·XLE: The Xerox Linguistics Environment is a tool for parsing and generating Lexical Functional Grammars. The software runs on Linux, Unix, Solaris and Mac OS X.

Software: Natural Language Processing

·AGFL Grammar Work Lab: A collection of software systems for Natural Language Processing, based on the AGFL-formalism (Affix Grammars over Finite Lattices).
·AGTK: An annotation graph toolkit. Also available for Mac OS X.
·CardWorld1: CardWorld1 is a working model of dialog about cards and piles of cards on a table top. It uses a mix of direct manipulation, English language, and pointing to manipulate the cards and piles. Ambiguities of pointing, and changing configurations of piles due to movement of cards, allow for interesting modeling of deixis and anaphora. Syntax, semantics, and pragmatics of the model are all fairly simple but not trivial. Many extensions are possible, as suggested in its documentation. The model is free to use for research and teaching. The model is written in open-source Java and can be run as a Java Web Start applet with a Java-enabled browser from the web site.
·CardWorld1a an example of understanding: This post is to update the web start page link to CardWorld1a, the latest version of a working model of dialog about cards and piles of cards on a table top. It uses a mix of direct manipulation, English language, and pointing to manipulate the cards and card collections. Ambiguities of pointing, and changing configurations of card collections due to movement of cards, allow for interesting modeling of deixis and anaphora. Syntax, semantics, and pragmatics of the model are all fairly simple but not trivial. Many extensions are possible, as suggested in the documentation at http://www.yorku.ca/jmason/CardWorld1.html . The model is free to use for research and teaching. The model is written in open-source Java and can be run as a Java Web Start applet with a Java-enabled browser from this web site: http://nlu.asd-networks.com/home/cardworld/cardworld1a/
·CLaRK - an XML-based System for Corpora Development: CLaRK is an XML-based software system for corpora development. The main aim behind the design of the system is the minimization of human intervention during the creation of language resources.
·Computational Linguistics in Poland: This web page contains links to various sites devoted to Computational Linguistics (CL) / Natural Language Processing (NLP) / Linguistic Engineering (LE) in Poland, including sites containing Polish resources such as corpora, lexica, etc.
·DKPro WSD: DKPro WSD is a modular, extensible Java framework for word sense disambiguation. It is based on Apache UIMA, an industry standard for text processing.
·Fink Text packages for Mac OS X: Text-related Unix packages for Mac OS X. Another source for native Mac OS X software is osx.hyperjeff.net/Apps.
·GATE: General Architecture for Text Engineering. A domain-specific software architecture and development environment that supports researchers in Natural Language Processing and Computational Linguistics and developers who are producing and delivering Language Engineering systems.
·HFST - Helsinki Finite-State Transducer Technology: The Helsinki Finite-State Transducer software is intended for the implementation of morphological analysers and other tools which are based on weighted and unweighted finite-state transducer technology. This work is licensed under a GNU Lesser General Public License v3.0. The feasibility of the HFST toolkit is demonstrated by a full-fledged open source implementation of a Finnish morphological analyzer as well as analyzers and generators for a number of other languages of varying morphological complexity, e.g. English, French, German, Italian, Northern Sámi, Swedish, Turkish, etc. Many more languages are also available as spellers and hyphenators.
·Hinditech Dictionary: Online English-Hindi Dictionary of NLP.
·Inputlog: Inputlog is a freeware research tool that enables researchers to log writing processes (Windows) and analyse them. Inputlog records the data of a writing session in Microsoft® Word; generates data files for statistical, text, pause, mode and revision analyses; and plays back the recorded session at different speeds.
·Intellexer - Custom Built Search Engines, Knowledge Management Tools, Natural Language Processing: The Intellexer linguistic platform supports the development of custom-built search engines, knowledge management tools, natural language processing systems and other intelligent software.
·JavaRAP: JavaRAP is a standalone, publicly-available implementation of the Resolution of Anaphora Procedure (RAP) given by Lappin and Leass (1994). The RAP algorithm resolves third person pronouns, lexical anaphors, and identifies pleonastic pronouns. The implementation uses the standard, publicly available Charniak (2000) parser as input, and generates a list of anaphora-antecedent pairs as output. Alternately, an in-place substitution of the anaphors with their antecedents can be produced. It could be used as a reference to benchmark other anaphora resolution algorithms or systems; or to provide anaphora resolution function as needed by other NLP applications.
·LanguageTool: An Open Source Language Checker: LanguageTool is an Open Source language checker for English, French, German, Polish, Dutch, Romanian, and other languages. It is a rule-based language checker that will find errors for which a rule is defined in its XML configuration files. Rules for more complicated errors can be written in Java.
·Leopar 1.0.0 release: LEOPAR is an Open Source natural language parser. It is based on Interaction Grammars (http://leopar.loria.fr/doku.php?id=ig:formalism) and produces deep syntactic structures for grammatical sentences. An online demo (with resources for the French language) can be found here: http://leopar.loria.fr/demo It can produce both dependency structures and phrase structures. A set of French linguistic resources (grammar and lexicon) is available for LEOPAR. LEOPAR can be installed on Linux and MacOS.
·Linguist's Assistant: Linguist's Assistant (LA) is a multilingual natural language generator based on linguistic universals, typologies, and primitives. LA enables linguists to build lexicons and grammars for a wide variety of languages, particularly minority and endangered languages. LA then uses that information to produce initial draft translations of numerous community development articles in that language. These articles teach people how to prevent the spread of various diseases such as AIDS and Avian Influenza. These texts are intended to improve the quality of people's lives, and enable the speakers of these languages to participate in the larger world. The initial draft translations produced by LA are always easily understandable, grammatically correct, and at approximately a sixth grade reading level. When experienced mother-tongue translators use the drafts generated by LA, their productivity is typically quadrupled without any loss of quality.
·Mind-1.1: An original, linguistic theory of mind implemented in JavaScript for tutorial purposes and in Forth for robots. Clicking on the Mind-1.1 link causes the artificial Mind to travel across the Internet and come alive in your Microsoft Internet Explorer Web browser. Options include a default tutorial mode, a printed-transcript mode for recording natural-language-generation (NLG) sessions, and a troubleshoot mode for debugging any malfunction of the still evolving software. The documentation of the thirty-four (34) AI Mind-modules was published in November 2002 as the 34 chapters of an artificial intelligence textbook for computer science students.
·Natural Language Processing software: Natural Language (text) Processing software for parsing, spell-checking, machine translation, thesauri, question answering and text attribution for English, German, French, Italian.
·Natural Language Software Registry: A concise summary of the capabilities and sources of a large amount of natural language processing (NLP) software available to the NLP community.
·Natural Language Toolkit: NLTK, the Natural Language Toolkit, is a suite of program modules, data sets and tutorials supporting research and teaching in computational linguistics and natural language processing. NLTK is ideally suited to students who are learning NLP or conducting research in NLP or closely related areas, including empirical linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning. NLTK has been used successfully as a teaching tool, as an individual study tool, and as a platform for prototyping and building research systems. NLTK is free software, written in Python, and released under an open source license.
·openNLP: The Open Natural Language Processing website with many software packages that also run on Mac OS X.
·Pertinence Summarizer: PS is a multilingual and multidomain text summarization software, which can summarize a wide variety of file formats to a length specified by the user. Languages supported: French, English, Spanish, German, Portuguese, Italian, Japanese, Chinese and Korean http://www.pertinence.net/index_en.htm
·SFST Tools: The Stuttgart Finite State Transducer (SFST) tools are an efficient and easy-to-use platform for the implementation of morphological analysers and other applications which are based on finite-state technology. The implementation of the SFST tools is based on a C++ library. The SFST tools are distributed under the GNU Public License.
·Synview - A syntax tree visualization tool: SynView is a new syntax tree visualization tool developed by Christian Behrenberg (http://www.christian-behrenberg.de). It can be downloaded at the following address: http://www.christian-behrenberg.de/work/SynView.html SynView can load sentence structures in LaTeX-compatible bracketing format (as used e.g. by Qtree) from a text file and will then render the set of syntax trees in a nice-looking and smooth-to-navigate fashion. One of the distinguishing features of this tool is the ability to zoom in on any constituent in the tree with just one click of the mouse. Moreover, one can load a whole set of trees at once, e.g. all possible analyses of an ambiguous sentence or multiple steps in a syntactic derivation, and quickly browse through these trees using the arrow keys. These features should make it ideally suited for use in (computational) linguistics courses. SynView was originally the visualization part of a parser developed as a student project in a computational linguistics course at Ruhr-Universität Bochum, Germany. The challenging part was to generate, at runtime, a nice rendering of the tree structures determined by the parser. The author used a 3D engine to create an interactive and immersive visualization that is both appealing and convenient. Because of the 3D engine used, it is so far only available for Microsoft Windows. Jan Strunk (strunklinguistics.rub.de), Sprachwissenschaftliches Institut, Ruhr-Universität Bochum, Germany
·Tesla (Text Engineering Software Laboratory): Tesla is a client-server-based, virtual research environment for text engineering - a framework to create experiments in corpus linguistics, and to develop new algorithms for natural language processing. It is being developed at the Department of Computational Linguistics, University of Cologne, Germany, and licenced under the Eclipse Public Licence (EPL). Tesla was implemented in Java (with an Eclipse-based Client and IDE) and is available for Windows, Linux, and Mac OS X.
·TiGer Search: Tools for linguistic text exploration; also for Mac OS X
·UAM CorpusTool: The UAM CorpusTool is a text annotation tool, allowing annotation of a plain text corpus (collections of text files) at multiple linguistic levels. The annotation scheme at each level is provided by the user in terms of a hierarchical tree of features (allowing cross-classification). The tool allows complex search of the corpus, including concordancing. Another interface allows you to produce statistical analyses of the corpus (descriptive, comparative). Windows and Macintosh are supported; on Windows there is full Unicode support.
·VisualText: Integrated development environment for NLP. Builds multi-pass, multi-strategy analyzers. The NLP++ programming language and Conceptual Grammar hierarchical knowledge base support grammars, patterns, lexicons, ontologies, and heuristics. Academic licensing available.
·What's Wrong With My NLP?: A visualizer and graphical diff for NLP problems. Displays syntactic and semantic trees.
·XLE: The Xerox Linguistics Environment is a tool for parsing and generating Lexical Functional Grammars. The software runs on Linux, Unix, Solaris and Mac OS X.

Software: Other Software Tools

·An online database of phonological representations for Mandarin Chinese monosyllables: A web-based database has been developed to provide psycholinguists with a large-scale phonological representation system for all Mandarin Chinese monosyllables. The construction of the system is based on the slot-based phonological pattern generator (PatPho), with adequate consideration of the language-specific features of Chinese phonology. Users can retrieve the relevant phonological representations through an interactive query system on the web. The query outcomes can be saved in a number of formats, such as Excel spreadsheets, for further analyses. This representation system can be used for a variety of purposes, in particular connectionist language modeling, and more generally the study of Chinese phonology.
·Analysis: A program which allows several types of text analysis. For Windows or Unix.
·Bibliographix: A reference manager. Available in free (basic) and pro (advanced) versions. The latter adds the option of importing references from diverse OPACs including Library of Congress, GBV and some German libraries (it's a German software). It also features a number of export formats (BibTeX, Reference Manager, Endnote) and has a direct module for setting references in Word.
·Bigram Statistics Package: This is an easy to use suite of Perl tools for counting and analyzing bigrams in text.
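The package itself is written in Perl; the raw counting step it builds its analyses on can be sketched in a few lines of Python:

```python
from collections import Counter

# Minimal sketch of word-bigram counting, the core operation the
# Bigram Statistics Package performs before analyzing the counts.
text = "the cat sat on the mat the cat slept"
tokens = text.split()
bigrams = Counter(zip(tokens, tokens[1:]))
print(bigrams[("the", "cat")])  # 2
```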
·Bitstream: Fonts, international language typefaces, font CDs, and custom type design services.
·Bust A Vowel: Educational software that helps to learn the whole set of human vowels, using high quality sound and IPA characters. It's perfect for students of phonology and phonetics. This software is 100% freeware with no limitations.
·CINTIL Concordancer and Corpus: CINTIL Online Concordancer is now available at: http://cintil.ul.pt This is an online concordancing service that supports the research usage of the CINTIL Corpus. CINTIL-Corpus Internacional do Português is a linguistically interpreted corpus of Portuguese, developed at the University of Lisbon. At present it is composed of 1 million annotated tokens, manually verified by linguistic experts. The annotation comprises information on part of speech, on lemma and inflection of tokens from open classes, on multi-word expressions pertaining to the class of adverbs and to the closed POS classes, and on multi-word proper names (for named entity recognition). The corpus is being developed and maintained at the University of Lisbon by the REPORT group of the Centro de Linguística da Universidade de Lisboa in cooperation with the NLX-Natural Language and Speech Group of the Department of Informatics of the Faculty of Sciences. Feedback is very welcome, to cintil@di.fc.ul.pt.
·CSLI Verb Ontology: A free, public domain verb ontology. The resources are provided in the Prolog programming language as Prolog predicates and so should be easy to export to any other format with a simple Prolog rule.
·E-meld School of Best Practices Tool Room: The Tool Room provides information about hardware and software tools available for linguists, many of which will help you to conform to Best Practice. Tools are divided into the categories of Software and Hardware. The software area houses a database of software recommended by linguists. In this section you can browse for software based on its function (Concordancers, Lexicon Management, etc), read the comments of other linguists, and share your own opinions. The hardware area contains a database of information on many types of hardware that is useful in the field, from digital recorders to solar panels.
·Ergane: A freeware multilingual dictionary programme (Win 3.1) using Esperanto as auxiliary language. Vocabularies for more than 40 languages available.
·Ethnologue - Languages of the world: A catalogue of more than 6,700 languages spoken in 228 countries.
·Fluid Construction Grammar: A fully operational grammar formalism and implementation for representing, learning and applying lexical and grammatical inventories. FCG opens many new research directions for linguists, especially those interested in cognitive, computational and evolutionary linguistics, and researchers in Artificial Intelligence.
·Gokturkish Keyboard: Gokturkish Keyboard can transliterate between Latin and Gokturkish via Internet Explorer. You can use the program to transfer the text to word processors such as Word for further editing. Additionally, you can copy and paste Latin text into the text field of the program to get instant transliteration.
·Helpful add-in to MS Word: repetition counter and approximate matching search tools: Fore Words is a plug-in (add-in) for Microsoft Word that provides helpful tools for text analysis. The add-in currently contains two items: Repetyler and K-Diff Search. Repetyler counts all repetitions (words or phrases) in a text. This can help to improve (or simply examine) the writing style of business documents, literary texts, correspondence, etc. Excessively frequent constructions and so-called filler words can be invisible at first glance, yet strongly affect the reader's impression. Repetition analysis can also help to build a true portrait of a person or to find implicit messages in formal language, and webmasters may find Repetyler useful for analyzing word density and choosing keywords for search engines. The professional version, Fore Words Pro, additionally counts repetitions of word parts, so that repeatedly used words can be found in all their forms (in particular, with different suffixes). Both the length of the word part being searched and its position (beginning of the word, i.e. prefix, or middle part) are configurable. K-Diff Search performs approximate-matching search.
·IBM LanguageWare: LanguageWare is a software component that provides linguistic processing for a variety of products and solutions in more than 20 languages. It comprises a Java library with a set of language resources. The library encodes the language models, and the resources (dictionaries) encode the lexical entries for each language and contain language-specific processing logic, such as logic for handling decomposition, spelling correction, morphology, hyphenation, language identification, etc.
·IBM LanguageWare Miner for Multidimensional Socio-Semantic Networks: IBM LanguageWare's library for lexical analysis, disambiguation and ontology/multi-dimensional network based semantic analysis and information mining.
·ISIS: Indian Scripts Input System (ISIS) is a set of easy-to-use, mnemonic software keyboards for Indian scripts. ISIS is Unicode-compliant and covers almost all major Indian scripts with a single keyboard layout.
·KickKeys: KickKeys is a software tool for linguists. It allows the user to write any language using the regular English computer keyboard without memorizing difficult key sequences. It allows transliteration (type-as-you-pronounce) and remapping of keyboards. Thus, a user who wants to write Ancient Greek or Hebrew on an English Windows system can do so with KickKeys. KickKeys allows the user to specify his/her own key mapping, change existing ones, use any font he/she likes and, to top it all, use these features in WordPad, Microsoft Word, Outlook, Outlook Express, Excel, FrontPage, PowerPoint, Eudora and other common Windows applications. It ships ready with key maps and fonts for several languages such as Assamese, Bengali, Bulgarian, Belarusian, French, Farsi, German/Scandinavian, Hindi, Italian, Portuguese, Russian, Spanish, Tamil and Ukrainian. It also comes with graphical tools that allow the user to build keymaps for all other languages and fonts. It even supports typing right-to-left languages such as Farsi on English Windows.
·LCard at Lingresource.Com: LCard is an aid for organizing and presenting your information. It is designed for humanists and classical scholars, including students writing end-of-term course papers or diploma theses, teachers and scientists writing PhD theses, articles and monographs, and anybody who needs to present homogeneous information (lists of literature, authors or sources). Translators and translation studies experts will especially benefit from this program, because they often need to present original-translation card pairs when analyzing texts.
·lid - language identifier: lid is a C/C++ language and encoding identification library. lid is fast, stable and easy to use, provides accurate results and does not have additional software dependencies. Even short passages can in most cases be identified accurately. With minimal hardware requirements and high performance, lid is very effective and recognizes a variety of languages and character encodings. The list of supported languages is extended regularly and currently covers all official languages of the European Union. For every single language a wide range of character encodings is supported, including full support for all common Unicode Transformation Formats (UTF-8, UTF-16, UTF-32). A particular feature of this language identifier is that it may even identify the language of texts in transliterated form for some languages. lid is provided for various Unix-like operating systems (Linux, Solaris, FreeBSD).
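The entry does not say how lid works internally; a common technique for language identification is comparing character n-gram profiles, sketched below in Python with tiny illustrative training texts (real identifiers train on far more data and use better similarity measures):

```python
from collections import Counter

# Sketch of character-bigram language identification; NOT lid's
# actual algorithm. Training texts and scoring are illustrative.
def profile(text, n=2):
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

TRAIN = {
    "en": profile("the quick brown fox jumps over the lazy dog"),
    "de": profile("der schnelle braune fuchs springt ueber den hund"),
}

def identify(text):
    p = profile(text)
    # crude similarity: overlap of bigram frequency distributions
    return max(TRAIN, key=lambda lang: sum(
        min(p.get(g, 0), TRAIN[lang].get(g, 0)) for g in p))

print(identify("the dog jumps"))
```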
·LingPipe Java API: LingPipe is a Java API for linguistic processing tasks that include: tokenization, sentence detection, part-of-speech tagging, phrase chunking, entity detection, within document coreference. It also has efficient language model based classifiers, noisy-channel spell correction. Source included.
·LIWC - Linguistic Inquiry and Word Count: LIWC calculates the percentage of words within each file along 72+ dimensions. Categories include negative emotions (including anger, anxiety, sadness), positive emotions, cognitive processing, standard linguistic dimensions (pronouns, prepositions, articles), and common content categories (death, sex, occupation, etc). This is a sound program from a psychometric perspective, both in the creation of categories and the validation of the dictionaries. Dictionaries in English, Spanish, German, Dutch, Italian, and Norwegian are available; partial dictionaries in Korean, Hungarian, and French.
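LIWC's validated dictionaries are the product's real substance, but the basic computation (the percentage of a text's words falling into each category) can be sketched with made-up stand-in categories:

```python
# Sketch of LIWC-style counting with tiny illustrative categories;
# LIWC's real dictionaries cover 72+ validated dimensions.
CATEGORIES = {
    "posemo": {"happy", "good", "love"},
    "negemo": {"sad", "angry", "hate"},
}

def category_percentages(text):
    words = text.lower().split()
    total = len(words)
    return {
        cat: 100.0 * sum(w in vocab for w in words) / total
        for cat, vocab in CATEGORIES.items()
    }

print(category_percentages("I love my good dog but I hate rain"))
```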
·MarkWrite Electronic Marking Tool: MarkWrite is an electronic marking tool which enables the user to create customised lists of standardised feedback comments as well as feedback checklists. It also enables the user to create assessment schemes and will automatically calculate the final marks per student and per class group. MarkWrite integrates easily with web-based learning platforms functioning on Sakai. MarkWrite allows the user to build up a small corpus of marked student texts, with comments and markup, in essence creating a personal corpus of student errors. MarkWrite is currently in Beta form and testers are needed. Please send hints, comments, questions or corrections.
·MiniJudge: A free, open-source software tool for designing, running, and analyzing small-scale experiments on linguistic judgments.
·MtRecode: A Character Conversion program.
·Multilingual, Fast Typing: Shabda-Brahma ET-Feel Word-Storm Processor (SB): This auto-suggesting, intelligent text processor is well suited to typing the skeletal, raw form of your text into the PC. As you type the first few letters of a long word (or of a repeated phrase or clause), SB tries to guess the intended word or phrase and displays that auto-suggestion (2 choices) on screen. If SB guesses correctly, you just press the Insert or Alt key to have the suggestion typed for you. New auto-suggestions can even be learnt automatically from the user's own texts (along 20 user paths or 80 languages). Defining and using up to 10,000 direct 3-key shorthands is also possible. SB can also auto-form common symbols and even South Asian 'conjunct consonants' (juktakshar), and can display font-specific onscreen keyboards, making it especially useful for non-English typing. SB exports its typed text as HTML output, which can be copied via any Internet browser into any word processor for final use. SB v5.8.3 is simpler, faster and practically freeware!
·MyFontKeys: This software allows you to use any fonts and customize the keyboard within any application.
·myWriterTools: Software for writers and editors: Turn Word into a powerful writing tool. Fix formatting problems; find improperly used words, jargon and cliches; make documents gender neutral; answer your grammar and style questions; make backup a cinch; and convert between U.S. and U.K. word usage. The Editor's Edition adds tools for proofreaders and copy-editors to maintain consistency, add queries to the author, and produce a professional stylesheet to accompany the edited document.
·Online text summarization for French texts: Pertinence is automatic text summarization software that lets users extract the important information from a text easily and quickly. Pertinence acts as an on- and off-ramp to the information superhighway, providing friendly access to the relevant information. This convenience is essential for several tasks, such as effectively searching very large and unstructured collections like the World Wide Web, an intranet, or text databases stored on one's own computer. The Pertinence demo is free (French texts) for the document types ASCII, HTML and PDF. Try free online automatic text summarization at: http://www.pertinence.net/register_en.html
·OTKit: Tools for Optimality Theory: OTKit is a software package containing tools for Optimality Theory. It includes a user interface and a Java library. The user of the interface can define a number of elements (forms, candidates, Gen functions, constraints, hierarchies,...) and run experiments with them, such as calculating the grammatical forms or drawing tableaux. The Java package offers corresponding classes to Java programmers. Version 1.0 of OTKit is still in an experimental phase. The user interface offers only tools for modeling linguistic competence, while the Java library already contains certain tools for research on performance and learning as well. The user interface also includes a script language (still under development) and an XML format for saving elements. Please visit the website of OTKit: http://www.birot.hu/OTKit/.
·Paradigm - A Better Way to Build Your Experiments: Paradigm is a new stimulus presentation system for building interactive, millisecond accurate experiments for linguistics research. With Paradigm, you can: * Build experiments quickly and easily using Paradigm’s intuitive drag and drop user-interface. * Present images/sounds, rating scales, self-paced reading trials, rich text and movies. * Present interactive experiments like identification and visual world tasks using Paradigm's built-in mouse selection. * Optimize timing accuracy with adjustable experiment priority settings. * Control your eye-tracker, EEG system or fMRI scanner using Paradigm's Parallel and Serial Port events. * Use Paradigm's Python based scripting language to completely customize your experiment. * Distribute 'stand-alone' experiments over the web or in your lab without buying additional run-time licenses. Try out Paradigm for 30 days free of charge. To find out more, go to: http://www.perceptionresearchsystems.com
·R-Varb implemented in R: Those looking for statistical software should note that R is useful for many things in addition to corpus linguistics. I have used R or the proprietary program on which it is modelled, S, for phonetics since 1982. There is also a Varbrul-like variable rule package called R-Varb implemented in R: http://ella.slis.indiana.edu/%7Epaolillo/projects/varbrul/rvarb/
·RAM 4.0: RAM is a tachistoscope for flashing lines of text to the monitor screen at the exact rate desired. A major upgrade has appeared for the Reading Acceleration Machine, a freeware tachistoscope for Windows. New features in version 4.0 include random review, automatic random review, looping, multiple bookmarks, an acceleration function, the ability to copy particular words or lines displayed to another file, and larger possible time-settings.
·Redet: Redet is a tool for performing regular expression matching and substitution. It is useful for complex searches of corpora and lexica as well as for transforming data. It permits the user to define named character classes and to take their intersection, with the result that it is possible to run searches on feature matrices. It provides considerable assistance for the user, including a palette of regular expression constructions, a history list that persists across sessions, extensive help, and a set of character entry tools including IPA charts and a simple facility for defining custom character charts. Numerous aspects of the program are configurable. Unicode is fully supported.
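The named-class intersection that Redet offers can be approximated in an ordinary regex library by computing the intersection as a set and emitting a character class. A minimal Python sketch (the class names and contents here are hypothetical; Redet has its own definition syntax and charts):

```python
import re

# Hypothetical feature classes; Redet lets the user define these interactively.
CLASSES = {
    "voiced": set("bdgvzmnlrw"),
    "stop":   set("pbtdkg"),
}

def char_class(*names):
    """Intersect the named classes and emit a regex character class."""
    chars = set.intersection(*(CLASSES[n] for n in names))
    return "[" + "".join(sorted(chars)) + "]"

# Search for voiced stops followed by 'a':
pattern = re.compile(char_class("voiced", "stop") + "a")
hits = pattern.findall("bad tag cap dab")
```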
·Schackne Online: Online Education for Language Learners.
·SEMANA software for interactive semantic data mining: Knowledge acquisition using Knowledge Discovery in Databases (KDD) technology (with data mining at its core) is situated halfway between database management and automated discovery. It is computationally possible today to reveal usually “invisible” or “hidden”, remarkably complex (lattice) structures by analyzing very simple tabular representations of gathered atomic data, and much more... See also the mailing list CASK (Computer-aided Acquisition of Semantic Knowledge)
·SenseClusters: SenseClusters is a suite of Perl programs that supports unsupervised clustering of similar contexts. It relies on its own native methodology, and also provides support for Latent Semantic Analysis. SenseClusters is a complete system that takes users from preprocessing of text to clustered output. It supports the selection of features, the creation of various kinds of context representations, dimensionality reduction via Singular Value Decomposition, clustering, and analysis of results.
·SignStream: A Macintosh application to assist with the linguistic analysis of video-based language data.
·System Quirk: The System Quirk family of applications are designed to aid in the production and maintenance of texts and terminologies. These applications are of specific relevance to computational linguists and language engineers.
·TeXShop: A TeX previewer for Mac OS X, written in Cocoa.
·Textanz: Textanz builds a list of word and phrase frequencies from text. This information lets you detect excessive use of particular words and expressions. Such stylistic control is no less important than the now-standard spell checking function, and is especially advisable for business documentation: the first impression a reader gets from your commercial offer, project, resume, contract, report, etc. depends in many respects on writing style. It is also useful to analyze frequencies in informal writing, and generally in any text you intend to give someone to read. When you are in the role of reader, Textanz helps again: the most frequently used phrases suggest which ideas were foremost for the author at the moment of writing, and may reveal implicit psychological aspects. A word frequency list is part of the so-called stylistic portrait of a writer; in linguistics research it is often used for identification of authorship (somewhat like handwriting). Developers and webmasters can also benefit from Textanz when choosing keywords for a web page or searching for repeated fragments of program source code.
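A frequency list of the kind Textanz produces is straightforward to compute: tokenize, then count words and word n-grams. A minimal Python sketch:

```python
import re
from collections import Counter

def frequencies(text, n=2):
    """Word and n-gram phrase frequencies, the core of a Textanz-style report."""
    words = re.findall(r"\w+", text.lower())
    unigrams = Counter(words)
    phrases = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return unigrams, phrases

uni, bi = frequencies("to be or not to be is to be free")
```

`uni.most_common()` and `bi.most_common()` then give the ranked lists from which overused words and phrases can be read off.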
·TextCat Language Guesser: Determines the language of a given text. Supports more than 65 languages. Free.
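TextCat implements Cavnar and Trenkle's character n-gram method: build a ranked n-gram profile per language, then pick the language whose profile minimizes the total "out-of-place" rank distance to the document's profile. A minimal Python sketch with toy training text (hypothetical; the real tool ships trained models for its 65+ languages):

```python
from collections import Counter

def profile(text, top=300):
    """Ranked character n-gram profile (n = 1..3), as in Cavnar & Trenkle."""
    grams = Counter()
    padded = " " + text.lower() + " "
    for n in (1, 2, 3):
        grams.update(padded[i:i + n] for i in range(len(padded) - n + 1))
    return [g for g, _ in grams.most_common(top)]

def out_of_place(doc_prof, lang_prof):
    """Sum of rank differences; n-grams unseen in training get the max penalty."""
    rank = {g: i for i, g in enumerate(lang_prof)}
    return sum(rank.get(g, len(lang_prof)) for g in doc_prof)

# Toy training text, one line per language:
langs = {"en": profile("the cat sat on the mat with the hat"),
         "nl": profile("de kat zat op de mat met de hoed")}
doc = profile("the dog and the cat")
guess = min(langs, key=lambda name: out_of_place(doc, langs[name]))
```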
·The TeX Catalog: Linguistics: The TeX Catalogue will eventually list, by topic, all packages available from the "Comprehensive TeX Archive Network", or CTAN, for use with plain TeX, LaTeX, ConTeXt, etc. We also try to give some advice that might help you in getting software and guides not available on CTAN.
·Topicalizer: Topicalizer is a text analysis, topic extraction and keyword analysis tool. Based on methods of computational linguistics, it provides various analyses for a given URL or plain text. These comprise, amongst others, language recognition, lexical density, keywords, collocations, word and phrase frequencies, readability and a short abstract. Topicalizer is also able to find similar pages according to the keywords it has extracted from a document. Moreover, Topicalizer provides an API for use by external applications.
·TransAbacus - Website word counter software: TransAbacus is a desktop application that counts the number of words in an entire web site. It counts not only the words in the body of each web page, but also the words in titles, meta tags, alt texts and files. TransAbacus helps translation professionals accurately estimate and budget website translation or localization projects.
·Unicode-Keyboard-Layouts for Win2k/XP: Multilingual keyboard layouts for IPA and all Latin- and Cyrillic-script European languages, based on the standard German keyboard layout. Easy to install. Note: All descriptions are in German only.
·unidesc: This package consists of four programs for finding out what is in a Unicode file. They are useful when working with Unicode files when one doesn't know the writing system, doesn't have the necessary font, needs to inspect invisible characters, needs to find out whether characters have been combined or in what order they occur, or needs statistics on which characters occur.
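The kind of inspection unidesc performs maps directly onto the Unicode character database: for each code point, report its number, name and combining class, which exposes invisible characters and the order of combining marks. A minimal Python sketch of the idea (not unidesc's actual output format):

```python
import unicodedata

def describe(text):
    """For each character: code point, Unicode name, and combining class --
    roughly the report a unidesc-style tool produces for a Unicode file."""
    return [(f"U+{ord(c):04X}",
             unicodedata.name(c, "<unnamed>"),
             unicodedata.combining(c))
            for c in text]

# 'e' followed by a combining acute accent: two code points, one glyph.
rows = describe("e\u0301")
```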
·World Language Mapping System: Data set of worldwide language homeland areas (polygons) and point locations for use in Geographic Information Systems (GIS). Dataset developed jointly by SIL and GMI maps all languages of the 14th Edition Ethnologue, and includes substantially all of the data of the published Ethnologue as GIS attribute fields.
·XLingPaper: Writing Linguistic Documents in XML: XLingPaper is a way to author and archive linguistic papers or books using XML. This makes it possible to mark-up one's document in a way that it can be formatted in multiple ways without having to make any changes to the original document.

Software: Parsers

·CLaRK - an XML-based System for Corpora Development: CLaRK is an XML-based software system for corpora development. The main aim behind the design of the system is the minimization of human intervention during the creation of language resources.
·Computational Linguistics in Poland: This web page contains links to various sites devoted to Computational Linguistics (CL) / Natural Language Processing (NLP) / Linguistic Engineering (LE) in Poland, including sites containing Polish resources such as corpora, lexica, etc.
·Leopar 1.0.0 release: LEOPAR is an Open Source natural language parser. It is based on Interaction Grammars (http://leopar.loria.fr/doku.php?id=ig:formalism) and produces deep syntactic structures for grammatical sentences. An online demo (with resources for the French language) can be found here: http://leopar.loria.fr/demo It can produce both dependency structures and phrase structures. A set of French linguistic resources (grammar and lexicon) is available for LEOPAR. LEOPAR can be installed on Linux and MacOS.
·Link Parser v3.0: A syntactic parser of English, based on link grammar. Online demonstration, documentation, and downloadable software and API.
·NooJ: NooJ is both a corpus processing tool and a linguistic development environment: it allows linguists to formalize several levels of linguistic phenomena: orthography and spelling, lexicons for simple words, multiword units and frozen expressions, inflectional, derivational and productive morphology, local, structural syntax and transformational syntax. For each of these levels, NooJ provides linguists with one or more formal tools specifically designed to facilitate the description of each phenomenon, as well as parsing tools designed to be as computationally efficient as possible. This approach distinguishes NooJ from most computational linguistic tools, which provide a single formalism that should describe everything. As a corpus processing tool, NooJ allows users to apply sophisticated linguistic queries to large corpora in order to build indices and concordances, annotate texts automatically, perform statistical analyses, etc. NooJ is freely available and linguistic modules can already be downloaded for Acadian, Arabic, Armenian, Bulgarian, Catalan, Chinese, Croatian, French, English, German, Hebrew, Greek, Hungarian, Italian, Polish, Portuguese, Spanish and Turkish.
·TiGer Search: Tools for linguistic text exploration; also for Mac OS X

Software: Phonetic Analysis

·AGTK: An annotation graph toolkit. Also available for Mac OS X.
·AKUSTYK for Praat: AKUSTYK is an online resource for linguists, including free speech analysis & synthesis software, tutorials, online seminars, and equipment reviews.
·AlteruPhono: AlteruPhono is open-source software for developing and testing models of regular phonetic sound change, simulating the diachronic evolution of a word. Its rules are written using the usual combinations of features such as point of articulation and vowel roundness.
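Regular sound change rules of the classic A > B / C_D form can be compiled to context-sensitive regex substitutions applied in order. A minimal Python sketch with a hypothetical two-rule set (AlteruPhono's own rule notation is feature-based and richer than this):

```python
import re

# Two hypothetical rules, compiled to regexes with lookahead/anchors as context:
RULES = [
    (re.compile(r"p(?=[aeiou])"), "b"),  # p > b / _V
    (re.compile(r"k$"), ""),             # k > 0 / _#
]

def apply_changes(form):
    """Apply each sound change rule in order, simulating one diachronic stage."""
    for pattern, replacement in RULES:
        form = pattern.sub(replacement, form)
    return form

out = apply_changes("patak")  # > "batak" > "bata"
```

Rule ordering matters, as in the feeding/bleeding interactions familiar from historical phonology.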
·Audiamus: Audiamus builds a corpus of linked text and media. It is a cross-platform tool that allows presentation of textual material linked to unsegmented media files, using QuickTime to instantiate the links. It was developed as a means of working interactively with field recordings and of presenting texts and example sentences as playable media with a dissertation.
·Child Phonology Analyzer: This user-friendly and easy-to-use tool provides phonological and lexical analyses of child speech (but can be adjusted for other types of corpora). It is intended to be used with corpora stored in Microsoft Excel files. The tool offers a detailed phonological analysis, allowing you to count instances of different segments, articulatory features, syllabic structures and strings (words and parts of words) within a given age range. It also offers an account of lexical development, portraying stages of development (by cumulative attempted target words).
·FreP: Frequency in Portuguese: FreP is an electronic tool that allows the extraction of frequency information of Portuguese phonological units at the word-level and below. It runs on written texts, following the current orthographic conventions. FreP was conceived as a public domain tool, with the restriction of being used for scientific, non-commercial, purposes. FreP emerged from a joint (ongoing) project involving Marina Vigário (Univ. Minho), Fernando Martins (Univ. Lisboa/ILTEC) and Sónia Frota (Univ. Lisboa), which started in July, 2004. To get/update FreP, please write to fmartins@fl.ul.pt .
·Interactive Sagittal Section: Displays sagittal sections and IPA transcriptions for user-specified lip and tongue positions, using JavaScript.
·Metricalizer: The beta version of a tool for automatic metrical analysis of German poetry into stressed and unstressed syllables. This beta version can only detect regular meters (e.g. +--+--+--).
·Metricalizer²: The release-version of a program for automated metrical analysis of German poetry. Copy any poem inside and get back information about prosody, meter, rhyme and metrical complexity.
·NORM: A Vowel Normalization Suite: NORM is a web-based vowel normalization and plotting package. NORM allows users to normalize formant data using a wide variety of published procedures (Nearey, Lobanov, a Bark difference method, etc). The processing is implemented in R and the R script is available for download and customization.
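NORM itself is implemented in R, but the simplest of the procedures it offers, Lobanov normalization, is just a per-speaker z-score of formant values. A minimal sketch of that one method in Python:

```python
from statistics import mean, stdev

def lobanov(formants):
    """Lobanov normalization: z-score a speaker's formant values,
    F_norm = (F - mean) / sd -- one of the procedures NORM offers."""
    m, s = mean(formants), stdev(formants)
    return [(f - m) / s for f in formants]

# All F1 values (Hz) from one speaker, normalized to that speaker's own scale:
z = lobanov([300.0, 500.0, 700.0])
```

Because each speaker is scaled by their own mean and standard deviation, normalized values from different speakers become directly comparable.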
·pause: Pause determines the location of silences in an audio file, for use in fragmentation of large recordings, studies of pause duration, and the like.
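Silence detection of this kind can be sketched as a single threshold pass over the samples: collect maximal runs of low-amplitude samples that exceed a minimum length. A minimal Python sketch on raw sample values (the thresholds and audio file handling here are illustrative, not pause's actual parameters):

```python
def find_silences(samples, threshold=0.05, min_len=3):
    """Return (start, end) index pairs for runs of low-amplitude samples."""
    silences, start = [], None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if start is None:
                start = i               # a quiet run begins
        else:
            if start is not None and i - start >= min_len:
                silences.append((start, i))
            start = None
    if start is not None and len(samples) - start >= min_len:
        silences.append((start, len(samples)))   # file ends in silence
    return silences

spans = find_silences([0.4, 0.0, 0.01, 0.02, 0.5, 0.0, 0.0, 0.0, 0.0])
```

In practice the threshold would be applied to windowed energy rather than individual samples, and `min_len` expressed in milliseconds.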
·PENTAtrainer: An interactive script for automatic extraction of pitch targets for prosody synthesis. It is based on the qTA implementation of the PENTA model, and it allows users to: - Automatically extract pitch target parameters (slope, height, strength) - Resynthesize F0 contours based on the extracted parameters - Specify target location and restrict direction of target slope - Exhaustively process all wave files in a folder - Automatically collect extracted parameters from all sounds in a folder and save them in ensemble files for ease of further analysis and processing
·Phono: Software tool for creating and testing models of regular historical sound change. This version (4.1) runs on Windows (Version 3.3 was for DOS).
·Phonology Assistant: Phonology Assistant creates consonant and vowel inventory charts and assists the user in searching through the data corpus to do phonetics and phonology. Phonology Assistant 3.0 provides support for Unicode. In addition, PA is designed to support users of Speech Analyzer, Toolbox, and FieldWorks Language Explorer with a phonology tool that works interactively with the data stored in those applications.
·ProsodyPro: An interactive script for large-scale systematic prosody analysis. This is an updated version of its predecessor TimeNormalizeF0. Among its many feature are the following: - Semi-automatic extraction of F0 contours, allowing users to improve the accuracy of F0 tracking by rectifying vocal pulse marking - Exhaustively process all wave files in a folder - Automatically save time-normalized F0 and intensity contours for selected intervals - Automatically save continuous F0 velocity contours for selected intervals - Automatically save many prosodic measurements, including: maxf0, minf0, f0range, meanf0, mean intensity, duration, max velocity, final velocity, final f0 and mean intensity - Automatically collect extracted contours and measurements from all sounds in a folder and save them in ensemble files for ease of further analysis and processing
·R-Varb implemented in R: Those looking for statistical software should note that R is useful for many things in addition to corpus linguistics. I have used R or the proprietary program on which it is modelled, S, for phonetics since 1982. There is also a Varbrul-like variable rule package called R-Varb implemented in R: http://ella.slis.indiana.edu/%7Epaolillo/projects/varbrul/rvarb/
·Sanchay: Sanchay is an open source platform for working on languages, especially South Asian languages, using computers, and also for developing Natural Language Processing (NLP) or other text processing applications. It consists of various tools and APIs for this purpose. It is still in the development stage and the design has not yet stabilized, but components like a text editor with customizable support for languages and encodings, annotation interfaces, etc. were first released as an experimental version (0.1) on Sourceforge.net. The next version (0.2) has been available on the Internet and has also been released on Sourceforge.net, along with the latest version (0.3). It is meant to be complementary to the other existing NLP tools and libraries. Some of the components in the released version are: syntactic annotation interface, generalised table and tree components, SSF (Shakti Standard Format) API, feature structure API, parallel corpus markup interface, customizable language and encoding support, Sanchay text editor, language and encoding identification, file splitter and format converter, task setup generator (only for syntactic annotation), a simple but powerful data structure called Properties Manager along with a GUI for purposes like customization of applications, a find/replace/extract tool, a CRF based automatic annotation tool, and a tree visualizer for phrase structure and dependency relations. User documentation has been provided for some of these components; more will be added soon. Some API documentation for programmers will also be provided later. Many other components are in the pipeline. Hopefully other people will get involved with the development so that Sanchay can provide much needed support for South Asian languages for as many purposes as possible. Sanchay has an object oriented architecture where the emphasis is on modularity, reusability, extensibility and maintainability. The implementation is purely in Java, which means it is platform independent and can be used on Windows as well as Linux without needing any extra setup beyond installing the JDK or JRE.
·SndBite: SndBite is a specialized audio editor, designed for breaking large recordings into smaller components with great efficiency. Special features include: *Multiple simultaneous views of the waveform at different resolutions. *The ability to position window edges at transitions between sound and silence. *Automated setting of cut points at zero-crossings. *Automatic filename generation easily controlled by the user. *Optional automatic playback on window motion. *Logging of each write.
·Sylli - The SSP Syllabifier: Sylli is a textual syllabifier. Developed for Italian, it can easily be adapted to any language that is claimed to respect the SSP (Sonority Sequencing Principle). Sylli divides TIMIT files, strings, plain files and directories into syllables and provides other useful functions for syllable analysis. Remember that Sylli uses an algorithm written to parse ASCII phonological transcriptions, so you may need to adapt sonority.txt to your alphabet and convert the input to phonetic transcription (it might be advisable to use an X-SAMPA alphabet).
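SSP-based syllabification places a syllable boundary at the sonority minimum between two vowels. A minimal Python sketch with a toy sonority scale (the values are hypothetical; Sylli reads its scale from sonority.txt):

```python
# Toy sonority scale: vowels highest, obstruent stops lowest.
SONORITY = {"a": 5, "e": 5, "i": 5, "o": 5, "u": 5,
            "l": 4, "r": 4, "m": 3, "n": 3,
            "s": 2, "f": 2, "t": 1, "p": 1, "k": 1, "b": 1, "d": 1, "g": 1}

def syllabify(word):
    """Break before the lowest-sonority segment between each pair of vowels."""
    son = [SONORITY[c] for c in word]
    breaks, last_vowel = [], None
    for i, v in enumerate(son):
        if v == 5:                      # found a vowel (nucleus)
            if last_vowel is not None:
                mid = min(range(last_vowel + 1, i),
                          key=lambda j: son[j], default=i)
                breaks.append(mid)
            last_vowel = i
    parts, prev = [], 0
    for b in breaks:
        parts.append(word[prev:b]); prev = b
    parts.append(word[prev:])
    return parts
```

On this scale, 'kontra' breaks before the sonority minimum 't', giving kon.tra rather than *kont.ra.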
·Texai Lexicon: The Texai lexicon is a merging of WordNet 2.1, the CMU Pronouncing Dictionary, Wiktionary, and the OpenCyc lexicon. The format is RDF, N3 or TriG. Included are entries for lemmas, word forms, word senses, sample phrases and ARPABET pronunciations. A documentation file is available as a separate download. Only the TriG version contains context.
·The Emu Speech Database System: A system for managing collections of speech data which supports hierarchical labelling of utterances. Emu is freely available and supports a range of file formats.
·Toney: Software for Phonetic Classification: Toney is a free software tool that supports classification of spoken forms into phonetic categories. Use it to manually sort linguistic forms into clusters, then listen to all items in a given cluster in order to hear any outliers. It is ideal for use in early elicitation tasks, in which the linguistically salient phonetic categories are not yet clearly established. Toney uses specially formatted Praat TextGrid files and corresponding audio files. Unix, Mac, and Windows distributions are available, along with sample datasets.
·WaveSurfer: A tool suited for a wide range of tasks in speech research and education.

Software: Speech Recognition and Synthesis

·AKUSTYK for Praat: AKUSTYK is an online resource for linguists, including free speech analysis & synthesis software, tutorials, online seminars, and equipment reviews.

Software: Taggers

·A tagger for German: A tagger for German (with interactive online demo version).
·Adsotrans Chinese-English Annotation Engine: Adsotrans is a collaborative open source Chinese-English annotation project designed to assist learners of Chinese as a second language. It comes with a large database of semantically-tagged Chinese word information.
·CLaRK - an XML-based System for Corpora Development: CLaRK is an XML-based software system for corpora development. The main aim behind the design of the system is the minimization of human intervention during the creation of language resources.
·CLAWS part-of-speech tagger: POS tagging software for English text, CLAWS (the Constituent Likelihood Automatic Word-tagging System).
·Computational Linguistics in Poland: This web page contains links to various sites devoted to Computational Linguistics (CL) / Natural Language Processing (NLP) / Linguistic Engineering (LE) in Poland, including sites containing Polish resources such as corpora, lexica, etc.
·SALTO Semantic Annotation Tool: SALTO is a graphical tool that supports manual annotation of text corpora with (frame) semantic argument structures. The tool was developed within the SALSA project (http://www.coli.uni-saarland.de/projects/salsa/) at Saarland University. SALTO can be used to add a second (typically semantic) layer of annotation to corpora that are already syntactically analyzed (through manual annotation or automatically). Main features are: Query-based creation of subcorpora for annotation, Distribution of corpora to different annotators, Definition of Items and Classes/Tags to be annotated, Comfortable annotation with visual editor and mouse-menus, and Semi-automatic merging and adjudication of parallel annotations in same editor.
·UAM CorpusTool: The UAM CorpusTool is a text annotation tool, allowing annotation of a plain text corpus (collections of text files) at multiple linguistic levels. The annotation scheme at each level is provided by the user in terms of a hierarchical tree of features (allowing cross-classification). The tool allows complex search of the corpus, including concordancing. Another interface allows you to produce statistical analyses of the corpus (descriptive, comparative). Windows and Macintosh are supported. On Windows, full Unicode support.
·WordStat: WordStat is a text analysis module specifically designed to study textual information such as responses to open-ended questions, interviews, titles, journal articles, public speeches, electronic communications, etc. WordStat may be used for automatic categorization of text using a dictionary approach or text mining. WordStat can apply existing categorization dictionaries to a new text corpus. It also may be used in the development and validation of taxonomies. When used in conjunction with manual coding, this module can provide assistance for a more systematic application of coding rules, help uncover differences in word usage between subgroups of individuals, assist in the revision of existing coding using KWIC (Keyword-In-Context) tables, and assess the reliability of coding by the computation of inter-raters agreement statistics. WordStat includes numerous exploratory data analysis and graphical tools that may be used to explore the relationship between the content of documents and information stored in categorical or numeric variables such as the gender or the age of the respondent, year of publication, etc. Relationships among words or categories as well as document similarity may be identified using hierarchical clustering and multidimensional scaling analysis. Correspondence analysis and heatmap plots may be used to explore relationship between keywords and different groups of individuals.
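One standard inter-rater agreement statistic of the kind a tool like WordStat reports is Cohen's kappa, which corrects observed agreement for the agreement expected by chance given each rater's category frequencies. A minimal Python sketch:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Two coders labelling the same four text segments:
k = cohens_kappa(["pos", "neg", "pos", "pos"],
                 ["pos", "neg", "neg", "pos"])
```

Here raw agreement is 0.75, but kappa is 0.5 because much of that agreement would be expected by chance.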

Software: Transcription

·AGTK: An annotation graph toolkit. Also available for Mac OS X.
·Audiamus: Audiamus builds a corpus of linked text and media. It is a cross-platform tool that allows presentation of textual material linked to unsegmented media files, using QuickTime to instantiate the links. It was developed as a means of working interactively with field recordings and of presenting texts and example sentences as playable media with a dissertation.
·AX - all accents with one key: A tiny, free, open source, Windows utility that uses just a single key to generate any accent or special character (within the limits of the character set being used). It is supplied with eleven European languages and can be easily re-configured. It takes moments to learn instead of forever fiddling with special keyboard codes or key combinations.
·Central Sinama Keyboard: A keyboard specifically designed for the Central Sinama language, but also useful for all Sama-Bajaw languages and Tausug. Included special characters for typing Sinama are āēīōūꞌ. ₱ñ°© are included for broader Philippine language use. əŋʔ are included for use by linguistic workers.
·EXMARaLDA: A system and toolset for creating, managing and analysing corpora of transcriptions of spoken language. Consists of an editor for transcriptions in musical score notation, a corpus manager and a search tool. All file formats are XML based, which maximizes exchangeability and archivability. Many import and export functionalities (Praat, ELAN, AGTK, RTF, HTML, SVG etc.).
·i2Speak - Smart IPA Keyboard: An online smart IPA keyboard that lets you quickly type IPA phonetics without the need to memorize any symbol code. For every Roman character you type, a popup menu displays a group of phonetic symbols that share the same sound or shape as the typed character. You can save the typed phonetics as an MS Word file by clicking the Save button, copy them to the clipboard using the Copy button, or post them to Twitter, Facebook, etc. by clicking the desired button.
·IPAKLICK: a freely accessible tool that makes it easy to insert strings of IPA-symbols (Unicode) into a text.
·IPANow! Software: IPANow! by PhoneticSoft is a powerful yet simple tool that automatically transcribes Latin, Italian, German and French texts into International Phonetic Alphabet (IPA) symbols by applying rules used in scholarly lyric diction textbooks. Simply type or paste in a text, and with the click of a button IPANow! produces an IPA transcription underneath each line of text that can then be exported in Rich Text Format (.rtf). IPANow! is designed as a lyric diction resource for choral conductors, professional vocalists, church musicians and music educators, but anyone can use it. IPANow! allows choral directors to easily produce professional-looking phonetic transcriptions of foreign language texts to distribute to choir members.
·KMap IME: KMap IME is an OS-independent input method. Using any keyboard layout, it currently supports: Arabic, Armenian, IPA, Aymara, Azeri, Belarusian, Bengali, Berbere, Breton, Bulgarian, Catalan, Cherokee, Cimbrian, Comanche, Croatian, Czech, Dakelh, Danish, Devanagari, Dutch, Esperanto, Estonian, Ethiopic, Faroese, Farsi, Finnish, French, Georgian, German, Greek, Guarani, Gurmukhi, Hanunoo, Hawaiian, Hebrew, Hungarian, Icelandic, Inuktitut, Kannada, Kazakh, Latvian, Lithuanian, Malayalam, Maori, Korean, Mongolian, Nahuatl, Navajo, Norwegian, Occitan, Ogham, Oriya, Persian, Piemontese, Polish, Romanian, Russian, Sanskrit, Serbian, Slavic, Slovenian, Spanish, Syriac, Tamil, Telugu, Thai, Tibetan, Ukrainian, Urdu, Vietnamese, Welsh, Yiddish
·MarkWrite Electronic Marking Tool: MarkWrite is an electronic marking tool which enables the user to create customised lists of standardised feedback comments as well as feedback checklists. It also enables the user to create assessment schemes and will automatically calculate the final marks per student and per class group. MarkWrite integrates easily with web-based learning platforms functioning on Sakai. MarkWrite allows the user to build up a small corpus of marked student texts, with comments and markup, in essence creating a personal corpus of student errors. MarkWrite is currently in Beta form and testers are needed. Please send hints, comments, questions or corrections.
·ONZE Miner: ONZE Miner is a browser-based linguistics research tool that stores audio recordings and text transcripts of interviews. The transcripts can be searched for particular text or regular expressions. The search results, or entire transcripts, can be viewed or saved in a variety of formats, and the related parts of the audio recordings can be played or opened in acoustic analysis software, all directly through the web browser.
·Phonetics Builder: A simple-to-use, free application for inserting phonetic characters into your documents, worksheets or lesson plans. Phonetics Builder can also be used to correctly format Pinyin for insertion into documents.
·Phonmap - Phonemic script writer: Easily add phonemic script to Windows documents. No more searching font tables.
·ThetaCircle: ThetaCircle is an easy-to-use typing and scripting tool. Use it to type, copy and then pipe or paste your script to your favorite word processor. Its major use is for translation and transcription. Type in any Unicode font on your system, or build any typing structure you want, such as using the A key to type a special character, [VP] or a full sentence. ThetaCircle has subversions suited to specific scripts such as Hebrew, Arabic, Katakana, Hiragana, Thai, Greek, Latin, Cyrillic, and the International and American Phonetic Alphabets (IPA/APA).
·Transformer: Transformer is a tool for converting between the file formats of various annotation programs (Praat, ELAN, Transcriber, CLAN; Transana support is in preparation). Transcript files can also be transformed into various output formats for publication, such as plain text, sociogram and partiture. Features include automatic calculation of pauses and selection of which speakers to include in the output. An English version will be available soon.
·Wazéma Ethiopian Computer Writing System: Wazéma System is a Windows (9x/Me/NT/2000/XP) and Apple Macintosh (System7-Mac OS8-9.5) compatible computer writing system for Amharic and all Ethiopian languages. It is freely available from: http://members.aol.com/W4z5m4/wazema.html The system includes a keyboard system based on the Ethiopian syllabary, six professional quality True Type font families, gemination marks, the full musical notation of the Ethiopian Orthodox Tewahdo Church, etc.
·WordStat: WordStat is a text analysis module specifically designed to study textual information such as responses to open-ended questions, interviews, titles, journal articles, public speeches, electronic communications, etc. WordStat may be used for automatic categorization of text using a dictionary approach or text mining. WordStat can apply existing categorization dictionaries to a new text corpus. It may also be used in the development and validation of taxonomies. When used in conjunction with manual coding, this module can provide assistance for a more systematic application of coding rules, help uncover differences in word usage between subgroups of individuals, assist in the revision of existing coding using KWIC (Keyword-In-Context) tables, and assess the reliability of coding by computing inter-rater agreement statistics. WordStat includes numerous exploratory data analysis and graphical tools that may be used to explore the relationship between the content of documents and information stored in categorical or numeric variables such as the gender or age of the respondent, year of publication, etc. Relationships among words or categories, as well as document similarity, may be identified using hierarchical clustering and multidimensional scaling analysis. Correspondence analysis and heatmap plots may be used to explore the relationship between keywords and different groups of individuals.
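The inter-rater agreement statistics that WordStat computes can be illustrated with a minimal sketch of the most common one, Cohen's kappa, in plain Python. This is a toy illustration under the usual definition of kappa, not WordStat's implementation; the function name and sample labels are invented for the example:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders' category labels,
    corrected for the agreement expected by chance."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed proportion of items on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

coder_a = ["pos", "pos", "neg", "neg"]
coder_b = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(coder_a, coder_b))  # 0.5: moderate agreement
```

A kappa of 1.0 means perfect agreement, 0 means no better than chance; tools like WordStat report such values so that coding schemes can be audited before analysis.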

Software: Concordances

·A Simple Concordance Program: A Windows-based program for creating wordlists and concordances.
·aConCorde: aConCorde is a multilingual concordance tool. Originally developed for native Arabic concordancing, it possesses basic concordance functionality as well as English and Arabic interfaces. Written in Java, so it will run on any platform that has the Java Runtime Environment installed.
·Apple Pie Parser: MonoConc Pro 2.0 and MonoConc 1.5: Two concordance programs for linguists and other language researchers.
·CLaRK - an XML-based System for Corpora Development: CLaRK is an XML-based software system for corpora development. The main aim behind the design of the system is the minimization of human intervention during the creation of language resources.
·Conc: Concordance software for the Macintosh, developed by the Summer Institute of Linguistics.
·Concordance - the program: Flexible text analysis software. Lets you gain better insight into e-texts. Make concordances, word lists, indexes. Count word frequencies, find phrases, and more. Publish results to the Web with one click. For Windows XP/2000/NT/ME/98/95
·DictMaker: Lets you create dictionaries. For Mac OS X.
·KH Coder: Quantitative Text Analysis: KH Coder is free software for quantitative analysis of Japanese, English, French, German, Italian, Portuguese and Spanish text. KH Coder provides these functions using back-end tools such as the Stanford POS Tagger, the Snowball stemmer, MySQL and R. Just input raw texts and you can use these functionalities. Originally, KH Coder was developed for content analysis in sociology and supported only Japanese-language data, but a significant number of linguistic studies have been conducted with the software and it now supports other languages.
·KwicKwic: KwicKwic, developed by Clayton Darwin, is a fast and easy-to-use tool for investigating text data. KwicKwic was designed as a simple but powerful search tool for linguists, but it can be used in many other fields. KwicKwic is currently available for Windows (XP and higher) only, and is Unicode compliant.
·UAM CorpusTool: The UAM CorpusTool is a text annotation tool allowing annotation of a plain-text corpus (collections of text files) at multiple linguistic levels. The annotation scheme at each level is provided by the user as a hierarchical tree of features (allowing cross-classification). The tool allows complex searches of the corpus, including concordancing, and another interface lets you produce statistical analyses of the corpus (descriptive and comparative). Windows and Macintosh are supported, with full Unicode support on Windows.
·WordSmith Tools: Part of a concordance on "hands" using Guardian newspaper text as the source.
·WordSmith Tools: A suite of PC software for lexical analysis of corpora in a very wide variety of languages. Offers concordancing, wordlisting, key-words analysis and a number of other utilities. WordSmith 3.0 (OUP, 1999) runs on Windows 3.1 and later and is restricted to ASCII/ANSI text; WordSmith 4.0 requires Windows 98B or later and handles Unicode as well as ASCII/ANSI text. Version 4.0, issued in 2004, is a completely new edition with many limitations removed and numerous additional features, such as sound concordancing, Unicode support and tools for obtaining text from the Internet.
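The keyword-in-context (KWIC) display that these concordancers produce can be sketched in a few lines of Python. This is a toy illustration of the general technique, not the implementation of any tool listed above; the function name and parameters are invented for the example:

```python
import re

def kwic(text, keyword, width=30):
    """One keyword-in-context row per match: right-aligned left
    context, the keyword itself, then the right context."""
    rows = []
    for m in re.finditer(r"\b" + re.escape(keyword) + r"\b", text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append(f"{left:>{width}} {m.group(0)} {right:<{width}}")
    return rows

text = "the cat sat on the mat while the dog slept on the rug"
for row in kwic(text, "the"):
    print(row)
```

Aligning every occurrence of the node word in one column is what makes collocation patterns (here, which nouns follow "the") visible at a glance; real concordancers add sorting on left or right context, frequency counts and corpus-wide statistics on top of this core display.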

Software: Software Directories

·Computational Linguistics in Poland: This web page contains links to various sites devoted to Computational Linguistics (CL) / Natural Language Processing (NLP) / Linguistic Engineering (LE) in Poland, including sites containing Polish resources such as corpora, lexica, etc.
·Corpus-based Computational Linguistics: Many links to corpus-based computational linguistics software.
·Index of linguistics software: Linguistics software from the University of Michigan.
·Language Learning Software Reviews: Video reviews of many different language learning programs on many different languages. I demonstrate how they operate and show you what you'll be doing to learn a language with the software.
·Language Software Reviews: A comparison and review of several language learning software titles.
·TeX/LaTeX Information: A brief and useful overview for linguists interested in using LaTeX.
·Toolbox for linguistic research: Toolbox for linguists, covering: ICT tools (office applications, data visualization, databases), web tools (social bookmarking, bookmarking of bibliographic information, research wikis), biblio tools (information about bibliographic databases relevant for linguists, bibliographic management tools), linguistics (linguistic journals) and corpus linguistics.
·WordNet: An on-line lexical reference system. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept.

Software: Speech Analysis (including Clinical Speech Analysis)

·AKUSTYK for Praat: AKUSTYK is an online resource for linguists, including free speech analysis & synthesis software, tutorials, online seminars, and equipment reviews.
·Center for Spoken Language Understanding: Extensive collection of corpora and tools for speech and language research.
·Praat: Phonetic analysis/manipulation/synthesis and Optimality-Theoretic learning. For Macintosh, Windows, Linux, SGI, Solaris, HP-UX.
·The SNACK Speech visualization Module: An extension to the Tcl/Tk scripting language with additional commands for speech visualization.
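Visualization tools like Praat and Snack build displays such as the spectrogram on the short-time Fourier transform: the signal is cut into overlapping windowed frames and each frame is transformed separately. A minimal NumPy sketch of a magnitude spectrogram, assuming NumPy is available (illustrative only, not the algorithm of any tool listed here):

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: slice the signal into overlapping
    frames, apply a Hann window, and take the real FFT of each frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Result shape: (n_frames, frame_len // 2 + 1), time by frequency.
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 8000                              # sample rate in Hz
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * 1000 * t)    # a 1 kHz test tone
spec = spectrogram(tone)
print(spec.shape)                      # (15, 129)
```

With a 256-sample frame at 8 kHz, each frequency bin spans 8000/256 = 31.25 Hz, so the 1 kHz tone peaks in bin 32; a speech analyzer maps these magnitudes to gray levels to draw the familiar spectrogram image.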