The Text Software Initiative
----------------------------
An international effort to promote the development and use of free text software

The widespread availability of large amounts of electronic text and linguistic data in recent years has dramatically increased the need for generally available, flexible text software. Commercial software for text analysis and manipulation covers only a fraction of research needs, and it is often expensive and hard to adapt or extend to fit a particular research problem. Software developed by individual researchers and labs is often experimental and hard to get, hard to install, under-documented, and sometimes unreliable. Above all, most of this software is incompatible. As a result, it is not at all uncommon for researchers to develop tailor-made systems that replicate much of the functionality of other systems and in turn create programs that cannot be re-used by others, and so on in an endless software waste cycle. The reusability of data is a much-discussed topic these days; similarly, we need "software reusability", to avoid the re-inventing of the wheel characteristic of much language-analytic research in the past three decades.

The Text Software Initiative (TSI) is committed to solving this problem by working to

  o establish and publish guidelines and standards for the development of text software;
  o promulgate and coordinate the development of free TSI-conformant software.

The scope of the TSI covers all areas of analysis and manipulation of all kinds of texts (written or spoken, mono-lingual or multi-lingual parallel, etc.), including markup of physical and logical text features, linguistic analysis and annotation, browsing and retrieval, statistical analysis, and other text-related tasks in research in computational linguistics, humanities computing, terminology and lexicography, speech, etc.

The TSI software development effort is distributed, that is, anyone can contribute on a voluntary basis.
This means that tools will be developed according to the contributors' priorities; however, the TSI is ultimately working towards the development of a comprehensive text handling system.

To ensure software compatibility and reusability and enable distributed development, the TSI is committed to:

  o design and publish program interface conventions
  o determine and publish guidelines for programming style and documentation
  o stress separation of code and linguistic data to ensure (natural) language independence
  o emphasize breaking high-level text-handling tasks into more primitive, reusable functions
  o provide a library of primitive text-handling tools
  o maintain a task list and set priorities
  o circulate information such as progress reports, revisions to the standard, availability of new software, etc.
  o set up a mechanism for testing and evaluation
  o maintain mailing lists for comments, bug reports, suggestions, etc.

The TSI works in cooperation with other standardization groups, notably the Text Encoding Initiative and the Expert Advisory Group on Language Engineering Standards (EAGLES).

All TSI software is free in the sense defined in the Free Software Foundation's General Public License, which guarantees the freedom to copy, redistribute, and modify software, and protects this freedom by requiring those who pass on the software to include the rights to further redistribute it and to see and change the code. Distribution of TSI software is carried out in cooperation with other dissemination groups such as the Free Software Foundation, RELATOR, and the Linguistic Data Consortium.

The TSI does not provide technical support, but organizes a network of voluntary consultants and support people.

PROJECT COORDINATORS

  Nancy Ide, Vassar College, Poughkeepsie, New York, USA
  ide@cs.vassar.edu

  Jean Veronis, Universite de Provence/CNRS, Aix-en-Provence, France
  veronis@grtc.cnrs-mrs.fr

GENERAL ADVISORY BOARD

  Susan Armstrong, ISSCO, Geneva
  Mark Liberman, Linguistic Data Consortium, University of Pennsylvania
  Makoto Nagao, Kyoto University
  Mark Olsen, ARTFL Project, University of Chicago
  Richard Stallman, Free Software Foundation, Cambridge, Massachusetts
  Donald Walker, Bellcore, Morristown, New Jersey
  Antonio Zampolli, Istituto di Linguistica Computazionale, Pisa

The TSI also includes a TECHNICAL ADVISORY BOARD of software developers.