Editor for this issue: Ann Dizdar <dizdar@tam2000.tamu.edu>
3RD `SPEAK!' WORKSHOP:
SPEECH GENERATION IN MULTIMODAL INFORMATION SYSTEMS AND PRACTICAL APPLICATIONS

12 August, 1996
Budapest, Hungary

In parallel with ECAI '96, preceding the ECAI '96 satellite workshop on Dialogue Processing in Spoken Language Systems

******************** CALL FOR CONTRIBUTIONS ********************

This workshop aims to bring together researchers, developers, and potential producers and marketers of multimodal information systems in order to consider the role of *spoken language synthesis* in such systems. Not only do we need to be able to produce spoken language appropriately - including effective control of intonation - but we also need to know in which practical contexts spoken language is most beneficial. This requires a dialogue between those providing spoken natural language technology and those considering the practical use of multimodal information systems.

The workshop will consist of paper presentations and practical demonstrations, as well as a round table discussion on the best strategies for pursuing the practical application of spoken language technology in information systems.

Suggested Topic Areas/Themes include, but are not limited to:

* functional control of intonation in synthesized speech
* use of speech in intelligent interfaces for information systems
* integration of speech into automatic query systems
* telecommunications applications
* cooperative integration of speech with text generation for information systems
* evaluation strategies for information systems involving speech synthesis
* applications for information systems with spoken language output capabilities
* practical requirements for information systems with spoken language capabilities.

Potential participants are invited to submit short statements of interest indicating whether they would be interested in presenting a paper, offering a system demonstration, participating in the round table discussion, or simply attending.
Statements of interest should be sent as soon as possible, followed, where appropriate, by extended abstracts (max. 7 pages) by 1st August, by e-mail to: `nemeth@ttt.bme.hu' or by post to: Ge'za Ne'meth, Dept. of Telecommunications and Telematics, TU Budapest, Sztoczek u. 2. Budapest Hungary H-1111. Extended abstracts will be made available at the workshop.

During the workshop, current results and demonstrations of the EU Copernicus Programme Project `Speak!' will also be given (see attachment).

The workshop will be held in a historic building in Buda castle, housing the Phonetic Laboratory of the Hungarian Academy of Sciences.

Participation is free of charge.

Contact:

Ge'za Ne'meth
Dept. of Telecommunications and Telematics
TU Budapest
Sztoczek u. 2.
Budapest Hungary H-1111
E-mail: nemeth@ttt.bme.hu
Fax: +36/1-463-3107
Phone: +36/1-463 2401

---------------------------------------------------------------

Project Information:

The `SPEAK!' Project: Speech Generation in Multimodal Information Systems

`SPEAK!' is a European Union funded project (COPERNICUS '93 Project No. 10393) whose aim is to embed spoken natural language synthesis technology with sophisticated user interfaces in order to improve access to information systems.

Multimedia technology and knowledge-based text processing enhance the development of new types of information systems which not only offer references or full-text documents to the user but also provide access to images, graphics, audio and video documents. This diversification of the information offered has to be supported by easy-to-use multimodal user interfaces, which are capable of presenting each type of information item in such a way that it can be perceived and processed effectively by the user.

Users can easily process the graphical medium of information presentation and the linguistic medium simultaneously. The separation of modes is also quite appropriate for the different functionalities of the main graphical interaction and the supportive meta-dialogue carried out linguistically. We believe, therefore, that a substantial improvement in both functionality and user acceptance is to be achieved by the integration of spoken language capabilities.

However, text-to-speech devices commercially available today produce speech that sounds unnatural and is hard to listen to. High quality synthesized speech that sounds acceptable to humans demands appropriate intonation patterns. The effective control of intonation requires synthesizing from meanings, rather than word sequences, and requires understanding of the functions of intonation.
In the domain of sophisticated human-machine interfaces, we can make use of the increasing tendency to design such interfaces as independent agents that themselves engage in an interactive dialogue (both graphical and linguistic) with their users. Such agents need to maintain models of their discourses, their users, and their communicative goals.

The `SPEAK!' project, which was launched as a cooperation between the Speech Research Technology Laboratory of the TECHNICAL UNIVERSITY OF BUDAPEST and the UNIVERSITY OF DARMSTADT (in cooperation with GMD-IPSI), is developing such an interface for a multimedia information retrieval system. The speech synthesizer used is the MULTIVOX TTS developed by the TU Budapest. At GMD-IPSI, the departments KOMET (natural language generation) and MIND (information retrieval dialogues) contribute to this project.

A proof-of-concept prototype of a multimodal information system is being implemented, which combines graphical input and spoken language output in a variety of languages.

The work involves three supporting goals: first, to advance the state of the art in the domains of speech synthesis, spoken text generation, and graphical interface design; second, to provide enabling technology for higher functionality information systems that are more appropriate for general public use; third, to significantly improve the public and industrial acceptance of speech synthesis in general and the Hungarian text-to-speech technology elaborated within the project in particular.