Editor for this issue: Karolina Owczarzak <karolina
linguistlist.org>
******************************* ELRA Technical Centers ******************************* *** CALL FOR CREATING A NETWORK OF TECHNICAL CENTERS FOR WRITTEN LANGUAGE RESOURCES VALIDATION *** 1. Preamble Describing, assuring and improving the quality of language resources are important tasks. The assurance of such quality is an important factor in ELRA's success. In the start up phase of ELRA it was foreseen that a Network of Technical Centers should be established to handle quality control. To date a technical center for the validation of spoken language resources has been established. ELRA now intends to initiate the establishment of a network of technical centers for the validation of written language resources, the Validation Centers for Written Language Resources or VC_WLR. Written resources include lexicons as well as text corpora, possibly enriched with all kinds of annotations (POS-tags, syntactic structures, etc.). The procedure to establish the VC_WLR is identical to the one adopted in establishing the technical centers for spoken language resources, viz. they are to be established via an open call. Those European institutions willing to act as a VC_WLR for ELRA should send an offer to ELRA. The contents of this offer are described below. In particular, the offer must contain a proposal on how to address the problem of the detailed and thorough knowledge of a wide variety of languages required by the validation of multilingual resources. ELRA's Board will decide which institutions will be selected. The selection of each candidate institution will be based on its ability to fulfill the tasks described in Section2. The organizational and financial aspects are described in Section 3. 2.Work packages (WP) of the VC_WLR 2.1 Extending the Methodology for Describing the Quality and Content of Existing WLR In the catalogue of ELRA many WLR are offered whose quality and content is not yet described in a satisfactory way. Some projects have resulted in linguistic resources distributed by ELRA that are comparable across languages in accordance with a commonly agreed content and format specification (e.g. PAROLE). However, almost no written data distributed by ELRA have been subject to validation by an external party and in accordance with a commonly agreed validation scheme (except for a limited number of PAROLE lexicons, and recently in the context of the ENABLER project). Though some research into the validation of linguistic resources has taken place and recommendations and guidelines have been formulated (e.g. Nancy Underwood et al., June 1998; Lou Burnard for text corpora), these have to be reviewed and where necessary adapted and extended to develop a concrete and workable methodology for the ELRA validation of written linguistic resources. The knowledge and expertise gained in the successful approach to validation taken in the SpeechDat family of spoken resources and by the existing ELRA validation center for spoken resources could be taken into consideration here, and its methods and approaches translated into an approach adapted for written language resources while maintaining the key elements that determined the success of the approach to speech. The first task of the VC_WLR is to establish and/or extend the methodology for quality and content description so far developed. The related document should focus on the quality and content of the WLR offered in the ELRA catalogue. A standard form should be developed for describing the content and quality of a WLR, starting from the form currently in use and taking into account the work carried out within TEI, OLAC, etc. The WLR in the ELRA catalog will have to be described according to this standard. This description will be used as a basis for providing any (potential) user with a quick overview in the ELRA catalogue relating to the quality and content of each WLR offered. Output of WP2.1: - Document describing methodology concerning quality and content - Content and quality description of all ELRA WLR 2.2 Improving the Quality of Existing WLR Existing WLR may have errors that could be removed with reasonable effort. The task of the VC_WLR is to establish a procedure to remove these errors. Especially a procedure has to be established which handles the errors reported by users of WLR (bug reporting procedure). Further, the existing WLR can be improved by better documentation, by reformatting according to established standards and by content changes. A similar procedure for spoken language resources has been proposed and is currently being implemented and experimented with, hence it is sensible to investigate to what extent the procedure proposed for SLR can be adopted for the improvement of WLR and what modifications and or extensions are necessary or desirable. The quality of the existing WLR should be gradually improved in accordance with a priority scheme that has to be worked out in close cooperation with ELRA's validation committee. The scheme has to be approved by the ELRA board. Output of WP 2.2: - Report describing the procedure to be used to improve existing WLR - Improve existing WLR according to a priority scheme 2.3 Quality Standards for WLR The VC_WLR have to play a leading role in establishing quality standards for WLR. for this task the VC_WLR have to cooperate with organizations involved in the production of WLR such as the consortia of the PAROLE and SIMPLE projects, and with ELRA's distribution agency (currently ELDA). Additionally, the extent to which existing recommendations, guidelines and proposed standards from groups such as the EAGLES and ISLE projects can be incorporated should be considered throughout. Output of WP 2.3: - Report describing the procedure for building up relationships with significant WLR producers and standards groups - Following on from the report, the establishment of those relationships 2.4 Validation of New WLR Owners of WLR regularly offer their WLR to ELRA for distribution. ELRA has the distribution carried out by its distribution agency (currently ELDA). Each time a WLR is offered for distribution, the task of the VC_WLR is to establish in cooperation with the owner of the WLR a manual containing: - The specification of the content of the WLR, - The validation criteria for checking the quality of the WLR, - The procedure to validate the WLR. Based on this manual the VC_WLR have to validate any new WLR offered for distribution. Output of WP 2.4: - Report on the validation procedure as specified in a specific contract between ELDA and the center(s) 2.5 Reporting Twice a year the VC_WLR must report work undertaken to date to the board of ELRA via the head of the validation committee. Output of WP 2.5: - Status reports 3. Organizational and Financial Issues 3.1 Relation between ELRA and VC_WLR Concerning the tasks 2.1, 2.2, 2.3, 2.5 as described above the relation between ELRA and the institution(s) that are appointed as VC_WLR will be regulated by a contract between ELRA and those institutions. The contract has to be renewed after every fiscal year of ELRA by the Board of ELRA. Three months before the end of each fiscal year of ELRA the Board of ELRA will decide on the financial support to be given to the VC_WLR for the next fiscal year to perform the tasks 2.1, 2.2, 2.3, 2.5. Annually, a letter of intent will describe a budget for the year for the VC_WLR. The initial amount made available will be approximately 15K EUR. The ELRA validation committee will act as a steering committee for all activities related to validation of written resources. All actions proposed by the validation committee and agreed upon between the validation committee and the appointed VC_WLR will have to be approved by the ELRA Board. 3.2 Relation between ELDA and the VC_WLR Separate contracts will be made with ELDA concerning task 2.4 on a case-by-case basis. 4. Format and Procedure for Offer To apply to be a VC_WLR, send your offer by e-mail (as ASCII or RTF files, approx. 2000 words) to the CEO of ELRA (Khalid Choukri, choukriMail to author|Respond to list|Read more issues|LINGUIST home page|Top of issueelda.fr) and to the head of the ELRA validation committee (Harald Hoege, harald.hoege
mchp.siemens.de). The e-mail should contain: 1. Name of the proposing institute 2. The name of the person at the institute who will be the head of the VC_WLR. 3. A statement outlining the suitability of the institute to act as a VC_WLR. 4. A proposal on how the institute plans to provide for the required detailed and thorough knowledge of a wide variety of languages. 5. A list of personnel who will work on the tasks to be undertaken by the VC_WLR. 6. A possible start date 7.3 Sketch of the work for the work packages described that can be carried out within the fiscal year 2002 (1.1.02 31.12.02) for a budget of inferior or equal to 15KEUR. For each work package a rough estimate for the costs should be given. Proposals are due by Friday March 1, 2002. ***************************** ELRA/ELDA 55-57, rue Brillat Savarin 75013 Paris France Tel.: +33 1 43 13 33 33 Fax: +33 1 43 13 33 30 Email: choukri
elda.fr ******************************