LINGUIST List 19.3604
Mon Nov 24 2008
FYI: KRYS I Corpus for Genre Classification Research
Editor for this issue: Matthew Lahrman
<matt linguistlist.org>
Directory
1. Yunhyong Kim, KRYS I Corpus for Genre Classification Research
Message 1: KRYS I Corpus for Genre Classification Research
Date: 20-Nov-2008
From: Yunhyong Kim <y.kim hatii.arts.gla.ac.uk>
Subject: KRYS I Corpus for Genre Classification Research
The Humanities Advanced Technology and Information Institute (HATII) at the University of Glasgow and the Digital Curation Centre (DCC) are delighted to announce the release of the KRYS I Corpus for genre classification research (http://www.krys-corpus.eu).

The corpus, consisting of 6434 documents labelled with document genres, is expected to become a major resource for researchers in text processing and in data and information management. In particular, we encourage the use of the corpus for research in:

- Automated Text Classification (TC)
- Digital curation and metadata extraction
- Natural Language Processing (NLP)
- Computational Linguistics (CL)

Document genre classification has considerable potential as a supporting step in language processing, document management, and information retrieval; for example, the linguistic style and vocabulary of a document vary distinctively across genres. To date, however, there has been a severe lack of genre-labelled document corpora with which researchers can experiment. It is therefore with great pleasure that HATII and the DCC make the KRYS I Corpus available to researchers around the globe.

The corpus originated as part of the ongoing Semantic Metadata Extraction research at the Digital Curation Centre (http://www.dcc.ac.uk) and HATII at the University of Glasgow (http://www.hatii.arts.gla.ac.uk). The metadata extraction research evolved into a study of automated genre classification, reflecting the observation that the genre of a document (e.g. whether it is a scientific article or a letter) is characterised by its form and structure, an understanding of which would facilitate further extraction of metadata from within the document.

Further details about the development of the KRYS I Corpus are available via the website (http://www.krys-corpus.eu). Specifically, researchers will find a detailed account of the document collection process, the reclassification of the documents in the corpus, and the initial findings with regard to human classification of the documents.

We encourage researchers to make full use of this corpus in their own research and recommend that you consider contributing to the ongoing development of the corpus by adding your own documents to the database. Instructions on how to contribute are provided at http://www.krys-corpus.eu.

Comments and/or feedback on the KRYS I Corpus are invited. Contact details can be found on the website. Please feel free to distribute this announcement to any interested colleagues.

--
Yunhyong Kim
DCC Curation Resources Researcher
Humanities Advanced Technology and Information Institute (HATII)
University of Glasgow (charity number SC004401)
Glasgow
United Kingdom
Linguistic Field(s): Computational Linguistics