LINGUIST List 4.815

Mon 11 Oct 1993

Jobs: Research fellowship at Leeds

Please display the following to linguists interested in visiting Leeds,
either temporarily or for the full 3-year project:


 Centre for Computer Analysis of Language And Speech (CCALAS)


The above SERC-funded post is available immediately for a fixed period of
three years to work on a project in natural language processing, involving
mapping between the syntactic annotation schemes of different tagged and
parsed corpora, including LOB, Brown, London-Lund, UPenn, SEC, ICE, British
National Corpus.

A PhD or equivalent expertise in Linguistics, Computer Science or Artificial
Intelligence is required; experience of corpus-based computational linguistics
and the syntactic models of one or more of these corpora is preferred.

Salary will be on the scale for Research Staff Grade IA (£12,828 - £20,442)
according to qualifications and relevant experience.

Informal enquiries about the post may be made to Eric Atwell, tel 0532 335761,
fax 0532 335468, or Clive Souter, tel 0532 335460.

Application forms and further particulars may be obtained from the Personnel
Office (Academic Section), The University, Leeds LS2 9JT, England, tel 0532
335771 quoting reference no 48/105.

Closing date for applications: November 1st 1993.

The University of Leeds promotes an equal opportunities policy.



SERC may also fund one or more Visiting Fellowships to support leading
researchers from other Institutions who can contribute towards the project,
visiting Leeds University for between a month and a year. We would
particularly welcome researchers with in-depth experience of one or more of
the tagging and/or parsing schemes, to advise us in the creation of the
detailed mapping algorithms, and the Multi-Tagged Corpus and MultiTreebank. If
you are interested in visiting CCALAS as a project advisor, please contact
Eric Atwell and/or Clive Souter.


 PROJECT SUMMARY: Mapping Between Corpus Annotation Schemes

Several alternative tagged and parsed Corpora of English exist, including LOB,
Brown, London-Lund, UPenn, SEC, ICE, British National Corpus, each with its
own tagset and/or parsing scheme. A tagged or parsed Corpus has many
applications, such as training linguistic constraint models for improved
speech recognition; however, users cannot combine Corpus training sets into a
single language model, as the annotation schemes are incompatible.

This project will design a set of tag- and tree-transducers or algorithms for
mapping between the main corpus annotation schemes. This will allow users of
one Corpus to view other Corpora as enlargements of their training set. One
tagset and parsing scheme will be our 'base' or interlingua, and transducers
will be built between this interlingua and the other annotation schemes. A
relatively small test corpus will be annotated with all the schemes under
consideration; we will investigate the use of the resulting Multi-tagged
Corpus and Multitreebank as a standard evaluation benchmark for taggers and
parsers.


The School of Computer Studies at Leeds provides excellent broad background
support for research; we were graded 4(A) by UFC/HEFCs, and NLP makes an
important and growing contribution to the School's research profile. At Leeds
the Centre for Computer Analysis of Language And Speech (CCALAS), with Eric
Atwell as Director and Peter Roach and Clive Souter as Deputy Directors,
provides a focus for a broad range of corpus- and dictionary-based research
including word-sense semantic disambiguation and tagging (Demetriou, Jost,
Atwell), grammar-based reasoning (Mott, Silver), speech act theory (Holdcroft,
Wallis, Wynne, Millican), probabilistic parsing (Pocock, O'Donoghue, Atwell,
Souter, Hogg), corpus collocation analysis (Howarth, Cowie, Davidson), corpus
annotation (Atwell, Roach, Souter, Arnfield, Ghali, Bull), grammatical
inference and clustering (Hughes, Tarver, Atwell), speech recognition (Roach,
Ueberla, Kirby, Moore, Lockhart, Mair, Sergant), speech synthesis (Scully,
Roach), handwriting recognition (Hanlon, Boyle, Bushofa), text generation
(Cole, Grierson, Tawalbeh), human-computer interaction (Crow), computers in
language teaching and linguistics (Davidson, Fox, Roach, Hunter, Shivtiel),
and computers in lexicography (Roach, Setter, Cowie, Atwell, Souter).


Leeds University has over 15,000 students and 2,000 academic and research
staff, making it one of the largest in Britain. Leeds is half-way between
London and Edinburgh, linked by rail, motorway and air to the rest of the UK
and Europe. It is the 20th largest city in the European Community, with the
excellent arts, sport and other social facilities expected of a growing,
multi-cultural metropolis; but it is also close to four National Parks. More
background information on the Project, CCALAS, the University, and Leeds and
its environs can be found in the Further Particulars from the Personnel
Office.