A range of electronic corpora has become increasingly accessible via the
WWW and CD-ROM. This development has coincided with improvements in the
standards governing the collecting, encoding and archiving of such data.
Less attention, however, has been paid to making other types of digital
data available, especially data that one might describe as
'unconventional', namely dialect, child-language and bilingual databases.
Advances in technology have enabled the collection and organisation of such
data sets into a growing number of user-friendly electronic corpora. The
latter have the potential to offer new insights into linguistic universals,
for instance, since they allow, for the first time, rapid and systematic
comparisons between first and second languages and dialects across both social
and geographical space. This book provides state-of-the-art methods and
guidelines for creating and digitising these resources, taking full
advantage of the dramatic recent improvements in computing and analytical