Sum: Corpus Analysis of Hypertext

Almost a year ago I posted a request to this list asking people to
mail me the URLs of their homepages in order to create a corpus of
manually authored HTML files.

The results of the corpus analysis are found in my MSc dissertation
( and you are welcome to have a look
at them. I think many of you who work with hypertext might find this
study interesting, and I would appreciate any comments since we are
about to publish the results.

The abstract is below and thank you very much for your help!

Einat Amitay

----------------------------------------------------------------

Dillon et al. (1993) observed, when hypertext authoring on the web
was just beginning to become popular in the non-academic world, that
there is a problem of schemata, or genre conception, in hypertext,
because of the flexible nature of language and the varied layout used
in its creation. Today, almost five years later, the web is used by
many people and there are conventions which evolved from usage and
experience. In the years that passed since then, users became aware of
the existence of other users by interacting with their hypertext
documents and by creating their own homepages. Through analysing two
corpora consisting of 1000 HTML files retrieved from the World Wide Web,
this study describes the linguistic conventions with which hypertext
documents are being written. It is claimed here that hypertext is a
new linguistic genre and that it should be treated as such in future
studies. It is also suggested in this dissertation that studying these
conventions and applying the knowledge gained to existing academic
work would be beneficial to both hypertext users and the research
