Query for this summary posted in LINGUIST Issue:
A few weeks ago I broadcast a double query about the statistics of English vocabulary. My first question was about the number of morphemes compared with the number of lemmas, but nobody offered an answer.
My second question was more successful. This was about the proportion of lemmas in each of the main word classes, and how this proportion varied with token frequency; I was particularly keen to check a guess that the proportion of nouns was greater among rare lemmas than among common ones. I received data from Gwillim Law and Jasper Holmes. It turns out that my guess was right. I've presented and summarised the data at http://www.phon.ucl.ac.uk/home/dick/nouniness/nouniness.htm. If anyone has comments or further data (including data on other languages), I should of course be most interested to hear from them.
Text/Corpus Linguistics