LINGUIST List 5.1067

Sat 01 Oct 1994

Sum: Comparing texts for authorship

Editor for this issue: <>


Directory

  1. "William J. Rapaport", comparing texts for authorship -- summary

Message 1: comparing texts for authorship -- summary

Date: Thu, 29 Sep 1994 09:50:14
From: "William J. Rapaport" <rapaport@cs.Buffalo.EDU>
Subject: comparing texts for authorship -- summary

Last June, I posted the following query about "comparing 2 texts":

A colleague in our Classics dept. wants to be able to compare 2 texts
to see if they were written by the same author, or by different authors.
Presumably, this would be done by some combination of a stylistic and a
statistical analysis.

(As I recall, this sort of technique has been used by folks who try to
figure out if Shakespeare really wrote Shakespeare's plays.)

What she needs are pointers to the literature, especially information
on how reliable such arguments are.

Appended is a summary of the replies. Thanks to all of you!

 William J. Rapaport
 Associate Professor of Computer Science
 and
 Center for Cognitive Science

Dept. of Computer Science | (716) 645-3180 x 112
SUNY Buffalo | fax: (716) 645-3464
Buffalo, NY 14260 | rapaport@cs.buffalo.edu

 *************

 Date: Fri, 3 Jun 1994 14:18:46 --100
 From: Ken.Beesley@xerox.fr (Ken Beesley)

 Some important work on authorship was done by Michaelson & Morton in
 Edinburgh, Scotland.

 The Rev. A.Q. Morton
 The Abbey Manse
 Culross
 Dunfermline, Fife KY128JD

 Newmills 880-231

 Prof. S. Michaelson
 Computer Science
 JCMB
 Kings Buildings
 University of Edinburgh
 Edinburgh, Scotland

 There was also some work by a couple of statisticians at Brigham Young
 University in Provo, Utah, USA. Their names escape me now.
 *************
 Date: Fri, 3 Jun 1994 13:24:12 -0500
 From: hrubin@stat.purdue.edu (Herman Rubin)

 Probably the best sound work from a statistical basis is

 Mosteller, Frederick and David L. Wallace. 1964. _Inference and Disputed
 Authorship: The Federalist_. Reading, Mass.: Addison-Wesley.

 This book considered the Federalist papers, which were written by three
 known authors, with different papers by different authors. One of their
 conclusions was that analysis by contextual vocabulary and similar features,
 much used by those who assigned authorship in the past, did not work; the
 only thing that did work for this problem was the use of connectives.

 There is an article in the _Journal of Applied Probability_
 on the type-token relationship in Shakespeare's plays.
 A cursory glance at the data indicates that one cannot treat these as a
 sample from a single population, even if the comedies, tragedies,
 and historical plays are separated; there is a definite effect of the
 individual work. Similar things can be noticed in the attempts of other
 statisticians to do this, such as the writings of Yule.

 I did look at some of the data; I have not published on this. It is quite
 dangerous to say on the basis of a statistical test that two works are by
 different authors.
 -----------------

 Date: Mon, 6 Jun 94 11:16:55 -0700
 From: jtang@cogsci.Berkeley.EDU (Joyce Tang Boyland)

 [Re: Mosteller and Wallace:]
 ...
 The 1984 edition [see reply below] has a much more informative table of
 contents than the original 1964 version published by Addison-Wesley.

 ------------------

 Date: Tue, 7 Jun 1994 10:33:39 --100
 From: Gregory.Grefenstette@xerox.fr (Gregory Grefenstette)

 Frederick Mosteller and David L. Wallace

 "Applied Bayesian and Classical Inference: The Case of the Federalist
 Papers", 2nd edition of "Inference and Disputed Authorship: The Federalist"
 (Springer-Verlag)

 This book gives statistical methods for deciding which of the disputed
 Federalist papers were written by Hamilton and which by Madison.

 ------------------
 From: Robert.Sigley@vuw.ac.nz
 Date: Sat, 04 Jun 1994 13:02:58 +1200

 I've just finished reading (and returned, unfortunately) a collection of
 papers which I can thoroughly recommend to anyone trying to identify an
 author by style.

 It's called "Statistics and Style" and appeared in 1968. ... on checking
 the library online catalogue, I find that no extra information is given for
 it, so I won't be able to confirm this identification until it's reshelved
 (within 2 days, I hope). As far as I remember, the editor had a Slavic-type
 name beginning with `G'; a grep of the library's entire author index makes
 J. Gvozdanovic the most likely option (listed for another book in the
 general area of linguistics, but unconnected to the topic under
 discussion), but this id is tentative for now.

 Analyses covered include comparisons of
 (1) word-length spectra (and a number of statistics calculated from them);

 (2) sentence-length spectra (which are found to follow a log-normal
 distribution for any particular text by a single author. Author behaviour
 is reasonably consistent, but there is considerable overlap between
 different authors in the same genre);

 (3) use of certain vocabulary items previously identified as `typical' of
 the candidate author(s) on the basis of uncontested works;

 (4) use of certain grammatical constructions;

 (5) counts of certain grammatical classes (eg noun/verb ratios, or
 adjective/verb ratios).

 The final paper in the collection is perhaps the most important, as it deals
 with the general question of reliability.

 Overall, it has to be said that the crude general statistics above are not
 useful for deciding questions of authorship unless:

 (i) the number of possible candidate authors is small;
 (ii) we have a large body of work from each of the candidate authors;
 (iii) this corpus covers the entire span of their career (or at least shows
 little change over time); and
 (iv) the corpus is in similar genres to the contested item.

 In short, the rewards of such analysis are mostly not worth the
 considerable time it used to take to compute the statistics. The results
 are, I'm afraid, especially indecisive in answering questions in classics
 (where this volume of work, and the historical information about potential
 alternative authors, is often lacking).

 But if there is only one candidate author, with a large known corpus, and
 the exercise is simply to determine how similar to that author's style
 the unknown text is, then it can still be attempted.

 (1) and (2) above are now relatively quick and easy to calculate
 with most concordance programs - providing the text is in machine-readable
 form to start with! But they are the least author-specific methods.

 (4) and (5) could be useful, but are still very time-consuming to
 calculate, and require a whole lotta manual tagging of the texts. Best
 avoided.

 So (3) is probably going to be of most use in identifying a specific
 author. The best approach I can think of would be to construct a
 concordance (using OCP or similar) for a large corpus (20000 words minimum)
 of the candidate author, and then do the same for a similar-sized
 matched-genre corpus from the author's contemporaries. (If the text's
 general *date* is in doubt, you may as well give up now.)

 Then you compare the frequency ratios of common vocabulary items (ie
 frequency in candidate corpus/ frequency in mixed-contemporary corpus).

 This will identify a number of vocabulary items which are used
 proportionately much more or much less by the candidate author, and so can
 be used as `characteristic' of that author. Discard items which are linked
 in any literal sense to the text topic. Ignore very rare items (eg with
 frequency less than 5 over 20000 words). To save yourself time, and to
 maximise the sensitivity of your tests, look at only the 10 or so items with
 the largest differential frequency.

 Now calculate the frequencies of the remaining items in the contested text.
 Compare these with both the candidate-author and contemporary corpora
 frequencies.

 Finally, conduct a series of statistical tests to determine whether any
 differences you find can reasonably be attributed to chance. The best
 method will depend on the frequencies you get at the end of all this; ask a
 friendly statistician.
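 [The recipe above can be sketched in a few lines of Python. This is my own
 illustration, not the poster's code; the add-half smoothing and the ranking
 by absolute log frequency ratio are assumptions I have added so that items
 absent from one corpus do not divide by zero:

```python
import re
from math import log
from collections import Counter

def counts(text):
    """Word counts and total token count for a corpus."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words), len(words)

def characteristic_items(candidate, contemporaries, min_count=5, top=10):
    """Items used proportionately much more or much less by the candidate
    author, ranked by |log frequency ratio| with add-half smoothing."""
    cand, n1 = counts(candidate)
    cont, n2 = counts(contemporaries)
    scores = {}
    for w in set(cand) | set(cont):
        if cand.get(w, 0) + cont.get(w, 0) < min_count:
            continue  # ignore very rare items, as suggested above
        ratio = ((cand.get(w, 0) + 0.5) / n1) / ((cont.get(w, 0) + 0.5) / n2)
        scores[w] = abs(log(ratio))
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

 The items this returns are then counted in the contested text and compared
 against both corpora, as described above. -Ed.]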

 Hope this helps. I'll mail back when I find the book again to confirm
 its identity. I should add though that there's been considerable progress
 in text manipulation on computers since its publication, so it's out of
 date in some areas; however, this is more or less made up for by falling
 interest and a lack of progress in statistical style analysis since 1970.

 ---------------------

 From: Robert.Sigley@vuw.ac.nz
 Date: Wed, 08 Jun 1994 16:34:09 +1200

 The reference I mentioned is actually:

 Lubomir Dolezel & Richard W. Bailey (eds) 1969. _Statistics_and_Style_. New
 York: American Elsevier Publishing Company.

 I shall try to give a brief description of the more important collected
 papers, with original references where possible.
 Page references for quotes are from the collection, though.

 Vocabulary Measures:

 Paul Bennett. The Statistical Measurement of a Stylistic Trait in
 _Julius_Caesar_ and _As_You_Like_It_. (from _Shakespeare_Quarterly_ VIII
 (1957): 33-50)
 Bennett applies Yule's characteristic (a measure of vocabulary
 repetitiveness) to two very different plays by Shakespeare. Using a
 card-sorting technique, this was very time-consuming; it would be much
 quicker today! He finds that the characteristic is a useful measure of
 style - it varies from act to act in a way predictable from the plays'
 structures - but cautions that he "should not care to suggest that the
 characteristic is going to provide an infallible test of authorship" (p40).
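 [Yule's characteristic is K = 10^4 (sum_i i^2 V_i - N) / N^2, where V_i is
 the number of word types occurring exactly i times and N is the token
 count. It is indeed quick to compute today; a minimal Python sketch, with
 my own simplified tokenisation:

```python
import re
from collections import Counter

def yules_k(text):
    """Yule's characteristic K, a measure of vocabulary repetitiveness:
    K = 10^4 * (sum_i i^2 * V(i) - N) / N^2, where V(i) is the number of
    word types occurring exactly i times and N is the token count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    spectrum = Counter(Counter(tokens).values())  # maps i -> V(i)
    s2 = sum(i * i * v for i, v in spectrum.items())
    return 10000 * (s2 - n) / (n * n)
```

 A text that never repeats a word scores 0; heavier repetition pushes K
 up. -Ed.]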

 Charles Muller. Lexical Distribution Reconsidered: The Waring-Herdan
 Formula.
 (from _Cahiers_de_Lexicologie_ VI (1965): 35-53.)
 Muller tests a rather complicated formula designed to predict the
 word-frequency spectrum of a text. It works reasonably well on material
 from a variety of texts in several languages.
 [The shape of the frequency distribution is therefore of little use in
 author attribution. This formula has recently surfaced again in Baayen's
 (1990, 1991) work on morphological productivity.-RJS]

 Friederike Antosch. The Diagnosis of Literary Style with the Verb-Adjective
 Ratio. (translated from German original.)
 Antosch analyses a number of plays by Grillparzer, Goethe and
 Anzengruber, in terms of the verb/adjective ratio. She finds that this
 is extremely sensitive to elements of genre (eg dialogue/ monologue;
 and novels vs. academic writings) and characterisation (eg lower-class/
 upper-class). The V/A ratio may show local maxima within a play at
 points of rising action and climactic scenes, and so is a potentially
 useful stylistic indicator.
 [Corollary: it's of very limited use for comparing authors unless these
 factors can be controlled.]

 See also:
 G. Udny Yule. 1944. _The_Statistical_Study_of_Literary_Vocabulary_.
 Cambridge.

 Sentence-level Measures:

 C.B. Williams. A Note on the Statistical Analysis of Sentence-Length as a
 Criterion of Literary Style.
 Williams compares works by Chesterton, Wells and Shaw with respect to
 their sentence-length frequency spectra. He finds that these spectra
 are reasonably well modelled by a log-normal distribution (that is, the
 log of the sentence length has a normal distribution), and that the
 three books studied have significantly different mean sentence lengths
 - though the significance is marginal between Shaw and Chesterton.
 Williams uses samples of 600 sentences (approx 15000 words) from each
 book; this is a minimum sample size for work of this nature!
 [NB we can't conclude from this that we have identified any characteristic
 of the *authors*. -RJS]
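 [Williams' log-normal finding suggests comparing texts on the log of
 sentence length. A rough sketch (my own illustration, not the test Williams
 himself used; Welch's t statistic here stands in for his comparison of mean
 sentence lengths):

```python
import re
from math import log, sqrt

def log_sentence_lengths(text):
    """Log of each sentence length in words (Williams' log-normal model)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [log(len(s.split())) for s in sentences if s.split()]

def welch_t(a, b):
    """Welch's t statistic for a difference in mean log sentence length.
    Assumes each sample has at least two values and some internal variation."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / sqrt(va / len(a) + vb / len(b))
```

 With samples of 600 sentences per text, as Williams used, the statistic can
 be referred to standard t tables; a friendly statistician can advise on the
 degrees of freedom. -Ed.]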

 Kai Rander Buch. A Note on Sentence-Length as Random Variable.
 Buch comments on Williams' paper, presenting (with fearsome maths) a
 statistical analysis of two works by the same author, and concluding
 that the author's style has changed over time to such an extent that
 the texts are significantly different under Williams' test.

 See also:
 C.B. Williams. 1956. Studies in the History of Probability and Statistics
 IV. A Note on an Early Statistical Study of Literary Style. _Biometrika_
 XLIII (1956): 248-256.
 G. Udny Yule. 1938. On Sentence-Length as a Statistical Characteristic of
 Style in Prose, with Application to Two Cases of Disputed Authorship.
 _Biometrika_ XXX (1938-39): 363-390.

 [Hence gross sentence-length measures are of little use for author
 attributions: they can return non-significant differences between different
 authors, and significant differences between texts by the same author. They
 simply aren't specific enough. -RJS]

 Curtis W. Hayes. A Study in Prose Styles: Edward Gibbon and Ernest
 Hemingway.
 (from _Texas_Studies_in_Literature_and_Language_ VII (1966): 371-386.)
 Hayes avoids the above problem by taking a more detailed
 transformational analysis of passages of Gibbon & Hemingway. He finds a
 variety of grammatical patterns which show highly significant
 differences between the two authors - in particular, passives,
 doublets, infinitival nominals, and relative clauses are far commoner
 in Gibbon.
 [This is a valuable stylistic measure, though not a method I would have the
 patience to use myself! But it doesn't serve to identify the authors,
 so much as the very different genres they write in. -RJS]

 Studies of Individual Author Styles:

 John B. Carroll. Vectors of Prose Style.
 (from Thomas A. Sebeok (ed) 1960. _Style_In_Language_. MIT Press: 283-292.)
 This is an interesting use of factor analysis to determine the
 linguistic correlates of literary judgements.

 George M. Landon. The Quantification of Metaphoric Language in the Verse of
 Wilfred Owen.
 Least said the better.

 Frederick L. Burwick. Stylistic Continuity and Change in the Prose of Thomas
 Carlyle.
 The mutant offspring of an entropic study of 5-word wordclass
 sequences, and a more traditional literary analysis. The latter wins
 out, but is not easily applicable to other authors.

 Karl Kroeber. Perils of Quantification: The Exemplary Case of Jane Austen's
 _Emma_.
 Kroeber undertakes a detailed analysis of the vocabulary of Austen,
 Eliot, Dickens and [E.] Bronte. While many of the restrictions he
 places on his samples are arbitrary, this is potentially a useful
 direction for author comparison and attribution (see below).

 See also the case studies:
 Alvar Ellegard. 1962. _A_Statistical_Method_for_Determining_Authorship:_The_
 Junius_Letters,_1769-1772_. Gothenburg Studies in English 13.

 Ivor S. Francis. 1966. An Exposition of a Statistical Approach to the
 Federalist Dispute, in Jacob Leed (ed) _The_Computer_and_Literary_Style_.
 Ohio.

 Survey of the field:

 Richard W. Bailey. Statistics and Style: A Historical Survey. (pp217-236)
 This deals with the general question of reliability:
 "What is wanted... is a litmus test by which the critic can decide
 whether or not two given texts were written by the same author. Though
 some attempts have been made to formulate such a test, they have been
 almost wholly unsuccessful." (p222)

 Some other surveys cited by Bailey:
 William J. Paisley. 1964. Identifying the Unknown Communicator [...]
 _The_Journal_of_Communication_, XIV (1964): 219-237.

 Rebecca Posner. 1963. The Use and Abuse of Stylistic Statistics.
 _Archivum_Linguisticum_ XV (1963): 111-119.

 Overall, it has to be said that the crude general statistics above are not
 useful for deciding questions of authorship unless:

 (i) the number of possible candidate authors is small;
 (ii) we have a large body of work from each of the candidate authors;
 (iii) this corpus covers the entire span of their career (or at least shows
 little change over time); and
 (iv) the corpus is in similar genres to the contested item.

 In short, the rewards of such analysis are mostly not worth the
 considerable time it used to take to compute the statistics. The results
 are, I'm afraid, especially indecisive in answering questions in classics
 (where this volume of work, and the historical information about potential
 alternative authors, is often lacking).

 But if there is only one candidate author, with a large known corpus, and
 the exercise is simply to determine how similar to that author's style
 the unknown text is, then it can still be attempted.

 The general vocabulary and sentence-length measures above are now
 relatively quick and easy to calculate with most concordance programs -
 providing the text is in machine-readable form to start with! But they are
 the least author-specific methods. Possibly they could be of use as a
 preliminary check before plunging into more time-consuming methods.

 More specific grammatical analysis could be useful, but very time-consuming
 to calculate, requiring a whole lotta manual tagging of the texts. Best
 avoided.

 So an intermediate 'specific vocabulary' index is probably going to be of
 most use in identifying a specific author. The best approach I can think of
 would be to construct a concordance (using OCP or similar) for a large
 corpus (20000 words minimum) of the candidate author, and then do the same
 for a similar-sized matched-genre corpus from the author's contemporaries.
 (If the text's general *date* is in doubt, you may as well give up now.)

 Then you compare the frequency ratios of common vocabulary items (ie
 frequency in candidate corpus/ frequency in mixed-contemporary corpus).

 This will identify a number of vocabulary items which are used
 proportionately much more or much less by the candidate author, and so can
 be used as `characteristic' of that author. Discard items which are linked
 in any literal sense to the text topic. Ignore very rare items (eg with
 frequency less than 5 over 20000 words). To save yourself time, and to
 maximise the sensitivity of your tests, look at only the 10 or so items with
 the largest differential frequency.

 Now calculate the frequencies of the remaining items in the contested text.
 Compare these with both the candidate-author and contemporary corpora
 frequencies.

 Finally, conduct a series of statistical tests to determine whether any
 differences you find can reasonably be attributed to chance. The best
 method will depend on the frequencies you get at the end of all this; ask a
 friendly statistician.

 Hope this helps. I should add though that there's been considerable
 progress in text manipulation on computers since 1970, so it's out of
 date in some areas; however, this is more or less made up for by falling
 interest and a lack of progress in statistical style analysis since then.

 ----------------

 Date: Sat, 4 Jun 1994 11:51:03 -0600
 From: nostler@crl.nmsu.edu (Nick Ostler)

 Your colleague should look at a work by AJP Kenny on assessing the
 authorship of Aristotle's Eudemian Ethics: "The Aristotelian ethics:
 a study of the relationship..." Oxford: Clarendon Press, 1978.

 [also recommended by
 Virginia Knight <ZZAASVK@cms.manchester-computing-centre.ac.uk>]

 ------------------

 Date: Mon, 6 Jun 94 13:58:20 +0200
 From: monique@gia.univ-mrs.fr (Monique Rolbert)

 We are a BDD-LN team developing a query language for textual databases
 (starting from an SGML-type format), and one question is what kinds of
 operators it would be useful to offer a user who wants to do natural
 language processing on texts.
 I would be very interested to hear what sorts of needs you have for the
 kind of comparison you want to make (statistical-stylistic).
 Thanks in advance.

 Monique Rolbert
 monique.rolbert@gia.univ-mrs.fr

 -----------------
 Date: Mon, 6 Jun 1994 23:34:23 +1000
 From: sussex@lingua.cltr.uq.oz.au (Prof. Roly Sussex)

 John Burrows (LCJFB@cc.newcastle.edu.au) at the University of Newcastle,
 Australia, has done important work on text analysis and authorship.
 You could email him direct.

 Roly Sussex
 Director
 Centre for Language Teaching and Research
 and
 Language and Technology Centre of the National Languages and Literacy
 Institute of Australia
 University of Queensland
 Queensland 4072
 Australia

 email: sussex@lingua.cltr.uq.oz.au
 phone: +61 7 365-6896 (work)
 +61 7 300-2942 (home)
 fax: +61 7 365-7077

 -------------------

 Date: Mon, 6 Jun 94 15:37:19 -0700
 From: edwards@cogsci.Berkeley.EDU (Jane A. Edwards)

 Your query reminded me of a recent exchange regarding stylistic
 analysis, though in different context. Hope this is of use. -Jane Edwards
 | ------------------------
 | Date: Mon, 31 Jan 1994 11:59:00 -0500
 | From: neff@watson.ibm.com (Mary Neff)
 | To: FL-LIST@BHAM.AC.UK
 | Cc: neff@watson.ibm.com
 | Subject: The Case of the Plagiarized Patent
 |
 | A few months back I was buttonholed at a party by the owner of a company
 | in the middle of a patent infringement case. He wanted to know if, as a
 | linguist, I might have anything useful to offer. Not a lot, it's not my
 | field, but I just found this list, and one of YOU might. It seems that his
 | company had signed a contract with another one that included giving them
 | access to his design documentation and his patent applications. Some
 | time later, he discovered that the other company is siphoning off his
 | business and is making a product too similar to his to be accidental, and
 | has filed patents also (I think in other countries). His question to me
 | was whether it were possible to study and compare the two patents by
 | structure, language, etc. to determine whether there might have been
 | any plagiarism involved. I later looked at the patents and decided that
 | it was perhaps not a wild idea, but that any investigation would also
 | have to take into account the general "formula" of a patent, which might
 | account for a lot of similarity. Who are the experts on this sort of
 | thing? What are some of the other issues involved? It's not so often
 | that I get approached at a party for some free advice as a linguist;
 | usually it's the doctors and the lawyers that encounter that sort of
 | thing!
 |
 | Interestingly, I read something in this month's DISCOVER magazine that
 | mentions a couple of guys who designed a computer program to snoop for
 | plagiarism in books.

 ------------------

 Date: Tue, 01 Feb 94 10:03:54 EST
 From: Larry Horn <LHORN@YaleVM.CIS.Yale.edu>

 Thanks for the postings. The lawyer has settled on one of my earlier
 respondents, Gerry McMenamin of Fresno, who wrote a book on authorship
 determination. Apparently computers are indeed much used in these matters,
 but I don't know whether his samples (from his client and another man) are
 generous enough to allow for statistical significance.
 I guess McMenamin will help him decide.
 ------------------

 From: "Richard Hamilton-Williams" <RJHW@registry.cit.ac.nz>
 Date: Wed, 8 Jun 1994 13:29:26 GMT+1200

 Long ago, but not so far away, I studied Middle High German and wrote
 a bit of a thesis on the transmission of MHG texts.

 It wasn't very popular with a lot of people because it made little
 reference to "taste" and was based, rather, on a statistical analysis
 of variance between texts. My professor at the time got me interested
 in this and he in turn had got it from a book called, I think, "The
 Calculus of Variants". I've an idea it was written by E H
 Greig(Gregg?) in the 1920s or 1930s. In any case, I think I have a
 copy at home and will send you the details tomorrow.

 I made use of a fairly crude algorithm which established a model as if
 the transmission of texts were known, and then measured actual
 variation against this. My professor died, nothing to do with me I
 hope, and although I completed my degree I went on to other things,
 so I can't claim that I know what goes on in the field nowadays. I
 imagine, however, that analysis is much more sophisticated now - I
 used punchcards to enter data on a mainframe - although the concepts
 should be very similar.
 Richard Hamilton-Williams
 Central Institute of Technology, Wellington, New Zealand
 04 527-6397 x6982
 Private Bag 39807
 Wellington Mail Centre
 New Zealand

 From: "Richard Hamilton-Williams" <RJHW@registry.cit.ac.nz>
 Date: Thu, 9 Jun 1994 08:03:41 GMT+1200

 The reference is:

 Greg, W. W. The Calculus of Variants, An Essay on Textual Criticism
 (Oxford, 1927)

 Greg wrote a number of other things and edited works on the basis of
 his theories on textual transmission.
 Richard Hamilton-Williams
 Central Institute of Technology, Wellington, New Zealand
 04 527-6397 x6982
 Private Bag 39807
 Wellington Mail Centre
 New Zealand

 ------------------

 From: h9290030@hkuxa.hku.hk (R.Y.L. TANG)
 Subject: Authorship identification

 In David Crystal's _The Cambridge Encyclopedia of Language_ (Cambridge
 UP, 1987), there is a very succinct account of the use of statistics in
 stylistic analysis and authorship identification (Chapter 12).
 -----------

 From: Brett.Baker@linguistics.su.edu.au (Brett Baker)
 Date: Wed, 15 Jun 1994 15:41:12 +1000

 ... I don't know if this will be much use to your colleague, but
 she could do worse than have a look at a new monograph by John Myhill called
 'Typological Discourse Analysis' published by Blackwell 1992. Apart from
 loads of interesting stuff about analysing texts quantitatively, it also has
 references for analyses that have been done on written texts which sound
 like the kind of thing you want. Much of the purpose of this kind of
 analysis is to show up regularities of expression type and
 stylistic/grammatical function. Good luck.
 -----------

 Date: Thu, 16 Jun 1994 00:24:45 -0500 (CDT)
 From: Kristin E Hiller <hill0087@gold.tc.umn.edu>

 This is in response to the query you posted on Linguist (on behalf of
 your colleague). I'm sorry it's taken me so long to respond.
 Stylostatistical studies abound concerning cases of disputed authorship.
 You mention the Shakespeare/Marlowe controversy. I'll name a few others:

 1) One of the most often cited cases of disputed authorship is that of
 the _Federalist Papers_. Of the 85 papers, the authorship of twelve was
 in question (having been written by either Madison or Hamilton).

 2) Several anonymous articles appeared in the journals _Vremja_ (_Time_)
 and _`Epoxa_ (_Epoch_), which were both edited by Dostoevsky. Some of the
 articles have variously been attributed to Dostoevsky.

 3) The authorship of _The Junius Letters_, not known for certain, has
 often been attributed to Sir Philip Francis (although some 40 others were
 considered at one time or another).

 4) Gustavus Adlerfeld's _The Military History of Charles XII_ was
 anonymously translated from the French. Henry Fielding is considered by
 some to have been the translator.

 5) Some scholars maintain that Sholoxov did not actually write all of
 _Tixij Don_, but plagiarized Krjukov's manuscripts.

 I have only recently begun reading about this field and have already
 come across many references to the work done on (1) by Frederick Mosteller
 and David Wallace, _Inference and Disputed Authorship: "The Federalist"_
 (Reading, MA: Addison-Wesley, 1964) and a less statistic-laden work,
 Francis, Ivor S. "An Exposition of the Statistical Approach to the
 _Federalist_ Dispute," in _The Computer and Literary Style_, ed. Jacob
 Leed (Kent: Kent State U. Press, 1966).

 Geir Kjetsaa tackles (2) in his book (written in Russian) _Prinadlezhnost'
 Dostoevskomu: K voprosu ob atribucii F.M. Dostoevskomu anonimnyx statej v
 zhurnalax "Vremja" i "Epoxa"_ (Oslo: Solum Forlag, 1986).

 Michael and Jill Farringdon address (4) in "A computer-aided study of the
 prose style of Henry Fielding and its support for his translation of the
 Military History of Charles XII", in _Advances in Computer-aided Literary
 and Linguistic Research: Proceedings of the Fifth International Symposium
 on Computers in Literary and Linguistic Research_, D.E. Ager, F.E. Knowles,
 Joan Smith, eds. (Birmingham: AMLC, 1979).

 Rudall, B.H and T.N. Corns, _Computers and Literature: A practical guide_
 (Cambridge, MA: Abacus Press, 1987) contains a chapter on "Author
 identification and canonical investigation."

 With all the literature out there I could continue listing references
 until my fingers ached from typing. Instead I'll just list two more:

 Kenny, Anthony, _The Computation of Style: An introduction to statistics
 for students of literature and humanities_ (Oxford: Pergamon Press, 1982).
 A great book -- the title says it all.

 Feldman, Paula R. and Buford Norman. _The Wordworthy Computer: Classroom
 and research applications in language and literature_ (NY: Random House,
 c.1987). The best part of this book is its HUGE bibliography. A very
 good starting point.