Abstract: Burrows' Delta Method (Burrows, 2002) is a leading method of authorship attribution. It can be used to shortlist potential authors from a list of candidates, or even to identify potential authors. The technique has been extended by Hoover (2004a, 2006). In this investigation, we look at how the accuracy of text classification is affected by the choice of words for the word vector, the size of the word vector, the similarity measure and the choice of corpus. Our results show that a word frequency vector of between 200 and 300 words gives the most accurate results (Aldridge, 2007). We also demonstrate a dramatic improvement in accuracy by adapting Burrows' Delta to use the cosine similarity measure. Additionally, our results indicate areas where the word vector can be optimized still further for more accurate results.

References
Aldridge, W. 2007. The Burrows Delta Dilemma: Optimization of Delta for Authorship Attribution. MSc thesis. London: City University.
Burrows, J. 2002. 'Delta': A measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing, 17(3): 267–287.
Hoover, D. 2004a. Testing Burrows' Delta. Literary and Linguistic Computing, 19(4): 453–475.
Hoover, D. 2006. "Word frequency, statistical stylistics and authorship attribution. Word frequency and keyword extraction". In AHRC ICT Methods Network Expert Systems Seminar on Linguistics, Lancaster University.
Published in: Journal of Quantitative Linguistics
Volume 18, Issue 1, pp. 63-88
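
The abstract compares the classic Delta measure with a cosine-based variant but gives no formulas. As an illustration only, the sketch below follows the commonly described formulation of Burrows' Delta: take the most frequent words of the reference corpus, convert each text's relative frequencies to z-scores, and average the absolute z-score differences. The cosine variant shown here is an assumption: it applies cosine distance to the same z-score vectors, which may differ from the exact adaptation used in the paper. All function names (`top_words`, `fit_scaler`, `rank_candidates`, and so on) are hypothetical, and texts are assumed to be already tokenized.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def top_words(texts, n):
    """Return the n most frequent words pooled over all reference texts."""
    pooled = Counter()
    for tokens in texts.values():
        pooled.update(tokens)
    return [word for word, _ in pooled.most_common(n)]


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in one tokenized text."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocab]


def fit_scaler(texts, vocab):
    """Per-word mean and standard deviation over the reference texts."""
    rows = [relative_freqs(tokens, vocab) for tokens in texts.values()]
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    # Assumption: a word with zero variance gets sigma = 1 to avoid dividing by zero.
    sigmas = [pstdev(col) or 1.0 for col in columns]
    return mus, sigmas


def z_vector(tokens, vocab, mus, sigmas):
    """Z-scores of one text's word frequencies against the reference corpus."""
    freqs = relative_freqs(tokens, vocab)
    return [(f - m) / s for f, m, s in zip(freqs, mus, sigmas)]


def burrows_delta(z_a, z_b):
    """Classic Delta: mean absolute difference between two z-score vectors."""
    return mean(abs(a - b) for a, b in zip(z_a, z_b))


def cosine_distance(z_a, z_b):
    """Assumed cosine variant: 1 minus cosine similarity of the z-score vectors."""
    dot = sum(a * b for a, b in zip(z_a, z_b))
    norm = math.sqrt(sum(a * a for a in z_a)) * math.sqrt(sum(b * b for b in z_b))
    return 1.0 - dot / norm if norm else 1.0


def rank_candidates(disputed, candidates, n_words=250, distance=burrows_delta):
    """Rank candidate authors by distance to the disputed text, closest first."""
    vocab = top_words(candidates, n_words)
    mus, sigmas = fit_scaler(candidates, vocab)
    z_disputed = z_vector(disputed, vocab, mus, sigmas)
    scores = [
        (name, distance(z_disputed, z_vector(tokens, vocab, mus, sigmas)))
        for name, tokens in candidates.items()
    ]
    return sorted(scores, key=lambda pair: pair[1])


# Toy usage with made-up token lists; real input would be full tokenized texts.
candidates = {
    "author_a": "the cat and the dog and the bird".split(),
    "author_b": "a cat of a dog of a bird of".split(),
}
disputed = "the bird and the cat and the dog".split()
print(rank_candidates(disputed, candidates, n_words=4))
print(rank_candidates(disputed, candidates, n_words=4, distance=cosine_distance))
```

The default of 250 words is chosen only because it falls inside the 200 to 300 word range the abstract reports as most accurate. Passing the distance function as a parameter keeps the frequency and z-score steps identical, so the two measures can be compared on the same word vector.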