Frequency Analysis

New ways to deconstruct De Vere digitally deprive Doubters of dispositive data

Stylometry
Literary Analysis
Big Data
EEBO V3
Nelson
Alliteration
Part of Speech
Lancaster University

The University of Lancaster’s Bankside repository is a new tool (launched in 2021) that provides curated corpora for detailed linguistic analysis of Bankside plays. Its interface to EEBO V3, the first online Early English Books Online database with part-of-speech metadata, allows detailed examination of 45,000 printed books, with every word tagged with its part of speech and its place in the repository. The more exciting work for Shakespeareans, however, lies in the detailed metadata preparation and tagging of Bankside theatre and Shakespeare’s canonical work.

Since the 1980s, the growing availability of digital texts has enabled academics and students to explore the language of Shakespeare and his contemporaries by machine, analysing ever larger numbers of texts. We now suspect that computers, with their lightning speeds and unimaginable amounts of data, might be getting better at it than we are.

How has all this affected the Shakespeare authorship question? There is an endemic (and healthy) scepticism about the results of stylometric techniques, which often takes the form of stiff resistance to results that contradict long-held beliefs. For the makeshift usurpers and their followers, however, it has been a disaster. By any measure (as we will demonstrate), Oxford is far from Shakespeare, and every modern metric confirms what Shakespeare scholars have always insisted: that high school children should be able to detect the unbridgeable gulf in quality described by Professor Steven May.1

Recently, the University of Lancaster fitted a hand-keyed corpus of Shakespeare’s works with part-of-speech tags, then followed it up with a comparative corpus of other Bankside writers, including the once-favoured alternative, Christopher Marlowe. The most advanced stylometrists today are busy trying to separate what Marlowe contributed to Shakespeare’s early work, using mathematical tests able to tell the two writers apart.

Using computers to separate the work of the two Bankside playwrights closest in style and popularity has cast Marlovians into a darkness which may turn out to be eternal. Marlowe and Shakespeare are not the same person, not twins, not even cousins, but two distinctive neighbours whose work can be separated by complex algorithms. When it comes to separating Oxford from Shakespeare, however, an abacus, two pencils and a few envelopes should be more than enough for anyone.

Part of Speech (PPM)  |  Shakespeare  |  Marlowe  |  De Vere  |  Variance (S vs DV)
Nouns  |  189,265  |  194,439  |  330,513  |  +43%
Verbs  |  179,213  |  181,302  |  100,008  |  -79%
Adjectives  |  71,736  |  44,781  |  54,858  |  -31%
Adverbs  |  54,593  |  55,078  |  45,925  |  -19%
Pronouns  |  69,618  |  84,468  |  78,789  |  +12%
Determiners  |  34,294  |  30,552  |  59,335  |  +42%
Conjunctions  |  64,325  |  62,756  |  47,846  |  -34%
Prepositions  |  76,207  |  72,231  |  82,177  |  +7%
Possessives  |  14,106  |  16,110  |  13,959  |  -1%
1st Person  |  22,246  |  20,424  |  10,531  |  -111%

This table is based on three corpora. De Vere’s lifetime output, including his letters, makes up a corpus of around 56,000 words. Shakespeare’s two long poems, the sonnets, and the Passionate Pilgrim content with A Lover’s Complaint produce a similar-sized corpus of around 58,000 words. Marlowe’s six plays are somewhat larger at around 120,000 words. Work that would have taken scholars months can now be done in a few minutes. The figures are expressed as usage per million words for comparability. The variance column shows the percentage difference between Shakespeare and De Vere, calculated against De Vere’s own figure: a positive value indicates that De Vere uses more of that part of speech than Shakespeare, and a negative value that he uses less.
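For anyone who wants to check the arithmetic, here is a minimal sketch of how the variance column can be reproduced from the per-million figures. The formula is inferred from the printed numbers rather than taken from any published method: every row matches (De Vere - Shakespeare) / De Vere. The function names are ours, invented for illustration. The second function gives the same gap measured against Shakespeare's figure, which is the more usual way to read "fewer than Shakespeare".

```python
# Per-million rates and printed variance, copied from the table above:
# (Shakespeare, De Vere, printed variance in percent).
RATES_PPM = {
    "Nouns": (189_265, 330_513, 43),
    "Verbs": (179_213, 100_008, -79),
    "Adjectives": (71_736, 54_858, -31),
    "Adverbs": (54_593, 45_925, -19),
    "Pronouns": (69_618, 78_789, 12),
    "Determiners": (34_294, 59_335, 42),
    "Conjunctions": (64_325, 47_846, -34),
    "Prepositions": (76_207, 82_177, 7),
    "Possessives": (14_106, 13_959, -1),
    "1st Person": (22_246, 10_531, -111),
}


def per_million(count, corpus_size):
    """Scale a raw count to a rate per million words."""
    return count * 1_000_000 / corpus_size


def variance_vs_de_vere(shakespeare, de_vere):
    """Gap as a share of De Vere's rate (assumed basis of the table)."""
    return (de_vere - shakespeare) / de_vere


def change_vs_shakespeare(shakespeare, de_vere):
    """The same gap as a share of Shakespeare's rate."""
    return (de_vere - shakespeare) / shakespeare


for name, (s, dv, printed) in RATES_PPM.items():
    table_basis = round(variance_vs_de_vere(s, dv) * 100)
    shakespeare_basis = round(change_vs_shakespeare(s, dv) * 100)
    print(f"{name:13} table {printed:+5d}%  recomputed {table_basis:+5d}%  "
          f"vs Shakespeare {shakespeare_basis:+5d}%")
```

Measured against Shakespeare's figure, the verb gap comes out at about -44% and the noun gap at about +75%. The direction and the scale of the difference are unchanged; only the base of the percentage differs.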

Applying complex algorithms to artistic endeavour, with big data and high stakes, has resulted in fierce argument, but two things have become abundantly clear. Professional playwrights are on one side of a line and, with few exceptions, court poets, amateurs of every kind and all wildcat contenders are on the other. You can see roughly where that line is drawn in the table above.

Practical criticism

Eighteenth- and early nineteenth-century students of Shakespeare such as Pope and Johnson, and later Keats and Shelley, believed that the mechanics of the English language could be studied like the mechanics of a clock. The mainspring of the language, what drove it, was the verb, followed by its descriptive sidekick, the adverb.

The hare limped trembling through the frozen grass 2

The line between Marlowe and Shakespeare may be wavy (Shakespeare uses many more adjectives), but they are on the same side, with very similar usage patterns when it comes to verbs and adverbs. In the engine room of the English language, De Vere’s verb rate is roughly 44% below Shakespeare’s (100,008 per million against 179,213; the table’s -79% is the same gap measured against De Vere’s own figure), and his noun rate is roughly 75% above it (the table’s +43% on the same basis). These are not small differences. EEBO can show far greater disparities between the two authors with a little more work but, looking at the wood rather than the veins on one of the leaves, there are differences that have to be explained.

An as-yet-unreleased algorithmic classifier, built for the simple distinction Shakespeare/Not Shakespeare, rates Oxford’s poetic corpus as “Not Shakespeare” with this rather gnomic diagnosis: “Stylometric warning: this passage produces extreme z-scores on 1.7% of features so the verdict may be an extrapolation rather than a confident classification. Treat as suggestive, not definitive. Why the model called it not-Shakespeare: longer-than-typical lines; frequent trigram ⟨ne·⟩; low vocabulary diversity (lots of repetition); frequent trigram ⟨to·⟩ m: many long lines (prose-like); few one-off words.”

We may see this classifier on this site later this year, once its training corpus is built out. It does very well with Shakespeare, Marlowe, Peele and Wilkins, and pulls all of the disputed chunks, but there’ll be no public appearance while it tries to advance the cause of Gammer Gurton’s Needle. Yet again, the message is that you don’t need sophistication to separate Oxford from Shakespeare. Algorithmic stylometry from untested sources is weightless in the attribution debate; nevertheless, a key part of the verdict, “low vocabulary diversity (lots of repetition)”, is visible to the naked eye. The Oxford corpus, following the De Vere Society’s compendium of canonical work, contains Peascod Time, a poem by George Gascoigne. Removing that halved Oxford’s very slim chances of being Shakespeare. Adding Oxford’s prose to his corpus makes them disappear altogether.
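Two of the features named in that diagnosis, low vocabulary diversity and few one-off words, can be measured without any classifier at all. The sketch below is not the classifier's method, which is unpublished; it is a plain type-token ratio and hapax count, with function names invented here for illustration. The sample is the refrain from the E. Ox. poem quoted further down this page.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic words (apostrophes allowed)."""
    return re.findall(r"[a-z][a-z'’]*", text.lower())


def diversity(text):
    """Return token count, distinct-word count, type-token ratio and hapax share."""
    tokens = tokenize(text)
    if not tokens:
        return {"tokens": 0, "types": 0, "ttr": 0.0, "hapax_share": 0.0}
    counts = Counter(tokens)
    # Hapax legomena: words that occur exactly once in the sample.
    hapax = sum(1 for n in counts.values() if n == 1)
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "ttr": len(counts) / len(tokens),
        "hapax_share": hapax / len(tokens),
    }


REFRAIN = """
To wail with me this loss of mine, as of these griefs the ground.
To wail with me this loss of mine, as of these griefs the ground.
"""

print(diversity(REFRAIN))
```

Type-token ratio falls as a sample gets longer, so it only means something when the samples compared are of similar length. The two poetic corpora described above, at around 56,000 and 58,000 words, are close enough for a rough comparison.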

We are experimenting with stylometric algorithms and will have a classifier usable in a browser later in the year.

A simpler approach

And yet I languish in great thirst
while others drink the wine.
Thus like a woeful wight wove my web of woe;
The more I would weed out my cares,
the more they seem to grow.

Care and Disappointment E. Ox.

Drown me you trickling tears,
you wailful wights of woe;
Come help these hands to rent my hairs,
my rueful haps to show;

The Forsaken Man, Edward De Vere

The first sample here reads more like an exercise in starting words with the letter ‘w’ than a genuine sigh of complaint: nine examples in three lines. Any college student should be equipped to spot that this isn’t Shakespeare. It isn’t Oxford’s best work (he’s capable of better), but he can’t aspire to being Shakespeare or even Wyatt. Most, if not all, of the people he hired into his service, Munday, Lyly and Churchyard among them, can write better than this.
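The count of nine is easy to verify by machine. This is a minimal sketch using first letters as a crude stand-in for alliteration; spelling is not sound, so a serious tool would need a pronunciation model. The three lines are the second, third and fourth of the sample above, and the function name is ours.

```python
import re
from collections import Counter

LINES = [
    "while others drink the wine.",
    "Thus like a woeful wight wove my web of woe;",
    "The more I would weed out my cares,",
]


def initial_letter_counts(lines):
    """Count how many words begin with each letter across the given lines."""
    counts = Counter()
    for line in lines:
        for word in re.findall(r"[A-Za-z][A-Za-z'’]*", line):
            counts[word[0].lower()] += 1
    return counts


counts = initial_letter_counts(LINES)
print(counts["w"])  # 9
```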

Fram’d in the front of forlorn hope past all recovery,
I stayless stand, to abide the shock of shame and infamy.
My life, through ling’ring long, is lodg’d in lair of loathsome ways;
My death delay’d to keep from life the harm of hapless days.
My sprites, my heart, my wit and force, in deep distress are drown’d;
The only loss of my good name is of these griefs the ground.

And since my mind, my wit, my head, my voice and tongue are weak,
To utter, move, devise, conceive, sound forth, declare and speak,
Such piercing plaints as answer might, or would my woeful case,
Help crave I must, and crave I will, with tears upon my face,
Of all that may in heaven or hell, in earth or air be found,
To wail with me this loss of mine, as of these griefs the ground.

Help ye that are aye wont to wail, ye howling hounds of hell;
Help man, help beasts, help birds and worms, that on the earth do toil;
Help fish, help fowl, that flock and feed upon the salt sea soil,
Help echo that in air doth flee, shrill voices to resound,
To wail with me this loss of mine, as of these griefs the ground.

E. Ox 1550 - 1604

They flee from me that sometime did me seek
With naked foot, stalking in my chamber.
I have seen them gentle, tame, and meek,
That now are wild and do not remember
That sometime they put themself in danger
To take bread at my hand; and now they range,
Busily seeking with a continual change.

Thanked be fortune it hath been otherwise
Twenty times better; but once in special,
In thin array after a pleasant guise,
When her loose gown from her shoulders did fall,
And she me caught in her arms long and small;
Therewithall sweetly did me kiss
And softly said, “Dear heart, how like you this?”

It was no dream: I lay broad waking.
But all is turned thorough my gentleness
Into a strange fashion of forsaking;
And I have leave to go of her goodness,
And she also, to use newfangleness.
But since that I so kindly am served
I would fain know what she hath deserved.

Sir Thomas Wyatt 1503 - 1542

If you didn’t follow the link to Richards, this is the first poem on the course. Wyatt isn’t a contemporary of De Vere but, despite his dying eight years before De Vere was born, his work sounds later: more agile in its verse, more varied in its metre, his grief more personal and more individually expressed.

Using Professor Nelson’s work, we built a corpus of Oxford’s work for EEBO V3 analysis. Three, in fact: all the poems, all the other written work, and a combined corpus with everything, which was used to create the table above. Its Oxfraud creators are still, five years later, the only people to have used it. Not even Oxfordians are curious to see how his writing matches up to that of his contemporaries. You shouldn’t need sophisticated methods to separate Oxford’s work from Shakespeare’s, and when you do use them the results make it clear why not.

EEBO v3

We used EEBO v3 to extract haggard hawks 3 to rebut the idea that the phrase is an Oxford marker. But the new version lets you search for haggard as both noun and adjective and pair it with other birds or other nouns, exploring collocated words or parts of speech. In the case of Shakespeare’s work, you can restrict your search to plays, genres, even individual characters, or characters grouped by their age or social status. You can now analyse the difference between aristocrats, the middle class and the groundlings to see the variations Shakespeare used in vocabulary and different parts of speech.

This is a new field of study offering new analytical horizons, the only foreseeable benefit to Doubters being their final enlightenment.

Alliteration

If you’ve read Oxford’s poetry you will have been struck by his use of clangorous alliterative lists of nouns and adjectives. Note the caesura here, which splits each line in equal measure.

My life, through ling’ring long, is lodg’d in lair of loathsome ways;
My death delayed to keep from life, the harm of hapless days.

Now this really doesn’t sound like Shakespeare at all, except when he is satirically punishing over-reaching artistic pretensions.

But stay: O spite! But mark, poor knight, What dreadful dole is here?
Eyes, do you see? How can it be? O dainty duck, O dear.

Here’s George Chapman doing fourteeners some justice in his translation of the Iliad, which enchanted Keats into writing one of the greatest sonnets in the language.

What God gave Eris their command, and op’t that fighting veine?
Jove’s and Latona’s Sonne, who, fir’d against the king of men
For contumelie showne his Priest, infectious sickness sent
To plague the armie; and to death, by troopes, the souldiers went.

There are more instances, however, of Shakespearean alliterative multiples than you might at first think from his playful mockery.

diy eebo

Rather than read another lecture on stylometry, you can check out EEBO for yourself without leaving this page. These tables extract alliterative pairs, an adjective and a noun, and pull out any other alliterative content nearby. There are four corpora in use, three available as standard at the University of Lancaster’s implementation of CQP.4 The fourth is an Oxford corpus we built ourselves, which can be downloaded from our Resources page and uploaded to a Lancaster CQP account. It’s only of use for research purposes because, now that it has been tagged, it reads like this.

The_DT labouring_VBG man_NN that_WDT tills_VBZ the_DT fertile_JJ soil_NN And_CC reaps_VBZ the_DT harvest_NN fruit_NN hath_VBZ not_RB indeed_RB The_DT gain_NN ,, but_CC pain_NN ,, and_CC if_IN for_IN all_PDT his_PP$ toil_NN He_PP gets_VBZ the_DT straw_NN ,_, the_DT lord_NN will_MD have_VB the_DT seed_NN ._SENT
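Tagged text in this word_TAG shape is straightforward to work with. The sketch below is not the query behind the tables on this page, which run through CQP; it is a plain Python illustration of the same idea, under stated assumptions: adjectives carry tags beginning JJ and nouns tags beginning NN, as in the sample above, and alliteration is approximated by a shared first letter. The function names, the window size and the short hand-tagged test line are ours, and the tags on that line are illustrative rather than tagger output.

```python
def parse_tagged(text):
    """Split word_TAG tokens into (word, tag) pairs, skipping untagged tokens."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        if not sep or not word:  # e.g. the bare ",," tokens in the sample
            continue
        pairs.append((word, tag))
    return pairs


def alliterative_pairs(tagged, window=3):
    """Find adjective + noun neighbours sharing a first letter.

    Returns (adjective, noun, nearby) tuples, where nearby lists other words
    within `window` tokens on either side that start with the same letter.
    """
    results = []
    for i in range(len(tagged) - 1):
        (adj, adj_tag), (noun, noun_tag) = tagged[i], tagged[i + 1]
        if not (adj_tag.startswith("JJ") and noun_tag.startswith("NN")):
            continue
        letter = adj[:1].lower()
        if not letter.isalpha() or letter != noun[:1].lower():
            continue
        lo, hi = max(0, i - window), min(len(tagged), i + 2 + window)
        nearby = [
            word
            for j, (word, _) in enumerate(tagged[lo:hi], start=lo)
            if j not in (i, i + 1) and word[:1].lower() == letter
        ]
        results.append((adj, noun, nearby))
    return results


# Verbatim opening of the tagged sample above: no alliterative pair in it.
SOURCE_SAMPLE = ("The_DT labouring_VBG man_NN that_WDT tills_VBZ "
                 "the_DT fertile_JJ soil_NN")
# Hypothetical hand-tagged line, for illustration only.
HAND_TAGGED = "ye_PP howling_JJ hounds_NNS of_IN hell_NN ._SENT"

print(alliterative_pairs(parse_tagged(SOURCE_SAMPLE)))
print(alliterative_pairs(parse_tagged(HAND_TAGGED)))
```

Matching on adjacent tokens keeps the sketch short. It will miss pairs separated by another word and will accept pairs that share a letter but not a sound.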

The three tables below cover Shakespeare’s long poems and sonnets, the First Folio plays, and a selection of contemporary dramatists for comparison. Oxford comes last: his entire surviving poetic output, machine-tagged from a single corpus. Spend a few minutes with the Shakespeare tables first.

Our results tables can be sorted, filtered and searched. If you search the First Folio table for “Midsummer” you can compare Bottom’s alliterative fourteeners to Oxford’s. If you compare the alliteration in Macbeth to that in Much Ado you will see how far Shakespeare adapts different writing styles and vocabulary to different themes, characters and circumstances. Nobody seriously believes Benedick is going to kill Claudio, but the language itself tells us what fate lies in store for Macbeth before he gets home in Act I.

By sorting alphabetically you can see the same alliterative pairing used differently in Venus and Adonis and Lucrece, or see how Marlowe and Jonson handle the same device.

There are over 2,000 examples in the tables so it helps, before you jump in, to consider what you want to get out. There is some dispute over what counts as canonical work for all these writers. Professor May, the De Vere Society and leading Oxfordians have varying views on what should be included in an Oxford corpus. Two, Michael Brame and Galina Popova in Shakespeare’s Fingerprints,5 credit Oxford with writing almost everything of significance in the 16th century. Their work on “veronyms” (De Vere markers in the King James Bible, written by De Vere, of course) beggars belief.

Shakespeare’s poems

Shakespeare’s First Folio

Contemporary dramatists


Edward de Vere

The table below covers de Vere’s entire surviving poetic output — the 20 poems accepted as canonical by the De Vere Society — together with his letter corpus. The poems were written largely in his teens and twenties; the letters span his adult life. Unlike the Shakespeare tables above, which are drawn from hand-keyed, editorially verified corpora, the Oxford corpus was processed by treetagging software, so the data is less refined. The difference in volume is not an artefact of the method. Neither is the difference in quality.

79 instances in Shakespeare’s poems and sonnets  |  901 in the First Folio  |  1037 across 21 contemporary dramatists  |  34 in the complete De Vere corpus (18 from poems, 16 from letters)

Footnotes

  1. “Tennessee Law Review Vol 72 Iss 1,” https://ir.law.utk.edu/tennesseelawreview/vol72/iss1/, April 2004, pp. 221–ff “As I worked on my edition of the Earl of Oxford’s poetry during the 1970s, I hoped, as I still do, that I might find some connection between De Vere’s work and the writings, any writing, of William Shakespeare. Unfortunately, I discovered instead a gulf between the two poets’ styles that rules out any direct ties between their output. I looked further into De Vere’s life as I prepared my book, The Elizabethan Courtier Poets. The facts of his biography and career at court made any connection with Shakespeare or his writings even less likely. I regret these enforced conclusions, however, because no one has more to gain than I from discovery of persuasive evidence linking Shakespeare’s works with Oxford. That discovery would catapult me from my obscure role as a professor of English at Georgetown College to the exalted status of a pioneering editor of the poems of “Shakespeare.”

  2. The Eve of St. Agnes, John Keats

  3. A table of the appearance of haggard hawks in EEBO V3 features here in our article about stylometry.

  4. The University of Lancaster’s Bankside repository provides curated corpora for detailed linguistic analysis to much larger language models than the Bankside corpora. Its Bankside theatre section is demonstrational rather than central to its ultimate purpose. Access is by permission from the university.

  5. Michael Brame and Galina Popova, Shakespeare’s Fingerprints (Adonis Ed: Vashon Island, Wash., 2002).
