Parts of Speech
Big data learns where gerunds are used

Since the 1980s, the growth in the availability of digital texts has enabled academics and students to explore the language of Shakespeare and his contemporaries, using machines to analyse ever larger numbers of texts and languages, to the point where we now suspect that, with their vast access to data, computers might be getting better at it than we are.
First came digitised texts, scanned from books. These made concordances possible for corpora such as the Shakespeare canon, and the work then extended in all directions as the tools proved their worth. The Early English Books Online (EEBO) project has digitised over 125,000 texts from the early modern period, providing a vast corpus for linguistic analysis. These texts gave rise to a new type of analysis, computerised stylometry, which uses statistical methods to attribute authorship and analyse stylistic features. In the early 1990s, the whole Shakespeare canon counted as a large amount of data for the hardware of the time to process. Today, complex operations on the whole of EEBO take seconds, and large language models draw on processing power and storage that are, by the standards of the 1990s, effectively limitless.
How has all this affected the Shakespeare authorship question? There is an endemic (and healthy) scepticism about the results of stylometric analysis, which often takes the form of stiff resistance to findings that contradict long-held beliefs. All in all, however, it has been a disaster for the makeshift usurpers and their followers. By any measure (as we will demonstrate), Oxford is far from Shakespeare, and every modern metric confirms what Shakespeare scholars have always insisted: high school students should be able to detect an unbridgeable gulf in quality between Shakespeare and most of the pretenders.
Recently, Lancaster University fitted a hand-keyed corpus of Shakespeare’s works with part-of-speech tags and followed it up with a comparative corpus of other Bankside writers, including the once-favoured alternative candidate, Christopher Marlowe. The most advanced stylometrists are now using mathematical tests that distinguish the two authors to separate what Marlowe contributed to Shakespeare’s early work.
| Part of speech (per million words) | Shakespeare | Marlowe | De Vere | Variance, (DV - S) / DV |
|---|---|---|---|---|
| Nouns | 189,265 +2 | 194,439 | 330,513 +2 | +43% |
| Verbs | 179,213 | 181,302 | 100,008 | -79% |
| Adjectives | 71,736 | 44,781 | 54,858 | -31% |
| Adverbs | 54,593 +2 | 55,078 | 45,925 +2 | -19% |
| Pronouns | 69,618 +1 | 84,468 | 78,789 | 12% |
| Determiners | 34,294 +2 | 30,552 | 59,335 +2 | 42% +2 |
| Conjunctions | 64,325 | 62,756 | 47,846 | -34% |
| Prepositions | 76,207 | 72,231 | 82,177 | 7% |
| Possessives | 14,106 | 16,110 | 13,959 | -1% |
| 1st Person | 22,246 | 20,424 | 10,531 | -111% |
The table is based on De Vere’s lifetime output, including his letters, which make up a corpus of around 56,000 words. Shakespeare’s two long poems, the Sonnets, and the Shakespearean poems in The Passionate Pilgrim together with A Lover’s Complaint produce a similar-sized corpus of around 58,000 words, while Marlowe’s six plays form a somewhat larger one of around 120,000 words; all three are compared here by part of speech. Work that would have taken scholars months can now be done in a few minutes. The figures are normalised to occurrences per million words so that corpora of different sizes can be compared. The variance column shows the difference between Shakespeare and De Vere as a percentage of De Vere’s rate, (DV - S) / DV: a positive value means De Vere uses more of that part of speech than Shakespeare, and a negative value means he uses less. Because the baseline is De Vere’s rate, the -79% for verbs means Shakespeare’s rate exceeds De Vere’s by 79% of De Vere’s figure; measured against Shakespeare instead, De Vere uses about 44% fewer verbs.
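The arithmetic behind the table is simple enough to reproduce. What follows is a minimal sketch, not the workflow we used in EEBO V3: it assumes raw part-of-speech counts and corpus sizes are to hand, and the helper names `per_million`, `difference_vs_de_vere` and `difference_vs_shakespeare` are hypothetical. It normalises a count to a rate per million words and then recomputes the variance column from the published rates, which do match the (DV - S) / DV convention. It also prints the same gap measured against Shakespeare’s rate, to show how much the choice of baseline changes the headline figure.

```python
# Per-million rates for Shakespeare (S) and De Vere (DV), copied from the table above.
RATES = {
    "Nouns": (189_265, 330_513),
    "Verbs": (179_213, 100_008),
    "Adjectives": (71_736, 54_858),
    "1st Person": (22_246, 10_531),
}


def per_million(count: int, corpus_words: int) -> float:
    """Normalise a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_words


def difference_vs_de_vere(s_rate: float, dv_rate: float) -> float:
    """Gap as a percentage of De Vere's rate: (DV - S) / DV, the table's convention."""
    return (dv_rate - s_rate) / dv_rate * 100


def difference_vs_shakespeare(s_rate: float, dv_rate: float) -> float:
    """The same gap as a percentage of Shakespeare's rate: (DV - S) / S."""
    return (dv_rate - s_rate) / s_rate * 100


if __name__ == "__main__":
    # Hypothetical example: 5,600 tagged tokens in a 56,000-word corpus.
    print(per_million(5_600, 56_000))  # 100000.0
    for pos, (s, dv) in RATES.items():
        print(
            f"{pos:<11} table: {difference_vs_de_vere(s, dv):+.0f}%  "
            f"vs Shakespeare: {difference_vs_shakespeare(s, dv):+.0f}%"
        )
```

Run against the verb row, the table convention gives -79% while the Shakespeare-based figure is -44%; both describe the same gap, which is why the baseline needs stating whenever a percentage difference is quoted.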
Needless to say, applying complex algorithms to artistic endeavour, with big data and high stakes, has provoked fierce argument over the results, but two things have become abundantly clear: professional playwrights are on one side of a line, and, with few exceptions, court poets, amateurs of every kind and wildcat contenders are on the other. You can see roughly where that line is drawn in the table above.
Eighteenth- and early nineteenth-century students of Shakespeare such as Pope, Johnson and, later, Keats and Shelley believed that the English language was driven first by verbs, followed by their descriptive sidekicks, adverbs.
The hare limped trembling through the frozen grass¹
The line between Marlowe and Shakespeare may be wavy (Shakespeare uses many more adjectives), but the two fall on the same side, with very similar usage patterns for verbs and adverbs. In the engine room of the English language, Shakespeare’s rate of verb use is 79% higher than De Vere’s (put the other way, De Vere uses about 44% fewer verbs), while De Vere’s work is far heavier with nouns: Shakespeare’s noun rate is 43% below his. These are not small differences, and with a little more work EEBO can show far greater disparities between the two authors.
Students trained in practical criticism have been known to be impatient with Oxfordians. Using the comparative methods of practical criticism, a discipline most able readers can acquire online, they perceive an unbridgeable gulf between Oxford’s poetry and the verse of the Bankside professionals. That gulf cannot be explained away. It is not juvenilia: great poets do not publish childish work in their twenties, and they do not repeat mediocre or commonplace expressions.
And yet I languish in great thirst
while others drink the wine.
Thus like a woeful wight wove my web of woe;
The more I would weed out my cares,
the more they seem to grow.
Care and Disappointment, E. Ox.
Drown me you trickling tears,
you wailful wights of woe;
Come help these hands to rent my hairs,
my rueful haps to show;
The Forsaken Man, Edward De Vere
There’s not a great deal to be gained from discussing Oxford’s poetry. The first sample reads like a challenge to start as many words as possible with the letter ‘w’: nine examples in three lines. Any schoolchild should be able to spot that this isn’t Shakespeare. It isn’t Oxford’s best work, and he is capable of better, but he cannot aspire to be Shakespeare or even Wyatt. Most, if not all, of the writers he took into his service (Munday, Lyly, Churchyard) could write better than this.
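The ‘w’ count is easy to check mechanically. This is a minimal sketch, assuming a word is any run of letters and apostrophes and counting only words whose first letter is ‘w’, regardless of case; the function name `words_starting_with` is hypothetical. Applied to the Care and Disappointment excerpt exactly as quoted above (three long lines, printed here across five), it finds the nine.

```python
import re

# The excerpt from "Care and Disappointment" as quoted above.
EXCERPT = """And yet I languish in great thirst
while others drink the wine.
Thus like a woeful wight wove my web of woe;
The more I would weed out my cares,
the more they seem to grow."""


def words_starting_with(text: str, letter: str) -> list[str]:
    """Return every word in `text` whose first letter is `letter` (case-insensitive)."""
    words = re.findall(r"[A-Za-z']+", text)
    return [w for w in words if w[0].lower() == letter.lower()]


if __name__ == "__main__":
    hits = words_starting_with(EXCERPT, "w")
    print(len(hits), hits)
    # → 9 ['while', 'wine', 'woeful', 'wight', 'wove', 'web', 'woe', 'would', 'weed']
```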
Drawing on Professor Nelson’s work, we built corpora of Oxford’s writing for EEBO V3 analysis: three in fact, one of all the poems, one of all the other written work, and a combined corpus of everything, which was used to create the table above. Five years later, their Oxfraud creators are still the only people to have used them. Not even Oxfordians are curious to see how his writing measures up against that of his contemporaries. You shouldn’t need sophisticated methods to separate Oxford’s work from Shakespeare’s, and when you do use them, they make it clear why you shouldn’t have needed to.
Footnotes
1. The Eve of St Agnes, John Keats