Online Early Modern Texts
Putting Early Modern Literature Online

An overview of how online resources for the Early Modern Period developed, and of what is available for research today.
| Resource | Dates | What it contains | Text type | Organisation & access | Key limitations |
|---|---|---|---|---|---|
| LION (Literature Online) | Launched 1997 (Chadwyck-Healey); migrated to ProQuest 2019 | 350,000+ works of poetry, drama & prose in English, 8th century to present, concentrated on canonical and semi-canonical authors (e.g. Shakespeare, Spenser, Jonson). Also includes scholarly journals and the ABELL (Annual Bibliography of English Language and Literature) index. | Re-keyed full text, transcribed from first editions or scholarly editions; 99.995%+ accuracy claimed. | Institutional subscription. Searchable via the ProQuest platform with Boolean and proximity search; browsable by author, genre and period. | Coverage is selective and canonical. No POS tags or linguistic metadata. Commercial subscription required. Not bulk-downloadable for text mining. |
| EEBO (Early English Books Online) | Microfilming from 1938; online 1998 (UMI/Chadwyck-Healey); ProQuest from c.2003 | 146,000+ titles, 1473–1700, covering STC I (Pollard & Redgrave), STC II (Wing), the Thomason Tracts and the Tract Supplement. 17 million+ pages. Virtually all surviving print in English to 1700. | Page images: bitonal scans from microfilm (greyscale from 2012), supplied as PDF and TIFF. No searchable text unless a TCP transcription exists for that title. | Subscription (ProQuest). Browse and search by ESTC metadata (author, title, date, STC number). Full-text search only for the TCP-transcribed subset. Images not freely reusable. | Images are not machine-readable text and cannot be computationally processed without the TCP layer. Microfilm artefacts (bleed-through, damage). Black-and-white scanning distorts typeface detail. Coverage approximately 92% complete. |
| TCP Phase 1 (EEBO-TCP Phase I) | Transcription 2000–2009; public release 1 January 2015 | 25,368 texts selected from EEBO. Selection was biased towards authors listed in the New Cambridge Bibliography of English Literature, followed by thematic and format-based batches. Coverage c.1475–1700. | Hand-keyed XML: TEI P5 with structural markup (headings, verse, notes, figures). No POS tags or lemmatisation. Original, unregularised spelling. | Freely downloadable as bulk XML from the TCP GitHub, Michigan and the Oxford Text Archive. Searchable via ProQuest EEBO or the Michigan interface. No POS query syntax. | Canonical selection bias. Original spelling makes linguistic search inconsistent across the corpus. No POS or lemma data. Some transcription errors. Fixed snapshot; not updated. |
| TCP Phase 2 (EEBO-TCP Phase II) | Transcription 2009 onwards; public release January–August 2020 | ~35,000 additional texts from EEBO (~60,000 texts combined with Phase 1). Broader coverage, with more emphasis on English-language texts. Completes the TCP transcription project. | Hand-keyed XML in the same TEI P5 format as Phase 1. No POS tags or lemmatisation; original spelling throughout. | Now fully public. Bulk download from the TCP GitHub. Integrated into the ProQuest EEBO interface alongside Phase 1. Also accessible via EarlyPrint and CQPweb corpora. | Same limitations as Phase 1. Approximately 85,000 EEBO titles remain without any transcription. Ongoing corrections are community-driven. |
| EEBO V3 / CQPweb (Lancaster University / UCREL annotated corpus) | Built on TCP Phases 1 & 2; annotated version mounted on CQPweb; current version c.2015–ongoing | 44,422 texts; 1.2 billion running tokens. Texts from both TCP phases were processed through Lancaster’s UCREL annotation pipeline: spelling regularisation first, then POS tagging and lemmatisation. Hosted on the same CQPweb server as Lancaster’s hand-keyed, linguistically detailed Shakespeare corpus, which can be filtered by play, character and scene, with experimental filters including the social class of the speaking character. | POS-tagged, lemmatised corpus. Eight annotation fields per token, including original form, regularised spelling, lemma, POS tag and further linguistic metadata. CQP (Corpus Query Processor) syntax enables grammatical pattern search across the full corpus. | Accessed via Lancaster CQPweb (cqpweb.lancs.ac.uk/eebov3). Free, but requires account registration. CQP query language. KWIC concordance, frequency breakdown by date, and collocation tools. Fixed corpus; not updated in real time. | POS tagger trained on modern English, so accuracy is reduced for early modern syntax and morphology. No metadata filtering by author or title within the CQP interface. Fixed snapshot. Spelling regularisation introduces editorial decisions. |
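
LION's Boolean and proximity search is offered through its ProQuest interface; the sketch below only illustrates what such a search computes over a tokenised text. It is a minimal plain-Python toy, not LION's query syntax or implementation: `near` finds two words within a given number of tokens of each other, and `boolean_match` applies document-level AND/NOT. All function names and the sample sentence are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def near(text: str, a: str, b: str, window: int) -> list[tuple[int, int]]:
    """Return (pos_a, pos_b) pairs where a and b occur within `window` tokens, in either order."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a.lower()]
    pos_b = [j for j, t in enumerate(tokens) if t == b.lower()]
    return [(i, j) for i in pos_a for j in pos_b if 0 < abs(i - j) <= window]


def boolean_match(text: str, all_of=(), none_of=()) -> bool:
    """Document-level AND/NOT: every word in all_of present, no word in none_of present."""
    words = set(tokenize(text))
    return all(w in words for w in all_of) and not any(w in words for w in none_of)


# An invented sentence, not a quotation.
text = "The king did weep, and all the court wept with the king."
print(near(text, "king", "weep", 2))  # [(1, 3)]
print(near(text, "king", "weep", 1))  # []
print(boolean_match(text, all_of=("king", "court"), none_of=("queen",)))  # True
```

Note that an exact-match search like this one misses "wept" when asked for "weep"; lemma-aware search of the kind the EEBO V3 corpus offers is what closes that gap.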
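
The TCP releases are distributed as bulk TEI P5 XML with structural markup for headings, verse and notes. The following is a minimal sketch of pulling text out of such a file with Python's standard `xml.etree.ElementTree`. It assumes the standard TEI element names `title`, `l` (verse line) and `p` (paragraph); because individual TCP files may or may not declare the TEI namespace, it matches on local element names rather than on namespaced tags. The sample document is a hypothetical, cut-down stand-in, not a real TCP file.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace, used here only to build the sample document.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def local_name(tag: str) -> str:
    """Return an element name without its '{namespace}' prefix, if any."""
    return tag.rsplit("}", 1)[-1]


def extract_lines(xml_text: str) -> dict:
    """Collect the first <title>, every verse line <l>, and every paragraph <p>."""
    root = ET.fromstring(xml_text)
    title = None
    verse, prose = [], []
    for elem in root.iter():
        name = local_name(elem.tag)
        # Join nested text and collapse runs of whitespace left by the markup.
        text = " ".join("".join(elem.itertext()).split())
        if name == "title" and title is None:
            title = text
        elif name == "l":
            verse.append(text)
        elif name == "p":
            prose.append(text)
    return {"title": title, "verse": verse, "prose": prose}


# Hypothetical sample in original, unregularised spelling.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader><fileDesc><titleStmt>
    <title>A hypothetical sample text</title>
  </titleStmt></fileDesc></teiHeader>
  <text><body>
    <div type="poem">
      <head>To the Reader</head>
      <l>This lyne is  inuented for the example,</l>
      <l>And so   is this.</l>
    </div>
    <div type="prose"><p>Heere endeth the example.</p></div>
  </body></text>
</TEI>"""

result = extract_lines(SAMPLE)
print(result["title"])
for line in result["verse"]:
    print(line)
```

The extracted lines keep the original spelling ("lyne", "inuented"), which is exactly why plain string search over the TCP XML is inconsistent across the corpus.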
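
The limitation noted for the TCP texts (original spelling makes search inconsistent) and the remedy noted for EEBO V3 (regularise spelling before tagging) can be illustrated with a toy regulariser. This sketch is an assumption-laden illustration, not the procedure actually used for EEBO V3: the three hypothetical rules in `RULES` cover only "vv" for "w", "u" between vowels for "v", and initial "v" before a consonant for "u".

```python
import re

# Hypothetical toy rules, applied in order. Each maps one common early modern
# printing convention onto modern spelling; real regularisation is far richer.
RULES = [
    (re.compile(r"vv"), "w"),                        # 'vv' for 'w': vvith -> with
    (re.compile(r"(?<=[aeiou])u(?=[aeiou])"), "v"),  # u between vowels: loue -> love
    (re.compile(r"^v(?=[^aeiou])"), "u"),            # initial v before a consonant: vnto -> unto
]


def regularise(word: str) -> str:
    """Lower-case a word and apply the toy rules in sequence."""
    w = word.lower()
    for pattern, replacement in RULES:
        w = pattern.sub(replacement, w)
    return w


def find(tokens: list[str], query: str) -> list[int]:
    """Return positions whose regularised form equals the regularised query."""
    target = regularise(query)
    return [i for i, tok in enumerate(tokens) if regularise(tok) == target]


# Invented example tokens in unregularised spelling.
tokens = ["Loue", "is", "euer", "true", ";", "love", "vnto",
          "death", "and", "loue", "vvith", "faith"]

exact_hits = [i for i, tok in enumerate(tokens) if tok == "love"]
variant_hits = find(tokens, "love")
print(exact_hits)    # [5]
print(variant_hits)  # [0, 5, 9]
```

Each rule is an editorial decision: the u-between-vowels rule, for example, would turn "queue" into "queve". That is the trade-off the table flags when it says spelling regularisation introduces editorial decisions.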
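
The grammatical pattern search attributed to CQP above works across several annotation layers at once. In CQP a query is written as a sequence of bracketed attribute constraints, of the general form `[pos="..."] [pos="..."]`. The plain-Python toy below mimics that idea over four of the fields the table lists. The field names (`orig`, `reg`, `lemma`, `pos`), the tag labels and the annotated phrase are all hypothetical, and the code says nothing about how CQPweb itself is implemented.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One token carrying four kinds of annotation."""
    orig: str    # original spelling
    reg: str     # regularised spelling
    lemma: str   # dictionary headword
    pos: str     # part-of-speech tag (placeholder labels below)


# Invented, hand-annotated phrase. ADJ/NOUN/VERB/DET are placeholders,
# not the tagset actually used on the EEBO V3 server.
SENTENCE = [
    Token("The", "the", "the", "DET"),
    Token("fayre", "fair", "fair", "ADJ"),
    Token("Ladie", "lady", "lady", "NOUN"),
    Token("hath", "hath", "have", "VERB"),
    Token("sweete", "sweet", "sweet", "ADJ"),
    Token("wordes", "words", "word", "NOUN"),
]


def match_pattern(tokens, pattern):
    """Return start indices where consecutive tokens satisfy the pattern.

    `pattern` is a list of dicts, one per token position, mapping a field
    name ("orig", "reg", "lemma", "pos") to the value it must equal.
    """
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        if all(getattr(tokens[start + k], field) == value
               for k, constraints in enumerate(pattern)
               for field, value in constraints.items()):
            hits.append(start)
    return hits


print(match_pattern(SENTENCE, [{"pos": "ADJ"}, {"pos": "NOUN"}]))  # [1, 4]
print(match_pattern(SENTENCE, [{"lemma": "have"}]))                # [3]
print(match_pattern(SENTENCE, [{"orig": "words"}]))                # []
print(match_pattern(SENTENCE, [{"reg": "words"}]))                 # [5]
```

The last two queries show why the regularised and lemma layers matter: a search on original spelling misses "wordes", while the same search on the regularised field finds it.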