Online Early Modern Texts

Putting Early Modern Literature Online

This table traces the development of online resources for early modern literature and summarises what is available for research on the period today.

Resource	Dates	What it contains	Text type	Organisation & access	Key limitations
LION (Literature Online)	Launched 1997 (Chadwyck-Healey); migrated to ProQuest 2019	350,000+ works of poetry, drama & prose in English, 8th century to present. Canonical and semi-canonical authors (Shakespeare, Spenser, Jonson, etc.). Scholarly journals and ABELL index also included.	Re-keyed full text. Transcribed from first editions or scholarly editions; 99.995%+ accuracy claimed.	Subscription (institutional). Searchable via ProQuest platform. Boolean & proximity search. Browsable by author, genre, and period.	Selective canonical coverage only. No part-of-speech (POS) tags or other linguistic metadata. Commercial subscription required. Not bulk-downloadable for text mining.
EEBO (Early English Books Online)	Microfilming from 1938; online 1998 (UMI/Chadwyck-Healey); ProQuest from c.2003	146,000+ titles, 1473–1700. Covers STC I (Pollard & Redgrave), STC II (Wing), Thomason Tracts, Tract Supplement. 17 million+ pages. Virtually all surviving print in English to 1700.	Page images. Bitonal scans from microfilm (greyscale from 2012). PDF & TIFF. Images only — no searchable text unless a TCP transcription exists for that title.	Subscription (ProQuest). Browse and search by ESTC metadata (author, title, date, STC number). Full-text search only available for TCP-transcribed subset. Images not freely reusable.	Images are not text. Microfilm artefacts (bleed-through, damage). Black-and-white scanning distorts typeface detail. Coverage approximately 92% complete. Cannot be computationally processed without the TCP layer.
TCP Phase 1 (EEBO-TCP Phase I)	Transcription 2000–2009; public release 1 January 2015	25,368 texts selected from EEBO. Selection first favoured authors listed in the New Cambridge Bibliography of English Literature, then proceeded in thematic and format batches. Coverage c.1475–1700.	Hand-keyed XML. TEI P5 XML with structural markup (headings, verse, notes, figures). No POS tags or lemmatisation. Spelling is original and unregularised.	Freely downloadable from TCP GitHub, Michigan, and Oxford Text Archive. Bulk XML. Searchable via ProQuest EEBO or Michigan interface. No POS query syntax.	Canonical selection bias. Original spelling makes linguistic search inconsistent across the corpus. No POS or lemma data. Some transcription errors. Fixed snapshot; not updated.
TCP Phase 2 (EEBO-TCP Phase II)	Transcription 2009 onwards; public release January–August 2020	~35,000 additional texts from EEBO (combined total with Phase 1: ~60,000 texts). Broader coverage with more emphasis on English-language text. Completes the TCP transcription project.	Hand-keyed XML. Same TEI P5 XML format as Phase 1. No POS tags, no lemmatisation, original spelling throughout.	Now fully public. Bulk download from TCP GitHub. Integrated into ProQuest EEBO interface alongside Phase 1. Also accessible via EarlyPrint and CQPweb corpora.	Same limitations as Phase 1. Approximately 85,000 EEBO titles remain without any transcription. Ongoing corrections are community-driven.
EEBO V3 / CQPweb (Lancaster University / UCREL annotated corpus)	Built on TCP Phases 1 & 2; annotated version mounted on CQPweb; current version c.2015–ongoing	44,422 texts; 1.2 billion running tokens. Both TCP phases processed through Lancaster’s UCREL annotation pipeline. Spelling regularisation applied first, then POS tagging and lemmatisation. Available on the same CQPweb server as Lancaster’s hand-keyed and linguistically detailed Shakespeare corpus, which can be filtered by play, character, and scene, and by experimental filters including the social class of the speaking character.	POS-tagged corpus with lemmatisation. Eight annotation fields per token: original form, regularised spelling, lemma, POS tag, and further linguistic metadata. CQP (Corpus Query Processor) syntax enables grammatical pattern search across the full corpus.	Accessed via Lancaster CQPweb (cqpweb.lancs.ac.uk/eebov3). Free but requires account registration. CQP query language. KWIC concordance, frequency breakdown by date, and collocation tools. Fixed corpus; not updated in real time.	POS tagger trained on modern English; accuracy is reduced for early modern syntax and morphology. No metadata filtering by author or title within the CQP interface. Fixed snapshot. Spelling regularisation introduces editorial decisions.
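
The structural markup described for the TCP transcriptions (headings, verse, notes) can be read with a standard XML parser. The following is a minimal sketch, not the TCP project's own tooling. The sample document, its invented verse lines, and the helper names `element_text`, `extract_structure`, and `lines_containing` are all hypothetical. It also assumes the files declare the standard TEI P5 namespace, which should be checked against a real download.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; assumed (not verified here) to be declared in TCP files.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical miniature TEI document. The verse is invented for illustration
# and is not taken from any TCP text.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="poem">
        <head>A Song of Loue</head>
        <lg>
          <l>Loue is a fyre that burneth vnseene,</l>
          <l>And giueth neuer rest.</l>
        </lg>
        <note>An illustrative marginal note.</note>
      </div>
    </body>
  </text>
</TEI>"""


def element_text(element):
    """Return all text inside an element, with whitespace runs collapsed."""
    return " ".join("".join(element.itertext()).split())


def extract_structure(xml_text):
    """Collect headings, verse lines, and notes from a TEI P5 string."""
    root = ET.fromstring(xml_text)
    return {
        "heads": [element_text(e) for e in root.iterfind(".//tei:head", TEI_NS)],
        "verse_lines": [element_text(e) for e in root.iterfind(".//tei:l", TEI_NS)],
        "notes": [element_text(e) for e in root.iterfind(".//tei:note", TEI_NS)],
    }


def lines_containing(lines, word):
    """Case-insensitive substring search with no spelling regularisation."""
    needle = word.lower()
    return [line for line in lines if needle in line.lower()]


structure = extract_structure(SAMPLE_TEI)
print(structure["heads"])
# The modern spelling finds nothing; the original spelling does.
print(lines_containing(structure["verse_lines"], "love"))
print(lines_containing(structure["verse_lines"], "loue"))
```

Searching the sample for "love" returns no lines, while "loue" returns the first verse line. This is the inconsistency that unregularised spelling causes for plain-text search, and it is why the CQPweb corpus regularises spelling before tagging.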