Optional Module: Text Encoding

This module introduces some of the basics of text encoding through three related encoding schemas: HTML, the markup language of the web; XML, a flexible, extensible markup language that lets you define your own tags; and the TEI (Text Encoding Initiative), a humanities-specific set of encoding guidelines built on XML. The readings give a brief introduction to the three schemas and how they have been used in DH, while the technical activities give you the chance to try using the schemas yourself.

Readings

1. read w3schools.com’s HTML Introduction and XML Introduction

2. read James Cummings, “A world of difference: Myths and misconceptions about the TEI,” Digital Scholarship in the Humanities 34 (2019).

3. read Tim Causer and Valerie Wallace, “Building a Volunteer Community: Results and Findings from Transcribe Bentham,” Digital Humanities Quarterly 6, no. 2 (2012).

4. read Christopher N. Warren, “Historiography’s Two Voices: Data Infrastructure and History at Scale in the Oxford Dictionary of National Biography (ODNB),” Journal of Cultural Analytics (2018).

5. use the Slack channel to make note of your thoughts as you read and complete these exercises; depending on how many people are completing this optional module, this may or may not turn into an active discussion (that’s okay).

Technical Activities

1. using this Programming Historian tutorial, try to create a simple HTML file (ignore the instruction to save to a specific directory and save the file wherever you want)

2. return to Causer and Wallace (#3 in the readings), then click the “XML” button at the top of the article (immediately above the title) or use this direct link. Once you’re there, examine how the article is formatted “under the hood” and how this compares to the version you read earlier.

3. using HTML, XML, or TEI (the w3schools documentation will help you with the first two, while this introduction to TEI Lite will help with the third), try to encode a one-page historical document (either something you’ve downloaded or photographed in the archives for your own research, or one of the documents you used for the Tropy upload in module 2). How might this encoding allow you to more easily search, analyze, and display your documents?
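To make the question at the end of activity 3 a little more concrete, here is a minimal Python sketch. It assumes you chose TEI and tagged the people, places, and dates in your document with TEI's persName, placeName, and date elements. The sample letter is invented for illustration, and the function name extract_entities is hypothetical. The script uses only Python's standard-library xml.etree.ElementTree module to pull every tagged name and date out of the encoded text, which a plain transcription cannot offer without guesswork.

```python
import xml.etree.ElementTree as ET

# A hypothetical one-paragraph letter encoded with a few TEI elements.
# The content is invented for illustration; swap in your own encoding.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter (hypothetical)</title></titleStmt>
      <publicationStmt><p>Unpublished exercise</p></publicationStmt>
      <sourceDesc><p>Invented example</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>On <date when="1820-03-14">the fourteenth of March</date>,
      <persName>Mr. Smith</persName> wrote from
      <placeName>London</placeName> to <persName>Mrs. Jones</persName>
      in <placeName>Edinburgh</placeName>.</p>
    </body>
  </text>
</TEI>"""

# TEI elements live in the TEI namespace, so ElementTree searches
# need a prefix mapped to that namespace URI.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_entities(tei_string):
    """Return the people, places, and normalized dates tagged in a TEI string."""
    root = ET.fromstring(tei_string)
    people = [el.text for el in root.iterfind(".//tei:persName", NS)]
    places = [el.text for el in root.iterfind(".//tei:placeName", NS)]
    # The when attribute holds a machine-readable date even when the
    # visible text spells the date out in words.
    dates = [el.get("when") for el in root.iterfind(".//tei:date", NS)]
    return {"people": people, "places": places, "dates": dates}


if __name__ == "__main__":
    entities = extract_entities(SAMPLE_TEI)
    for kind, values in entities.items():
        print(f"{kind}: {', '.join(values)}")
```

To try it on your own encoding, read your file's contents into a string and pass it to the function. Because the date element carries a normalized when value, a whole collection of encoded letters could be sorted or filtered chronologically even when the dates in the text are written out in words.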