This lecture traces the evolution of text encoding from ASCII to Unicode, the standard that makes it possible to represent a wide range of writing systems. It examines the challenges of encoding multilingual texts and the development of XML as a hierarchical representation of text. The discussion then turns to the Text Encoding Initiative (TEI) and the complexities of encoding structured texts. Finally, it covers optical character recognition (OCR), handwritten text recognition (HTR), and crowdsourcing for correcting the resulting transcriptions, as well as the handling of spelling variation in historical texts.
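To make the ASCII-to-Unicode point concrete, here is a minimal sketch using only the Python standard library. The sample words and the helper names `ascii_bytes` and `describe` are illustrative assumptions, not material from the lecture. The sketch shows that ASCII can only encode the 128 basic Latin characters, while UTF-8 (one encoding of Unicode) handles any script, at a variable number of bytes per character. It also shows one multilingual pitfall: the same visible letter can be stored as different code-point sequences until the text is normalized.

```python
import unicodedata
from typing import Optional, Tuple


def ascii_bytes(text: str) -> Optional[bytes]:
    """Hypothetical helper: ASCII bytes for text, or None if any character is outside ASCII (0-127)."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None


def describe(text: str) -> Tuple[int, int, bool]:
    """Hypothetical helper: (number of code points, UTF-8 byte length, whether ASCII suffices)."""
    return len(text), len(text.encode("utf-8")), ascii_bytes(text) is not None


# Illustrative sample words, written with escapes so the exact code points are explicit.
samples = {
    "Latin (ASCII only)": "Lausanne",
    "Latin with accent": "Gen\u00e8ve",  # Genève, with precomposed è = U+00E8
    "Greek": "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac",  # Ελληνικά
    "Japanese": "\u65e5\u672c\u8a9e",  # 日本語
}

for label, word in samples.items():
    n_code_points, n_bytes, fits_ascii = describe(word)
    print(f"{label}: {n_code_points} code points, {n_bytes} UTF-8 bytes, ASCII: {fits_ascii}")

# The same visible "è" can be one precomposed code point or "e" followed by
# a combining grave accent (U+0300); NFC normalization makes them compare equal.
precomposed = "Gen\u00e8ve"
decomposed = "Gene\u0300ve"
print(precomposed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Accented Latin and Greek letters take two bytes each in UTF-8 and the Japanese characters take three, which is why character counts and byte counts diverge as soon as a text leaves plain ASCII. Normalizing to a single form such as NFC before comparing or searching is one common way to avoid treating identical-looking transcriptions as different strings.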