This lecture explores the field of Digital Humanities, focusing on the processing of large collections of digital texts. It covers the extraction of hidden regularities and structures from massive textual objects, the detection of diachronic and synchronic patterns, and the reconstruction of complex meaning spaces. The lecture traces the origins and convergence of Humanities Computing and Computational Linguistics, emphasizing the formal foundation of Digital Humanities. It also addresses the challenges posed by very large textual objects, the role of text processing pipelines, and the growth of digital databases of historical texts. Projects and initiatives such as Project Gutenberg and Wikisource are highlighted, along with the importance of text reuse and the use of regular expressions and n-grams in text analysis.
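As a rough illustration of how regular expressions and n-grams can combine to surface text reuse, here is a minimal Python sketch. It is not taken from the lecture: the function names (`tokenize`, `word_ngrams`, `shared_ngrams`), the choice of word trigrams, and the two sample sentences are all assumptions made for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and extract alphabetic word tokens with a regular expression."""
    return re.findall(r"[a-z]+", text.lower())

def word_ngrams(tokens, n):
    """Return the list of consecutive n-token tuples (empty if the text is shorter than n)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def shared_ngrams(text_a, text_b, n=3):
    """Return the set of word n-grams that occur in both texts (a crude text-reuse signal)."""
    grams_a = set(word_ngrams(tokenize(text_a), n))
    grams_b = set(word_ngrams(tokenize(text_b), n))
    return grams_a & grams_b

# Hypothetical sample passages, chosen only for illustration.
a = "It was the best of times, it was the worst of times."
b = "Some said it was the best of times; others disagreed."

for gram in sorted(shared_ngrams(a, b)):
    print(" ".join(gram))
```

Overlapping n-grams are only a first-pass signal: a real pipeline over historical corpora would typically also need to handle spelling variation, OCR noise, and very frequent phrases.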