This lecture covers the digitization of historical documents: the structure of the processing pipeline, document modeling, content extraction, and the types of document structure. It examines why describing a document's structure and content is difficult and why standardization is needed. It then introduces the Open Annotation Model and the Shared Canvas Data Model, which support collaborative annotation and description of digital resources. Finally, it discusses the use of neural networks for handwritten text recognition and image segmentation as examples of machine learning applied to historical document processing.