Book scanning

Book scanning or book digitization (also: magazine scanning or magazine digitization) is the process of converting physical books and magazines into digital media such as images, electronic text, or electronic books (e-books) by using an image scanner. Large-scale book scanning projects have made many books available online. Digital books can be easily distributed, reproduced, and read on-screen. Common file formats are DjVu, Portable Document Format (PDF), and Tagged Image File Format (TIFF). To convert the raw images, optical character recognition (OCR) is used to turn book pages into a digital text format such as ASCII, which reduces the file size and allows the text to be reformatted, searched, or processed by other applications.

Image scanners may be manual or automated. In an ordinary commercial image scanner, the book is placed on a flat glass plate (or platen), and a light and optical array moves across the book underneath the glass. In manual book scanners, the glass plate extends to the edge of the scanner, making it easier to line up the book's spine.

A problem with scanning bound books is that when a book that is not very thin is laid flat, the part of the page close to the spine (the gutter) is significantly curved, distorting the text in that part of the scan. One solution is to separate the book into individual pages by cutting or unbinding it. A non-destructive method is to hold the book in a V-shaped holder and photograph it, rather than lay it flat and scan it; the curvature in the gutter is much less pronounced this way. Pages may be turned by hand or by automated paper transport devices. Transparent plastic or glass sheets are usually pressed against the page to flatten it.

After scanning, software adjusts the document images by aligning, cropping, and editing them, then converts them to text and to the final e-book form. Human proofreaders usually check the output for errors.
Scanning at 118 dots/centimeter (300 dpi) is adequate for conversion to digital text output, but for archival reproduction of rare, elaborate or illustrated books, much higher resolution is used.
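The resolution figures above can be checked with a little arithmetic. The following is a minimal sketch, not part of any scanning tool: the function names are hypothetical, and the 6 x 9 inch page size is an assumed example rather than a value from the text. It converts dpi to dots per centimeter and estimates the uncompressed size of one scanned page at several bit depths.

```python
CM_PER_INCH = 2.54  # exact by definition


def dpi_to_dots_per_cm(dpi: float) -> float:
    """Convert a resolution in dots per inch to dots per centimeter."""
    return dpi / CM_PER_INCH


def scan_pixel_size(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_image_bytes(width_px: int, height_px: int, bits_per_pixel: int = 24) -> int:
    """Uncompressed size of the scan in bytes (no headers, no compression)."""
    return width_px * height_px * bits_per_pixel // 8


# Assumed example: a 6 x 9 inch page scanned at 300 dpi.
dpi = 300
width_px, height_px = scan_pixel_size(6, 9, dpi)

print(f"{dpi} dpi is about {dpi_to_dots_per_cm(dpi):.1f} dots/cm")
print(f"page size in pixels: {width_px} x {height_px}")
for label, bits in [("24-bit color", 24), ("8-bit grayscale", 8), ("1-bit bitonal", 1)]:
    size = raw_image_bytes(width_px, height_px, bits)
    print(f"{label}: {size:,} bytes uncompressed")
```

For comparison, a page of plain ASCII text takes one byte per character, so a few thousand characters occupy only a few kilobytes. This is why OCR output is so much smaller than the page images it is derived from.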

Related courses (8)
HUM-474: Press and digital history I
At the intersection of digital history, media history, and public history, this course looks at the production, dissemination, and preservation of information. Students will learn to ...
NX-422: Neural interfaces
Neural interfaces (NI) are bioelectronic systems that interface the nervous system to digital technologies. This course presents their main building blocks (transducers, instrumentation & communicatio
ENV-202: Microbiology for engineers
"Microbiology for engineers" covers the main microbial processes that take place in the environment and in treatment systems. It presents elemental cycles that are catalyzed by microorganisms and that
Related lectures (28)
Chromatography Fundamentals
Covers the fundamentals of chromatography, including distribution constants and retention factors.
Asset Pricing: Theory and Applications
Explores asset pricing theory, market efficiency, risk-return relationship, and the efficient frontier.
From a Collection to an Innovation Platform: Montreux Jazz Digital Project
Explores the Montreux Jazz Digital Project, highlighting its digitization process and innovative uses of the archive.
Related publications (33)

Digital manufacturing of personalised footwear with embedded sensors

Danick Briand, Ryan Mitchell van Dommelen, Rubaiyet Iftekharul Haque, Jaemin Kim

The strong clinical demand for more accurate and personalized health monitoring technologies has called for the development of additively manufactured wearable devices. While the materials palette for additive manufacturing continues to expand, the integra ...
2023

Automatic table detection and classification in large-scale newspaper archives

In recent decades, major efforts to digitize historical documents led to the creation of large machine readable corpora, including newspapers, which are waiting to be processed and analyzed. Newspapers are a valuable historical source, notably because of t ...
2022

The impresso system architecture in a nutshell

Maud Ehrmann, Matteo Romanello

This post describes the impresso application architecture and processing in a nutshell. The text was published in October 2020 in issue number 16 of the EuropeanaTech Insights dedicated to digitized newspapers and edited by Gregory Markus and Clemens Neude ...
EuropeanaTech Insights, 2020
Related concepts (8)
Ebook
An ebook (short for electronic book), also known as an e-book or eBook, is a book publication made available in digital form, consisting of text, images, or both readable on the flat-panel display of computers or other electronic devices. Although sometimes defined as "an electronic version of a printed book", some e-books exist without a printed equivalent. E-books can be read on dedicated e-reader devices, but also on any computer device that features a controllable viewing screen, including desktop computers, laptops, tablets and smartphones.
Digital library
A digital library, also called an online library, an internet library, a digital repository, a library without walls, or a digital collection is an online database of digital objects that can include text, still images, audio, video, digital documents, or other digital media formats or a library accessible through the internet. Objects can consist of digitized content like print or photographs, as well as originally produced digital content like word processor files or social media posts.
Project Gutenberg
Project Gutenberg (PG) is a volunteer effort to digitize and archive cultural works, as well as to "encourage the creation and distribution of eBooks." It was founded in 1971 by American writer Michael S. Hart and is the oldest digital library. Most of the items in its collection are the full texts of books or individual stories in the public domain. All files can be accessed for free under an open format layout, available on almost any computer. Project Gutenberg had reached 50,000 items in its collection of free eBooks.