DigitizationDigitization is the process of converting information into a digital (i.e. computer-readable) format. The result is the representation of an object, , sound, document, or signal (usually an analog signal) obtained by generating a series of numbers that describe a discrete set of points or samples. The result is called digital representation or, more specifically, a , for the object, and digital form, for the signal.
Book scanningBook scanning or book digitization (also: magazine scanning or magazine digitization) is the process of converting physical books and magazines into digital media such as , electronic text, or electronic books (e-books) by using an . Large scale book scanning projects have made many books available online. Digital books can be easily distributed, reproduced, and read on-screen. Common file formats are DjVu, Portable Document Format (PDF), and (TIFF).
Digital libraryA digital library, also called an online library, an internet library, a digital repository, a library without walls, or a digital collection is an online database of digital objects that can include text, still images, audio, video, digital documents, or other digital media formats or a library accessible through the internet. Objects can consist of digitized content like print or photographs, as well as originally produced digital content like word processor files or social media posts.
Google BooksGoogle Books (previously known as Google Book Search, Google Print, and by its code-name Project Ocean) is a service from Google that searches the full text of books and magazines that Google has scanned, converted to text using optical character recognition (OCR), and stored in its digital database. Books are provided either by publishers and authors through the Google Books Partner Program, or by Google's library partners through the Library Project. Additionally, Google has partnered with a number of magazine publishers to digitize their archives.
Electronic publishingElectronic publishing (also referred to as publishing, digital publishing, or online publishing) includes the digital publication of e-books, digital magazines, and the development of digital libraries and catalogues. It also includes the editing of books, journals, and magazines to be posted on a screen (computer, e-reader, tablet, or smartphone). Electronic publishing has become common in scientific publishing where it has been argued that peer-reviewed scientific journals are in the process of being replaced by electronic publishing.
Named-entity recognitionNamed-entity recognition (NER) (also known as (named) entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc. Most research on NER/NEE systems has been structured as taking an unannotated block of text, such as this one: Jim bought 300 shares of Acme Corp.
Open accessOpen access (OA) is a set of principles and a range of practices through which research outputs are distributed online, free of access charges or other barriers. Under some models of open access publishing, barriers to copying or reuse are also reduced or removed by applying an open license for copyright. The main focus of the open access movement is "peer reviewed research literature". Historically, this has centered mainly on print-based academic journals.
Digital preservationIn library and archival science, digital preservation is a formal endeavor to ensure that digital information of continuing value remains accessible and usable. It involves planning, resource allocation, and application of preservation methods and technologies, and it combines policies, strategies and actions to ensure access to reformatted and "born-digital" content, regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time.
Entity linkingIn natural language processing, entity linking, also referred to as named-entity linking (NEL), named-entity disambiguation (NED), named-entity recognition and disambiguation (NERD) or named-entity normalization (NEN) is the task of assigning a unique identity to entities (such as famous individuals, locations, or companies) mentioned in text. For example, given the sentence "Paris is the capital of France", the idea is to determine that "Paris" refers to the city of Paris and not to Paris Hilton or any other entity that could be referred to as "Paris".
Preservation (library and archive)In conservation, library and archival science, preservation is a set of preventive conservation activities aimed at prolonging the life of a record, book, or object while making as few changes as possible. Preservation activities vary widely and may include monitoring the condition of items, maintaining the temperature and humidity in collection storage areas, writing a plan in case of emergencies, digitizing items, writing relevant metadata, and increasing accessibility.