Digitization
Digitization is the process of converting information into a digital (i.e. computer-readable) format. The result is the representation of an object, image, sound, document, or signal (usually an analog signal) obtained by generating a series of numbers that describe a discrete set of points or samples. This result is called a digital representation or, more specifically, a digital image for the object and digital form for the signal.
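The "series of numbers that describe a discrete set of points or samples" can be illustrated with a minimal sketch of sampling followed by quantization. The function name `digitize`, the 1 Hz test tone, and the chosen sample rate and bit depth are assumptions made for illustration only; they do not come from the text above.

```python
import math


def digitize(signal, duration_s, sample_rate_hz, bits):
    """Sample a continuous function of time and quantize each sample.

    Assumes `signal(t)` returns values in the range [-1.0, 1.0].
    """
    levels = 2 ** bits
    n_samples = int(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz  # sampling: keep only discrete points in time
        value = signal(t)
        # quantization: map [-1, 1] onto integer codes 0 .. levels - 1
        code = round((value + 1.0) / 2.0 * (levels - 1))
        codes.append(min(max(code, 0), levels - 1))
    return codes


def tone(t):
    """Hypothetical analog input: a 1 Hz sine wave."""
    return math.sin(2 * math.pi * 1.0 * t)


codes = digitize(tone, duration_s=1.0, sample_rate_hz=8, bits=3)
print(codes)  # eight integer codes, each between 0 and 7
```

Raising `sample_rate_hz` yields more points per second, and raising `bits` yields finer amplitude steps; both make the digital form follow the analog signal more closely at the cost of more data.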
Information extraction
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. In most cases this activity concerns processing human-language texts by means of natural language processing (NLP). Recent activities in multimedia document processing, such as automatic annotation and content extraction from images, audio, video, and documents, can also be seen as information extraction. Due to the difficulty of the problem, approaches to IE as of 2010 focused on narrowly restricted domains.
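As a rough illustration of turning unstructured text into structured records within a narrowly restricted domain, the sketch below uses a single hand-written pattern. The sentences, the names in them, and the `extract_hires` function are all made up for this example; real IE systems typically rely on trained NLP models rather than one regular expression.

```python
import re

# Hypothetical narrow domain: sentences of the form
# "<First Last> joined <Organization> in <year>".
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) joined "
    r"(?P<organization>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*) "
    r"in (?P<year>\d{4})"
)


def extract_hires(text):
    """Return one structured record per matching sentence fragment."""
    return [
        {
            "person": match["person"],
            "organization": match["organization"],
            "year": int(match["year"]),
        }
        for match in PATTERN.finditer(text)
    ]


# Invented sample text, not taken from any real source.
TEXT = (
    "Alice Smith joined Acme Corp in 2019. "
    "After a long search, Bob Jones joined Globex in 2021."
)

records = extract_hires(TEXT)
for record in records:
    print(record)
```

The pattern only recognizes one sentence shape, which is the point: restricting the domain keeps the extraction rules tractable.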
Book scanning
Book scanning or book digitization (also: magazine scanning or magazine digitization) is the process of converting physical books and magazines into digital media such as images, electronic text, or electronic books (e-books) by using an image scanner. Large-scale book scanning projects have made many books available online. Digital books can be easily distributed, reproduced, and read on-screen. Common file formats are DjVu, Portable Document Format (PDF), and Tagged Image File Format (TIFF).
Knowledge extraction
Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL (data warehouse), the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema.
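One common first step when starting from a relational source is to restate each row as subject-predicate-object statements that a reasoner could later work with. The sketch below assumes a hypothetical `book` table and a simple naming scheme (`table/key` as the subject); it performs no inferencing itself and is not a full mapping standard.

```python
def rows_to_triples(table, rows, key):
    """Restate relational rows as (subject, predicate, object) triples.

    `table`, `rows`, and `key` are illustrative parameters: each row is a
    dict, and `key` names the column that identifies the row.
    """
    triples = set()
    for row in rows:
        subject = f"{table}/{row[key]}"
        # Record what kind of thing the subject is.
        triples.add((subject, "rdf:type", table))
        for column, value in row.items():
            if column != key and value is not None:
                triples.add((subject, column, value))
    return triples


# Invented example rows.
rows = [{"id": 1, "title": "Example Book", "format": "PDF"}]
triples = rows_to_triples("book", rows, key="id")
for triple in sorted(triples):
    print(triple)
```

Unlike a plain table export, each statement names its subject explicitly, which is what lets later tooling link statements from different sources.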
Terminology extraction
Terminology extraction (also known as term extraction, glossary extraction, term recognition, or terminology mining) is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus. In the semantic web era, a growing number of communities and networked enterprises started to access and interoperate through the internet. Modeling these communities and their information needs is important for several web applications, like topic-driven web crawlers, web services, recommender systems, etc.
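A very simple way to propose candidate terms is to count single words and two-word phrases that neither start nor end with a stopword. This is only a frequency-based sketch; the stopword list, the `extract_terms` function, and the sample corpus are assumptions for illustration, and practical systems usually add linguistic filters and statistical termhood measures.

```python
import re
from collections import Counter

# Small illustrative stopword list, not a standard one.
STOPWORDS = {
    "a", "an", "and", "are", "for", "from", "in", "is",
    "of", "on", "or", "the", "to", "with",
}


def extract_terms(corpus, max_len=2, top_k=5):
    """Rank candidate terms (n-grams up to `max_len` words) by frequency."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", corpus.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip candidates that begin or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return counts.most_common(top_k)


# Invented sample corpus.
CORPUS = (
    "Web crawlers index the web. "
    "Topic-driven web crawlers follow links on a topic. "
    "Recommender systems and web crawlers are web applications."
)

terms = extract_terms(CORPUS)
for term, count in terms:
    print(count, term)
```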
Internet Archive
The Internet Archive is an American digital library founded on May 10, 1996, and chaired by free information advocate Brewster Kahle. It provides free access to collections of digitized materials such as websites, software applications, music, audiovisual and print materials. The Archive is also an activist organization, advocating a free and open Internet. The Internet Archive holds more than 39 million print materials, 11.6 million pieces of audiovisual content, 2.6 million software programs, and 15 million audio files, among other items.
Automatic summarization
Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content. Artificial intelligence algorithms are commonly developed and employed to achieve this, specialized for different types of data. Text summarization is usually implemented by natural language processing methods, designed to locate the most informative sentences in a given document.
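The idea of locating the most informative sentences can be sketched with a basic extractive approach: score each sentence by the average document-wide frequency of its non-stopword words and keep the top scorers. The scoring rule, the stopword list, and the `summarize` function are illustrative assumptions, not the method of any particular system.

```python
import re
from collections import Counter

# Small illustrative stopword list, not a standard one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "be", "for", "in", "is",
    "of", "on", "the", "to", "was", "were", "with",
}


def content_words(text):
    """Lowercased words of `text`, with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


# Invented sample document.
DOCUMENT = (
    "Digital libraries store scanned books. "
    "Scanned books need metadata. "
    "Lunch was late."
)

print(summarize(DOCUMENT, max_sentences=1))
```

Because the summary is a subset of the original sentences, this is extractive summarization; generating new wording would require a different, abstractive technique.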
Metadata
Metadata (or metainformation) is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: Descriptive metadata – the descriptive information about a resource. It is used for discovery and identification, and includes elements such as title, abstract, author, and keywords. Structural metadata – metadata about containers of data that indicates how compound objects are put together, for example, how pages are ordered to form chapters.
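The two metadata types described above can be modeled as small record types. The class names, field names, and sample values below are assumptions chosen to mirror the elements mentioned in the text (title, abstract, author, keywords, page order within chapters); they do not follow any specific metadata standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DescriptiveMetadata:
    """Information used to discover and identify a resource."""
    title: str
    author: str
    abstract: str = ""
    keywords: List[str] = field(default_factory=list)


@dataclass
class StructuralMetadata:
    """How a compound object is put together."""
    # Chapter title -> ordered list of page identifiers (hypothetical scheme).
    chapters: Dict[str, List[str]] = field(default_factory=dict)

    def page_order(self) -> List[str]:
        """Flatten the chapters into the overall reading order of pages."""
        return [page for pages in self.chapters.values() for page in pages]


# Invented example values.
description = DescriptiveMetadata(
    title="Example Book",
    author="A. Author",
    keywords=["digitization", "metadata"],
)
structure = StructuralMetadata(
    chapters={"Chapter 1": ["p001", "p002"], "Chapter 2": ["p003"]}
)

print(description.title, description.keywords)
print(structure.page_order())
```

Neither record contains the page images or text themselves, which reflects the distinction between metadata and the content it describes.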
Preservation (library and archive)
In conservation, library and archival science, preservation is a set of preventive conservation activities aimed at prolonging the life of a record, book, or object while making as few changes as possible. Preservation activities vary widely and may include monitoring the condition of items, maintaining the temperature and humidity in collection storage areas, writing a plan in case of emergencies, digitizing items, writing relevant metadata, and increasing accessibility.
History
History is the systematic study and documentation of the human past. The period of events before the invention of writing systems is considered prehistory. "History" is an umbrella term comprising past events as well as the memory, discovery, collection, organization, presentation, and interpretation of these events. Historians seek knowledge of the past using historical sources such as written documents, oral accounts, art and material artifacts, and ecological markers.