Web archiving
Web archiving is the process of collecting portions of the World Wide Web to ensure the information is preserved in an archive for future researchers, historians, and the public. Web archivists typically employ web crawlers for automated capture due to the massive size and amount of information on the Web. The largest web archiving organization based on a bulk crawling approach is the Wayback Machine, which strives to maintain an archive of the entire Web.
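
To illustrate the crawler-based capture described above, here is a minimal sketch of a breadth-first crawl that stores each page it reaches. It is not how any particular archive works. The names `LinkExtractor`, `crawl`, and `FAKE_WEB` are hypothetical, and the `fetch` callable stands in for a real HTTP client so that the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first capture starting at seed; returns {url: html}."""
    archive = {}
    queue = deque([seed])
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        if url in archive:
            continue  # already captured
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved; skip it
        archive[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the page they appear on.
            queue.append(urljoin(url, href))
    return archive


# Hypothetical three-page site standing in for the live Web (assumption).
FAKE_WEB = {
    "http://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.org/a": '<a href="/">home</a>',
    "http://example.org/b": '<a href="/missing">gone</a>',
}

snapshot = crawl("http://example.org/", FAKE_WEB.get)
print(sorted(snapshot))
```

A production crawler would also need politeness delays, robots.txt handling, and a durable storage format, all of which this sketch leaves out.
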
Legal deposit
Legal deposit is a legal requirement that a person or group submit copies of their publications to a repository, usually a library. The number of copies required varies from country to country. Typically, the national library is the primary repository of these copies. In some countries there is also a legal deposit requirement placed on the government, and it is required to send copies of documents to publicly accessible libraries. The legislation covering the requirement varies from country to country, but is often enshrined in copyright law.
JSTOR
JSTOR (/ˈdʒeɪstɔːr/; short for Journal Storage) is a digital library founded in 1994. Originally containing digitized back issues of academic journals, it now encompasses books and other primary sources as well as current issues of journals in the humanities and social sciences. It provides full-text searches of almost 2,000 journals. Most access is by subscription, but some of the site is public domain, and open access content is available free of charge. William G. Bowen, president of Princeton University from 1972 to 1988, founded JSTOR in 1994.
Project Gutenberg
Project Gutenberg (PG) is a volunteer effort to digitize and archive cultural works, as well as to "encourage the creation and distribution of eBooks." It was founded in 1971 by American writer Michael S. Hart and is the oldest digital library. Most of the items in its collection are the full texts of books or individual stories in the public domain. All files can be accessed for free in open formats that can be read on almost any computer. Project Gutenberg has reached 50,000 items in its collection of free eBooks.
Digital asset management
Digital asset management (DAM), and its implementation as a computer application, is required when collecting digital assets to ensure that the owner, and possibly their delegates, can perform operations on the data files. The term media asset management (MAM) may be used for digital asset management applied to the subset of digital objects commonly considered "media", namely audio recordings, photos, and videos.
Archive
An archive is an accumulation of historical records or materials – in any medium – or the physical facility in which they are located. Archives contain primary source documents that have accumulated over the course of an individual or organization's lifetime, and are kept to show the function of that person or organization. Professional archivists and historians generally understand archives to be records that have been naturally and necessarily generated as a product of regular legal, commercial, administrative, or social activities.
Library
A library is a collection of books, and possibly other materials and media, that is accessible for use by its members and members of allied institutions. Libraries provide physical (hard copy) or digital (soft copy) materials, and may be a physical location, a virtual space, or both. A library's collection normally includes printed materials that can be borrowed, and a reference section of publications that are not permitted to leave the library and can only be viewed on the premises.
Aaron Swartz
Aaron Hillel Swartz (November 8, 1986 – January 11, 2013) was an American computer programmer, entrepreneur, writer, political organizer, and Internet hacktivist. As a programmer, Swartz helped develop the web feed format RSS; the technical architecture for Creative Commons, an organization dedicated to creating copyright licenses; the website framework web.py; and Markdown, a lightweight markup language format. Swartz was involved in the development of the social news aggregation website Reddit until he departed from the company in 2007.
Full-text search
In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts represented in databases (such as titles, abstracts, selected sections, or bibliographical references). In a full-text search, a search engine examines all of the words in every stored document as it tries to match search criteria (for example, text specified by a user).
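
As a rough illustration of matching a query against every word of every stored document, the sketch below builds an inverted index (a common technique, not one named in the text above) and returns the documents that contain all query words. The function names `tokenize`, `build_index`, and `search`, and the sample documents in `DOCS`, are assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample collection (assumption, for illustration only).
DOCS = {
    "d1": "An archive is an accumulation of historical records.",
    "d2": "A library is a collection of books and other media.",
    "d3": "Historical books are kept in the library archive.",
}

index = build_index(DOCS)
print(sorted(search(index, "historical archive")))  # ['d1', 'd3']
print(sorted(search(index, "library books")))       # ['d2', 'd3']
```

Because every word of the full text is indexed, a query can match on terms that never appear in a title or abstract, which is the distinction from metadata-based search drawn above.
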
Primary source
In the study of history as an academic discipline, a primary source (also called an original source) is an artifact, document, diary, manuscript, autobiography, recording, or any other source of information that was created at the time under study. It serves as an original source of information about the topic. Similar definitions can be used in library science and other areas of scholarship, although different fields have somewhat different definitions.