Web archivingWeb archiving is the process of collecting portions of the World Wide Web to ensure the information is preserved in an archive for future researchers, historians, and the public. Web archivists typically employ web crawlers for automated capture due to the massive size and amount of information on the Web. The largest web archiving organization based on a bulk crawling approach is the Wayback Machine, which strives to maintain an archive of the entire Web.
Search engine cacheSearch engine cache is a cache of web pages that shows the page as it was when it was indexed by a web crawler. Cached versions of web pages can be used to view the contents of a page when the live version cannot be reached, has been altered or . When a web crawler crawls the web, it collects the contents of each web page to allow the page to be indexed by the search engine. At the same time, it can store a full copy of that page. The search engine may make the copy accessible to users in the search engine results.
AltaVistaAltaVista was a Web search engine established in 1995. It became one of the most-used early search engines, but lost ground to Google and was purchased by Yahoo! in 2003, which retained the brand, but based all AltaVista searches on its own search engine. On July 8, 2013, the service was shut down by Yahoo!, and since then the domain has redirected to Yahoo!'s own search site. The word "AltaVista" is formed from the words for "high view" or "upper view" in Spanish (alta + vista); thus, it colloquially translates to "overview".
Full-text searchIn text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts represented in databases (such as titles, abstracts, selected sections, or bibliographical references). In a full-text search, a search engine examines all of the words in every stored document as it tries to match search criteria (for example, text specified by a user).
Media typeA media type (formerly known as a MIME type) is a two-part identifier for file formats and format contents transmitted on the Internet. Their purpose is somewhat similar to file extensions in that they identify the intended data format. The Internet Assigned Numbers Authority (IANA) is the official authority for the standardization and publication of these classifications.