Web crawlerA Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering). Web search engines and some other websites use Web crawling or spidering software to update their web content or indices of other sites' web content. Web crawlers copy pages for processing by a search engine, which indexes the downloaded pages so that users can search more efficiently.
User-Agent headerIn computing, the User-Agent header is an HTTP header intended to identify the user agent responsible for making a given HTTP request. Whereas the character sequence User-Agent comprises the name of the header itself, the header value that a given user agent uses to identify itself is colloquially known as its user agent string. The user agent for the operator of a computer used to access the Web has encoded within the rules that govern its behavior the knowledge of how to negotiate its half of a request-response transaction; the user agent thus plays the role of the client in a client–server system.
Web serviceA web service (WS) is either: a service offered by an electronic device to another electronic device, communicating with each other via the Internet, or a server running on a computer device, listening for requests at a particular port over a network, serving web documents (HTML, JSON, XML, images). In a web service, a web technology such as HTTP is used for transferring machine-readable file formats such as XML and JSON.
WHATWGThe Web Hypertext Application Technology Working Group (WHATWG) is a community of people interested in evolving HTML and related technologies. The WHATWG was founded by individuals from Apple Inc., the Mozilla Foundation and Opera Software, leading Web browser vendors, in 2004. WHATWG is responsible for maintaining multiple web-related technical standards, including the specifications for the HyperText Markup Language (HTML) and the Document Object Model (DOM).
Media typeA media type (formerly known as a MIME type) is a two-part identifier for file formats and format contents transmitted on the Internet. Their purpose is somewhat similar to file extensions in that they identify the intended data format. The Internet Assigned Numbers Authority (IANA) is the official authority for the standardization and publication of these classifications.
Machine-readable medium and dataIn communications and computing, a machine-readable medium (or computer-readable medium) is a medium capable of storing data in a format easily readable by a digital computer or a sensor. It contrasts with human-readable medium and data. The result is called machine-readable data or computer-readable data, and the data itself can be described as having machine-readability. Machine-readable data must be structured data. Attempts to create machine-readable data occurred as early as the 1960s.
Greater-than signThe greater-than sign is a mathematical symbol that denotes an inequality between two values. The widely adopted form of two equal-length strokes connecting in an acute angle at the right, , has been found in documents dated as far back as 1631. In mathematical writing, the greater-than sign is typically placed between two values being compared and signifies that the first number is greater than the second number. Examples of typical usage include 1.5 > 1 and 1 > −2. The less-than sign and greater-than sign always "point" to the smaller number.
DeprecationIn several fields, especially computing, deprecation is the discouragement of use of some terminology, feature, design, or practice, typically because it has been superseded or is no longer considered efficient or safe, without completely removing it or prohibiting its use. Typically, deprecated materials are not completely removed to ensure legacy compatibility or back up practice in case new methods are not functional in an odd scenario. It can also imply that a feature, design, or practice will be removed or discontinued entirely in the future.
MicroformatMicroformats (μF) are a set of defined HTML classes created to serve as consistent and descriptive metadata about an element, designating it as representing a certain type of data (such as contact information, geographic coordinates, events, blog posts, products, recipes, etc.). They allow software to process the information reliably by having set classes refer to a specific type of data rather than being arbitrary. Microformats emerged around 2005 and were predominantly designed for use by search engines, web syndication and aggregators such as RSS.
News aggregatorIn computing, a news aggregator, also termed a feed aggregator, feed reader, news reader, RSS reader, or simply an aggregator is client software or a web application that aggregates syndicated web content such as online newspapers, blogs, podcasts, and video blogs (vlogs) in one location for easy viewing. The updates distributed may include journal tables of contents, podcasts, videos, and news items. Visiting many separate websites frequently to find out if the content on the site has been updated can take a long time.