Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content. Artificial intelligence algorithms are commonly developed and employed to achieve this, specialized for different types of data. Text summarization is usually implemented by natural language processing methods, designed to locate the most informative sentences in a given document.
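As a minimal sketch of the extractive approach just described (not any production system's method), the following Python function scores sentences by word frequency and keeps the top-scoring ones; the regex tokenization and the scoring formula are simplifying assumptions.

    import re
    from collections import Counter

    def extractive_summary(text, num_sentences=2):
        # Split into sentences and words with simple regexes (an assumption;
        # real systems use proper tokenizers).
        sentences = re.split(r'(?<=[.!?])\s+', text.strip())
        words = re.findall(r'[a-z]+', text.lower())
        freq = Counter(words)

        # Score each sentence by the total frequency of its words,
        # normalized by sentence length.
        def score(sentence):
            tokens = re.findall(r'[a-z]+', sentence.lower())
            return sum(freq[t] for t in tokens) / (len(tokens) or 1)

        # Keep the highest-scoring sentences, preserving their original order.
        ranked = sorted(sentences, key=score, reverse=True)[:num_sentences]
        return ' '.join(s for s in sentences if s in ranked)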
Text mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically derived by detecting patterns and trends through means such as statistical pattern learning.
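As a toy illustration of extracting structured signals from raw text by statistical weighting, the sketch below ranks terms with TF-IDF; it assumes a recent scikit-learn is available, and the example documents are invented.

    from sklearn.feature_extraction.text import TfidfVectorizer

    documents = [
        "Text mining derives high-quality information from text.",
        "Statistical pattern learning finds trends in written resources.",
        "Emails, reviews, and articles are common written resources.",
    ]

    # Weight each term by TF-IDF: high when frequent in one document but rare across the corpus.
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(documents)

    # Print the highest-weighted terms for each document.
    terms = vectorizer.get_feature_names_out()
    for row, doc in zip(tfidf.toarray(), documents):
        top = sorted(zip(row, terms), reverse=True)[:3]
        print(doc, "->", [term for weight, term in top if weight > 0])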
In natural language processing (NLP), a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques, where words or phrases from the vocabulary are mapped to vectors of real numbers.
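To make "closer in the vector space" concrete, here is a small sketch using invented 3-dimensional vectors (real embeddings are learned from data and have hundreds of dimensions) and cosine similarity as the notion of closeness.

    import numpy as np

    # Toy 3-dimensional embeddings (illustrative values only).
    embeddings = {
        "king":  np.array([0.80, 0.65, 0.10]),
        "queen": np.array([0.78, 0.70, 0.12]),
        "apple": np.array([0.05, 0.10, 0.95]),
    }

    def cosine_similarity(u, v):
        # Vectors pointing in similar directions give values close to 1.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high
    print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low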
In functional analysis and related areas of mathematics, a complete topological vector space is a topological vector space (TVS) with the property that whenever points get progressively closer to each other, then there exists some point towards which they all get closer. The notion of "points that get progressively closer" is made rigorous by Cauchy nets or Cauchy filters, which are generalizations of Cauchy sequences, while "point towards which they all get closer" means that this Cauchy net or filter converges to some point of the space. The notion of completeness for TVSs uses the theory of uniform spaces as a framework to generalize the notion of completeness for metric spaces.
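For reference, the standard definitions can be stated as follows (given here for nets; the filter version is analogous):
\[
(x_i)_{i \in I} \text{ is Cauchy } \iff \text{ for every neighborhood } N \text{ of } 0 \text{ there exists } i_0 \in I \text{ such that } x_i - x_j \in N \text{ whenever } i, j \geq i_0,
\]
\[
X \text{ is complete } \iff \text{ every Cauchy net (equivalently, every Cauchy filter) in } X \text{ converges to some point of } X.
\]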
In functional analysis and related areas of mathematics, a set in a topological vector space is called bounded or von Neumann bounded if every neighborhood of the zero vector can be inflated to include the set. A set that is not bounded is called unbounded. Bounded sets are a natural way to define locally convex polar topologies on the vector spaces in a dual pair, as the polar set of a bounded set is an absolutely convex and absorbing set. The concept was first introduced by John von Neumann and Andrey Kolmogorov in 1935.
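Concretely, the "inflation" condition is usually stated as follows (a standard formulation, with scalars drawn from the underlying field):
\[
B \text{ is bounded } \iff \text{ for every neighborhood } V \text{ of } 0 \text{ there exists } s > 0 \text{ such that } B \subseteq tV \text{ for all scalars } t \text{ with } |t| \geq s.
\]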
PageRank (PR) is an algorithm used by Google Search to rank web pages in their search engine results. It is named after both the term "web page" and co-founder Larry Page. PageRank is a way of measuring the importance of website pages. According to Google: "PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites."
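The rank vector can be computed by a simple power iteration over the link graph; the sketch below illustrates that generic idea (a damping factor of 0.85 is the commonly cited value) and is not Google's production implementation.

    import numpy as np

    def pagerank(links, damping=0.85, iterations=100):
        """links[i] is the list of pages that page i links to."""
        n = len(links)
        ranks = np.full(n, 1.0 / n)
        for _ in range(iterations):
            new_ranks = np.full(n, (1.0 - damping) / n)
            for page, outgoing in enumerate(links):
                if outgoing:
                    # A page shares its rank equally among the pages it links to.
                    share = damping * ranks[page] / len(outgoing)
                    for target in outgoing:
                        new_ranks[target] += share
                else:
                    # Dangling page: spread its rank over all pages.
                    new_ranks += damping * ranks[page] / n
            ranks = new_ranks
        return ranks

    # Tiny 4-page web in which page 3 receives the most inbound links.
    print(pagerank([[1, 3], [3], [0, 3], [2]]))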
Google Search (also known simply as Google or Google.com) is a search engine provided and operated by Google. Handling more than 3.5 billion searches per day, it has a 92% share of the global search engine market, making it the most widely used search engine and the most-visited website in the world. The order of search results returned by Google is based, in part, on a priority rank system called "PageRank".
Natural language processing (NLP) is an interdisciplinary subfield of linguistics and computer science. It is primarily concerned with processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic (i.e. statistical and, most recently, neural network-based) machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them.
In mathematics, particularly in functional analysis, a bornological space is a type of space which, in some sense, possesses the minimum amount of structure needed to address questions of boundedness of sets and linear maps, in the same way that a topological space possesses the minimum amount of structure needed to address questions of continuity. Bornological spaces are distinguished by the property that a linear map from a bornological space into any locally convex space is continuous if and only if it is a bounded linear operator.
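In symbols, the characterization in the last sentence reads (where "bounded" means that bounded sets are mapped to bounded sets):
\[
T : X \to Y \text{ linear, } X \text{ bornological, } Y \text{ locally convex} \;\Longrightarrow\; \bigl(T \text{ continuous} \iff T(B) \text{ is bounded in } Y \text{ for every bounded } B \subseteq X\bigr).
\]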
Word2vec is a technique for natural language processing (NLP) published in 2013. The word2vec algorithm uses a neural network model to learn word associations from a large corpus of text. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. As the name implies, word2vec represents each distinct word with a particular list of numbers called a vector.
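As a usage sketch, assuming the gensim library (version 4.x naming), whose Word2Vec class implements this technique; the toy corpus and parameter values below are placeholders.

    from gensim.models import Word2Vec

    # Toy corpus: each sentence is a list of tokens. Real training uses
    # a large corpus with millions of sentences.
    sentences = [
        ["the", "king", "rules", "the", "kingdom"],
        ["the", "queen", "rules", "the", "kingdom"],
        ["the", "cat", "sat", "on", "the", "mat"],
    ]

    # vector_size is the dimensionality of each word vector; window is the
    # context size; min_count=1 keeps even rare words in this tiny example.
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

    # Each distinct word is now represented by a vector of real numbers.
    print(model.wv["king"])                        # the 50-dimensional vector for "king"
    print(model.wv.most_similar("king", topn=3))   # nearest words in the vector space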