Publication

Natural Language Processing (NLP) driven categorisation and detection of discourse in historical US patents

Related concepts (42)

Software patent

A software patent is a patent on a piece of software, such as a computer program, library, user interface, or algorithm. A patent is a set of exclusionary rights granted by a state to a patent holder for a limited period of time, usually 20 years. These rights are granted to patent applicants in exchange for their disclosure of the inventions. Once a patent is granted in a given country, no person may make, use, sell or import/export the claimed invention in that country without the permission of the patent holder.

Sentence embedding

In natural language processing, a sentence embedding is a numeric representation of a sentence as a vector of real numbers that encodes meaningful semantic information. State-of-the-art embeddings are based on the learned hidden-layer representations of dedicated sentence transformer models. BERT pioneered an approach in which a dedicated [CLS] token is prepended to each sentence fed into the model; the final hidden state vector of this token encodes information about the sentence and can be fine-tuned for use in sentence classification tasks.
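As a rough illustration of what "a sentence as a vector" means, the sketch below averages per-word vectors and compares sentences by cosine similarity. This is a much simpler stand-in for a learned sentence transformer and does not reproduce the BERT [CLS] approach. The table `WORD_VECTORS`, its three-dimensional values, and the function names are all invented for this example.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# Real models learn vectors with hundreds of dimensions from large corpora.
WORD_VECTORS = {
    "patent": (0.9, 0.1, 0.0),
    "claim": (0.8, 0.2, 0.1),
    "invention": (0.7, 0.3, 0.0),
    "protects": (0.5, 0.5, 0.1),
    "defines": (0.4, 0.6, 0.1),
    "cat": (0.0, 0.1, 0.9),
    "sleeps": (0.1, 0.2, 0.8),
}


def embed_sentence(sentence):
    """Average the vectors of all known words (mean pooling)."""
    vectors = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError("sentence contains no known words")
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


patent_a = embed_sentence("the patent protects the invention")
patent_b = embed_sentence("the claim defines the invention")
unrelated = embed_sentence("the cat sleeps")

similar_score = cosine_similarity(patent_a, patent_b)
unrelated_score = cosine_similarity(patent_a, unrelated)
```

With these made-up vectors, the two patent-related sentences score closer to each other than either does to the unrelated one, which is the property a real sentence embedding is trained to have.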

United States patent law

Under United States law, a patent is a right granted to the inventor of a (1) process, machine, article of manufacture, or composition of matter, (2) that is new, useful, and non-obvious. A patent is the right to exclude others, for a limited time (usually 20 years), from profiting from a patented technology without the consent of the patent holder. Specifically, it is the right to exclude others from: making, using, selling, offering for sale, importing, inducing others to infringe, applying for an FDA approval, and/or offering a product specially adapted for practice of the patent.

Machine learning

Machine learning (ML) is an umbrella term for approaches to problems for which having human programmers develop the algorithms would be cost-prohibitive. Instead, the problems are solved by helping machines 'discover' their 'own' algorithms, without being explicitly told what to do by any human-developed algorithm. Recently, generative artificial neural networks have surpassed the results of many previous approaches.

Unsupervised learning

Unsupervised learning is a paradigm in machine learning where, in contrast to supervised learning and semi-supervised learning, algorithms learn patterns exclusively from unlabeled data. Neural network tasks are often categorized as discriminative (recognition) or generative (imagination). Often, but not always, discriminative tasks use supervised methods and generative tasks use unsupervised ones; however, the separation is very hazy. For example, object recognition favors supervised learning, but unsupervised learning can also cluster objects into groups.
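The clustering case mentioned above can be sketched with k-means, one common unsupervised algorithm (chosen here for illustration; the text does not name a specific method). The sample points, the choice of starting centroids, and all function names are assumptions made for this example. No labels are supplied; the groups emerge from the distances alone.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(dim) / len(points) for dim in zip(*points))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [nearest_centroid(p, centroids) for p in points]
    return centroids, labels


# Illustrative unlabeled data: two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
```

Starting centroids are passed in explicitly so the run is deterministic; practical implementations usually pick them at random and repeat the procedure several times.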

Patent claim

In a patent or patent application, the claims define in technical terms the extent, i.e. the scope, of the protection conferred by a patent, or the protection sought in a patent application. In other words, the purpose of the claims is to define which subject-matter is protected by the patent (or sought to be protected by the patent application). This is termed the "notice function" of a patent claim: to warn others of what they must not do if they are to avoid infringement liability.

Language model

A language model is a probabilistic model of a natural language that can assign probabilities to sequences of words, based on the text corpora in one or multiple languages it was trained on. Large language models, as their most advanced form, are a combination of feedforward neural networks and transformers. They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.
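To make "probabilities of a series of words" concrete, here is a minimal sketch of the oldest family mentioned above, a word n-gram model with n = 2 (a bigram model), estimated by simple counting without smoothing. It is not a large language model. The three-sentence corpus, the `<s>` and `</s>` boundary markers, and the function names are assumptions chosen for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def bigram_probability(counts, prev, cur):
    """Maximum-likelihood estimate of P(cur | prev); 0.0 if prev is unseen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[cur] / total if total else 0.0


def sentence_probability(counts, sentence):
    """Probability of the whole sentence as a product of bigram probabilities."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        probability *= bigram_probability(counts, prev, cur)
    return probability


# Tiny illustrative corpus; a real model is trained on far more text.
corpus = [
    "the claim defines the scope",
    "the claim defines the invention",
    "the patent protects the invention",
]
model = train_bigram_model(corpus)

p_claim_after_the = bigram_probability(model, "the", "claim")  # 2 of 6 followers
p_seen_pattern = sentence_probability(model, "the patent protects the scope")
p_unseen_pattern = sentence_probability(model, "the scope protects")
```

Because there is no smoothing, any sentence containing a word pair that never occurred in the corpus gets probability zero, which is one of the limitations that later neural models address.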

Bag-of-words model

The bag-of-words model is a simplifying representation used in natural language processing and information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity. The bag-of-words model has also been used for computer vision. The bag-of-words model is commonly used in methods of document classification where the (frequency of) occurrence of each word is used as a feature for training a classifier.
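A minimal sketch of the representation described above: each text becomes a multiset of its words, and the per-word counts over a shared vocabulary become the feature vector a classifier would consume. The tokenization rule (lowercase, alphanumeric runs), the two sample sentences, and the function names are assumptions made for this example.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return the multiset of lowercase word tokens, ignoring order and grammar."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def to_feature_vector(bag, vocabulary):
    """Map a bag to word counts in the fixed order given by `vocabulary`."""
    return [bag[word] for word in vocabulary]


bag_a = bag_of_words("John likes to watch movies. Mary likes movies too.")
bag_b = bag_of_words("Mary also likes to watch football games.")

# A shared, sorted vocabulary makes vectors from different texts comparable.
vocabulary = sorted(set(bag_a) | set(bag_b))
vector_a = to_feature_vector(bag_a, vocabulary)
vector_b = to_feature_vector(bag_b, vocabulary)
```

Word order is discarded but multiplicity is kept, so "likes" counts twice in the first text, and any reordering of the same words yields an identical bag.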

Patent troll

In international law and business, patent trolling or patent hoarding is a categorical or pejorative term applied to a person or company that attempts to enforce patent rights against accused infringers far beyond the patent's actual value or contribution to the prior art, often through hardball legal tactics (frivolous litigation, vexatious litigation, strategic lawsuits against public participation (SLAPP), chilling effects, and the like). Patent trolls often do not manufacture products or supply services based upon the patents in question.

Natural-language understanding

Natural-language understanding (NLU) or natural-language interpretation (NLI) is a subtopic of natural-language processing in artificial intelligence that deals with machine reading comprehension. Natural-language understanding is considered an AI-hard problem. There is considerable commercial interest in the field because of its application to automated reasoning, machine translation, question answering, news-gathering, text categorization, voice-activation, archiving, and large-scale content analysis.