Concept

Language model

A language model is a probabilistic model of a natural language that assigns probabilities to sequences of words, based on the text corpora in one or more languages it was trained on. Large language models, currently the most advanced form, are built on the transformer architecture, which combines attention with feedforward neural network layers. They have superseded recurrent neural network-based models, which had in turn superseded purely statistical models such as the word n-gram language model.

Language models are useful for a variety of tasks, including speech recognition (where they help prevent predictions of low-probability, e.g. nonsense, sequences), machine translation, natural language generation (generating more human-like text), optical character recognition, handwriting recognition, grammar induction, information retrieval, and others.

Maximum entropy language models encode the relationship between a word and the n-gram history using feature functions. The equation is

$$P(w_m \mid w_1, \ldots, w_{m-1}) = \frac{1}{Z(w_1, \ldots, w_{m-1})} \exp\left(a^{T} f(w_1, \ldots, w_m)\right)$$

where $Z(w_1, \ldots, w_{m-1})$ is the partition function, $a$ is the parameter vector, and $f(w_1, \ldots, w_m)$ is the feature function. In the simplest case, the feature function is just an indicator of the presence of a certain n-gram. It is helpful to use a prior on $a$ or some form of regularization. The log-bilinear model is another example of an exponential language model.

Continuous representations, or embeddings, of words are produced in recurrent neural network-based language models (also known as continuous space language models). Such continuous space embeddings help to alleviate the curse of dimensionality: the number of possible word sequences grows exponentially with sequence length and rapidly with vocabulary size, which in turn causes a data sparsity problem. Neural networks avoid this problem by representing words as non-linear combinations of weights in a neural net.

Although language models sometimes match human performance, it is not clear whether they are plausible cognitive models. At least for recurrent neural networks, it has been shown that they sometimes learn patterns that humans do not learn, but fail to learn patterns that humans typically do learn.
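
To make the maximum entropy formulation described above more concrete, the following is a minimal sketch in Python. It assumes the simplest case mentioned in the text, where each feature is an indicator for the presence of a particular n-gram. The function names (`ngram_features`, `maxent_prob`), the toy vocabulary, and the hand-set weights are all hypothetical and chosen only for illustration; in practice the parameter vector would be learned from data, typically with a prior or regularization.

```python
import math


def ngram_features(history, word, n=2):
    """Return the indicator features that fire for (history, word).

    Each feature is an n-gram (as a tuple) of order 1..n ending in `word`.
    Hypothetical helper, not taken from any library.
    """
    feats = []
    for k in range(1, n + 1):
        # Context is the last k-1 words of the history (empty for unigrams).
        context = tuple(history[-(k - 1):]) if k > 1 else ()
        if len(context) == k - 1:  # skip orders the history is too short for
            feats.append(context + (word,))
    return feats


def maxent_prob(history, vocab, weights, n=2):
    """Compute P(word | history) for every word in `vocab`.

    The score of a word is the dot product of the parameter vector with the
    indicator feature vector, i.e. the sum of the weights of active features.
    """
    scores = {
        w: sum(weights.get(f, 0.0) for f in ngram_features(history, w, n))
        for w in vocab
    }
    # Partition function: normalizes over all candidate next words.
    z = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / z for w, s in scores.items()}


# Toy example with hand-set (not learned) weights.
vocab = ["the", "cat", "sat"]
weights = {("the", "cat"): 2.0, ("cat",): 0.5, ("sat",): 0.1}
probs = maxent_prob(["the"], vocab, weights)
uniform = maxent_prob(["the"], vocab, {})
```

With these assumed weights, the bigram feature `("the", "cat")` pushes most of the probability mass toward "cat" after the history "the", while an empty weight dictionary yields a uniform distribution over the vocabulary. Missing features default to a weight of zero, which is one simple way to keep the parameter vector sparse.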

Official source

https://en.wikipedia.org/wiki/Language_model

About this result

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Infusing structured knowledge priors in neural models for sample-efficient symbolic reasoning

Driving and suppressing the human language network using large language models

Revisiting Character-level Adversarial Attacks for Language Models
