Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020. Like its predecessor GPT-2, it is a decoder-only transformer deep neural network, which uses attention in place of earlier recurrence- and convolution-based architectures. Attention mechanisms allow the model to selectively focus on the segments of input text it predicts to be most relevant. GPT-3 has a 2,048-token-long context window and a then-unprecedented 175 billion parameters, requiring 800 GB of storage. The model demonstrated strong zero-shot and few-shot learning on many tasks.
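As a rough illustration of the attention mechanism, the sketch below implements single-head scaled dot-product attention with a causal mask, the masking pattern used by decoder-only models so that each token can attend only to earlier positions. It is a simplified NumPy example with arbitrary toy dimensions, not OpenAI's implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=True):
    """Weight each position's value by its relevance to the query position.

    Q, K: arrays of shape (seq_len, d_k); V: array of shape (seq_len, d_v).
    With causal=True, a mask prevents attention to future positions.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len, seq_len) relevance scores
    if causal:
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -1e9, scores)       # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                              # each output mixes the relevant values

# Toy example: 4 tokens with 8-dimensional projections (GPT-3's context allows up to 2,048 tokens).
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

In the full model this computation is repeated across many attention heads and layers over the 2,048-token context.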
Microsoft announced on September 22, 2020, that it had licensed "exclusive" use of GPT-3; others can still use the public API to receive output, but only Microsoft has access to GPT-3's underlying model.
According to The Economist, improved algorithms, powerful computers, and an increase in digitized data have fueled a revolution in machine learning, with new techniques in the 2010s resulting in "rapid improvements in tasks" including manipulating language. Software models are trained to learn by using thousands or millions of examples in a "structure ... loosely based on the neural architecture of the brain". One architecture used in natural language processing (NLP) is the transformer, a deep learning architecture first introduced in 2017. There are a number of NLP systems capable of processing, mining, organizing, connecting, and contrasting textual input, as well as correctly answering questions.
On June 11, 2018, OpenAI researchers and engineers posted their original paper introducing the first generative pre-trained transformer (GPT), a type of generative large language model that is pre-trained on an enormous and diverse corpus of text, followed by discriminative fine-tuning to focus on a specific task. GPT models are transformer-based deep learning neural network architectures.
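As a hedged sketch of this two-stage recipe, the example below uses the openly released GPT-2 weights and the Hugging Face Transformers library as a stand-in: generative pre-training is represented by loading already pre-trained weights, and discriminative fine-tuning by one training step of a classification head on a toy labeled example. The model name, data, and hyperparameters are illustrative assumptions, not OpenAI's actual pipeline.

```python
import torch
from transformers import GPT2TokenizerFast, GPT2ForSequenceClassification

# Stage 1 (generative pre-training): load weights already trained on a large text
# corpus with a next-token-prediction objective.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

# Stage 2 (discriminative fine-tuning): train a small classification head, and the
# transformer beneath it, on labeled task data; here a single toy sentiment example.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
batch = tokenizer("A thoughtful, well-acted film.", return_tensors="pt")
labels = torch.tensor([1])  # 1 = positive sentiment in this toy labeling

model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss against the task label
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```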