Optical character recognitionOptical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of s of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example: from a television broadcast).
Mutual informationIn probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the "amount of information" (in units such as shannons (bits), nats or hartleys) obtained about one random variable by observing the other random variable. The concept of mutual information is intimately linked to that of entropy of a random variable, a fundamental notion in information theory that quantifies the expected "amount of information" held in a random variable.
Handwriting recognitionHandwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning (optical character recognition) or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface, a generally easier task as there are more clues available.
Expectation–maximization algorithmIn statistics, an expectation–maximization (EM) algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step.
Visual perceptionVisual perception is the ability to interpret the surrounding environment through photopic vision (daytime vision), color vision, scotopic vision (night vision), and mesopic vision (twilight vision), using light in the visible spectrum reflected by objects in the environment. This is different from visual acuity, which refers to how clearly a person sees (for example "20/20 vision"). A person can have problems with visual perceptual processing even if they have 20/20 vision.
Large language modelA large language model (LLM) is a language model characterized by its large size. Their size is enabled by AI accelerators, which are able to process vast amounts of text data, mostly scraped from the Internet. The artificial neural networks which are built can contain from tens of millions and up to billions of weights and are (pre-)trained using self-supervised learning and semi-supervised learning. Transformer architecture contributed to faster training.
Entropy (information theory)In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable , which takes values in the alphabet and is distributed according to : where denotes the sum over the variable's possible values. The choice of base for , the logarithm, varies for different applications. Base 2 gives the unit of bits (or "shannons"), while base e gives "natural units" nat, and base 10 gives units of "dits", "bans", or "hartleys".
Composition (visual arts)The term composition means "putting together". It can be thought of as the organization of the elements of art according to the principles of art. Composition can apply to any work of art, from music through writing and into photography, that is arranged using conscious thought. In the visual arts, composition is often used interchangeably with various terms such as design, form, visual ordering, or formal structure, depending on the context. In graphic design for press and desktop publishing, composition is commonly referred to as page layout.
Foundation modelsA foundation model (also called base model) is a large machine learning (ML) model trained on a vast quantity of data at scale (often by self-supervised learning or semi-supervised learning) such that it can be adapted to a wide range of downstream tasks. Foundation models have helped bring about a major transformation in how artificial intelligence (AI) systems are built, such as by powering prominent chatbots and other user-facing AI.
HallucinationA hallucination is a perception in the absence of an external stimulus that has the qualities of a real perception. Hallucinations are vivid, substantial, and are perceived to be located in external objective space. Hallucination is a combination of two conscious states of brain wakefulness and REM sleep. They are distinguishable from several related phenomena, such as dreaming (REM sleep), which does not involve wakefulness; pseudohallucination, which does not mimic real perception, and is accurately perceived as unreal; illusion, which involves distorted or misinterpreted real perception; and , which does not mimic real perception, and is under voluntary control.