PlainTalkPlainTalk is the collective name for several speech synthesis (MacinTalk) and speech recognition technologies developed by Apple Inc. In 1990, Apple invested a lot of work and money in speech recognition technology, hiring many researchers in the field. The result was "PlainTalk", released with the AV models in the Macintosh Quadra series from 1993. It was made a standard system component in System 7.1.2, and has since been shipped on all PowerPC and some 68k Macintoshes. Apple's text-to-speech uses diphones.
Articulatory synthesisArticulatory synthesis refers to computational techniques for synthesizing speech based on models of the human vocal tract and the articulation processes occurring there. The shape of the vocal tract can be controlled in a number of ways, which usually involve modifying the position of the speech articulators, such as the tongue, jaw, and lips. Speech is created by digitally simulating the flow of air through the representation of the vocal tract. There is a long history of attempts to build mechanical "talking heads".
Viterbi algorithmThe Viterbi algorithm is a dynamic programming algorithm for obtaining the maximum a posteriori probability estimate of the most likely sequence of hidden states—called the Viterbi path—that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMM). The algorithm has found universal application in decoding the convolutional codes used in both CDMA and GSM digital cellular, dial-up modems, satellite, deep-space communications, and 802.
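The dynamic programming recurrence described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `viterbi`, its parameter names, and the two-state "Healthy"/"Fever" toy model with its probabilities are all assumptions made for the example. It works with raw probabilities for readability; a practical decoder would typically use log-probabilities to avoid underflow on long sequences.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[float, List[str]]:
    """Return (probability, path) for the most likely hidden-state sequence."""
    if not observations:
        return 1.0, []

    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path (unused at t = 0).
    back: List[Dict[str, str]] = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev

    # Trace the Viterbi path backwards from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy HMM, used only to exercise the function.
states = ("Healthy", "Fever")
observations = ("normal", "cold", "dizzy")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

prob, path = viterbi(observations, states, start_p, trans_p, emit_p)
print(path, prob)
```

For this toy model the decoded path is Healthy, Healthy, Fever, with a probability of about 0.01512. The table `best` holds one entry per state per time step, so the running time grows with the number of observations times the square of the number of states.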
Computational linguisticsComputational linguistics has, since the 2020s, become a near-synonym of either natural language processing or language technology, with deep learning approaches, such as large language models, outperforming the specific approaches previously used in the field. The field has overlapped with artificial intelligence since the efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English.
GorfGorf is an arcade video game released in 1981 by Midway Manufacturing, whose name was advertised as an acronym for "Galactic Orbiting Robot Force". It is a fixed shooter with five distinct levels, the first of which is based on Space Invaders and another on Galaxian. The game makes heavy use of synthesized speech, powered by the Votrax speech chip, for the Gorfian robot, which teases the player. Gorf allows the player to buy two additional lives per quarter before starting the game, for a maximum of seven lives.
Interactive voice responseInteractive voice response (IVR) is a technology that allows telephone users to interact with a computer-operated telephone system through the use of voice and DTMF tones input with a keypad. In telecommunications, IVR allows customers to interact with a company's host system via a telephone keypad or by speech recognition, after which services can be inquired about through the IVR dialogue. IVR systems can respond with pre-recorded or dynamically generated audio to further direct users on how to proceed.
Motor theory of speech perceptionThe motor theory of speech perception is the hypothesis that people perceive spoken words by identifying the vocal tract gestures with which they are pronounced rather than by identifying the sound patterns that speech generates. It originally claimed that speech perception is done through a specialized module that is innate and human-specific. Though the idea of a module has been qualified in more recent versions of the theory, the idea remains that the role of the speech motor system is not only to produce speech articulations but also to detect them.
ChipspeechChipspeech is vocal synthesizer software created by Plogue with the goal of recreating 1980s synthesizers. The software is used to create vocals for music. Chipspeech is designed to produce vintage-style vocals from synthesizers that were used by the music industry in the 1980s, with a technology cutoff date of 1989. The vocals, therefore, are not meant to sound realistic and are more suited for sound experimentation. It works as a text-to-speech method.