Speech recognition: Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.
Speaker recognition: Speaker recognition is the identification of a person from characteristics of voices. It is used to answer the question "Who is speaking?" The term voice recognition can refer to speaker recognition or speech recognition. Speaker verification (also called speaker authentication) contrasts with identification, and speaker recognition differs from speaker diarisation (recognizing when the same speaker is speaking).
Speech processing: Speech processing is the study of speech signals and of the methods used to process them. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing, applied to speech signals. Aspects of speech processing include the acquisition, manipulation, storage, transfer and output of speech signals. Speech processing tasks include speech recognition, speech synthesis, speaker diarization, speech enhancement and speaker recognition.
Direct inward dial: Direct inward dialing (DID), also called direct dial-in (DDI) in Europe and Oceania, is a telecommunication service offered by telephone companies to subscribers who operate a private branch exchange (PBX) system. The feature provides service for multiple telephone numbers over one or more analog or digital physical circuits to the PBX, and transmits the dialed telephone number to the PBX so that a PBX extension is directly accessible to an outside caller, possibly bypassing an auto-attendant.
Voice over IP: Voice over Internet Protocol (VoIP), also called IP telephony, is a method and group of technologies for voice calls, the delivery of voice communication sessions over Internet Protocol (IP) networks, such as the Internet. The broader terms Internet telephony, broadband telephony, and broadband phone service specifically refer to the provisioning of voice and other communications services (fax, SMS, voice messaging) over the Internet, rather than via the public switched telephone network (PSTN), also known as plain old telephone service (POTS).
Viterbi algorithm: The Viterbi algorithm is a dynamic programming algorithm for obtaining the maximum a posteriori probability estimate of the most likely sequence of hidden states (called the Viterbi path) that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMM). The algorithm has found universal application in decoding the convolutional codes used in both CDMA and GSM digital cellular, dial-up modems, satellite, deep-space communications, and 802.
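The dynamic programming step in the Viterbi algorithm can be illustrated with a minimal sketch. The function name `viterbi`, the two-state toy HMM, and all probability values below are assumptions chosen for illustration; they do not come from the text above. Log-probabilities are used to avoid numerical underflow, which assumes every probability in the tables is nonzero.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log_probability, path) of the most likely hidden-state sequence."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Backtrack from the best final state to recover the Viterbi path
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path

# Hypothetical toy model: hidden health state, observed symptoms
states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
log_p, path = viterbi(("normal", "cold", "dizzy"), states, start_p, trans_p, emit_p)
print(path, math.exp(log_p))
```

For this toy model the recovered path is Healthy, Healthy, Fever. The running time is proportional to the number of observations times the square of the number of states, because each time step compares every pair of predecessor and successor states.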
Deep learning: Deep learning is part of a broader family of machine learning methods, which is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.
OSI model: The Open Systems Interconnection model (OSI model) is a conceptual model from the International Organization for Standardization (ISO) that "provides a common basis for the coordination of standards development for the purpose of systems interconnection." In the OSI reference model, communications between computing systems are split into seven abstraction layers: Physical, Data Link, Network, Transport, Session, Presentation, and Application.
Error correction code: In computing, telecommunication, information theory, and coding theory, forward error correction (FEC) or channel coding is a technique used for controlling errors in data transmission over unreliable or noisy communication channels. The central idea is that the sender encodes the message in a redundant way, most often by using an error correction code or error correcting code (ECC). The redundancy allows the receiver not only to detect errors that may occur anywhere in the message, but often to correct a limited number of errors.
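To make the idea of redundant encoding concrete, here is a minimal sketch of one well-known error correcting code, the Hamming(7,4) code, which adds three parity bits to four data bits and can correct any single flipped bit. The choice of this particular code and the function names `hamming74_encode` and `hamming74_decode` are assumptions for illustration; the entry above does not name a specific code.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword laid out as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(codeword):
    """Correct up to one flipped bit; return (data_bits, error_position or 0)."""
    c = list(codeword)
    # Recompute each parity check; together the results form the syndrome
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based position of the error
    error_pos = s1 + 2 * s2 + 4 * s4
    if error_pos:
        c[error_pos - 1] ^= 1  # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]], error_pos

sent = hamming74_encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1  # simulate a single bit error on the channel (position 5)
decoded, position = hamming74_decode(received)
print(decoded, position)
```

The receiver recovers the original four bits and also learns which position was corrupted. Two or more bit errors in the same codeword exceed what this code can correct, which is the "limited number of errors" trade-off described above.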
Interactive voice response: Interactive voice response (IVR) is a technology that allows telephone users to interact with a computer-operated telephone system through the use of voice and DTMF tones input with a keypad. In telecommunications, IVR allows customers to interact with a company's host system via a telephone keypad or by speech recognition, after which services can be inquired about through the IVR dialogue. IVR systems can respond with pre-recorded or dynamically generated audio to further direct users on how to proceed.
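The keypad-driven dialogue of an IVR system can be sketched as a walk through a menu tree, one DTMF digit at a time. Everything in the example below is hypothetical: the `MENU` structure, its prompts, and the `run_ivr` function are illustrative assumptions, and a real system would play audio and read tones from a telephony stack rather than use strings.

```python
# Hypothetical menu tree: each node has a prompt and digit-keyed child nodes
MENU = {
    "prompt": "Press 1 for billing, 2 for support.",
    "options": {
        "1": {
            "prompt": "Press 1 for balance, 2 to pay a bill.",
            "options": {
                "1": {"prompt": "Your balance will be read out.", "options": {}},
                "2": {"prompt": "Connecting to payments.", "options": {}},
            },
        },
        "2": {"prompt": "Connecting to a support agent.", "options": {}},
    },
}

def run_ivr(menu, digits):
    """Walk the menu with a string of DTMF digits; return the prompts played."""
    node = menu
    spoken = [node["prompt"]]
    for digit in digits:
        next_node = node["options"].get(digit)
        if next_node is None:
            # Unknown key: stay on the current menu and repeat its prompt
            spoken.append("Invalid choice. " + node["prompt"])
            continue
        node = next_node
        spoken.append(node["prompt"])
        if not node["options"]:
            break  # leaf reached: the call is handed off
    return spoken

for line in run_ivr(MENU, "12"):
    print(line)
```

Keeping the menu as data rather than as nested conditionals is a design choice that lets prompts and branches change without touching the traversal logic.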