Audio deepfakeAn audio deepfake (also known as voice cloning) is a type of artificial intelligence used to create convincing speech sentences that sound like specific people saying things they did not say. This technology was initially developed for various applications to improve human life. For example, it can be used to produce audiobooks, and also to help people who have lost their voices (due to throat disease or other medical problems) to get them back. Commercially, it has opened the door to several opportunities.
ProbabilityProbability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, 0 indicates impossibility of the event and 1 indicates certainty. The higher the probability of an event, the more likely it is that the event will occur. A simple example is the tossing of a fair (unbiased) coin.
Dialogue systemA dialogue system, or conversational agent (CA), is a computer system intended to converse with a human. Dialogue systems employed one or more of text, speech, graphics, haptics, gestures, and other modes for communication on both the input and output channel. The elements of a dialogue system are not defined because this idea is under research, however, they are different from chatbot. The typical GUI wizard engages in a sort of dialogue, but it includes very few of the common dialogue system components, and the dialogue state is trivial.
Feature learningIn machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task. Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process.
Long short-term memoryLong short-term memory (LSTM) network is a recurrent neural network (RNN), aimed to deal with the vanishing gradient problem present in traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models and other sequence learning methods. It aims to provide a short-term memory for RNN that can last thousands of timesteps, thus "long short-term memory".
Neural networkA neural network can refer to a neural circuit of biological neurons (sometimes also called a biological neural network), a network of artificial neurons or nodes in the case of an artificial neural network. Artificial neural networks are used for solving artificial intelligence (AI) problems; they model connections of biological neurons as weights between nodes. A positive weight reflects an excitatory connection, while negative values mean inhibitory connections. All inputs are modified by a weight and summed.
Speech repetitionSpeech repetition occurs when individuals speak the sounds that they have heard another person pronounce or say. In other words, it is the saying by one individual of the spoken vocalizations made by another individual. Speech repetition requires the person repeating the utterance to have the ability to map the sounds that they hear from the other person's oral pronunciation to similar places and manners of articulation in their own vocal tract.
Probability interpretationsThe word probability has been used in a variety of ways since it was first applied to the mathematical study of games of chance. Does probability measure the real, physical, tendency of something to occur, or is it a measure of how strongly one believes it will occur, or does it draw on both these elements? In answering such questions, mathematicians interpret the probability values of probability theory. There are two broad categories of probability interpretations which can be called "physical" and "evidential" probabilities.
Auditory phoneticsAuditory phonetics is the branch of phonetics concerned with the hearing of speech sounds and with speech perception. It thus entails the study of the relationships between speech stimuli and a listener's responses to such stimuli as mediated by mechanisms of the peripheral and central auditory systems, including certain areas of the brain. It is said to compose one of the three main branches of phonetics along with acoustic and articulatory phonetics, though with overlapping methods and questions.
Background noiseBackground noise or ambient noise is any sound other than the sound being monitored (primary sound). Background noise is a form of noise pollution or interference. Background noise is an important concept in setting noise levels. Background noises include environmental noises such as water waves, traffic noise, alarms, extraneous speech, bioacoustic noise from animals, and electrical noise from devices such as refrigerators, air conditioning, power supplies, and motors.