Publication

Joint speech and speaker recognition

Related concepts (38)

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.

Speaker recognition

Speaker recognition is the identification of a person from characteristics of voices. It is used to answer the question "Who is speaking?" The term voice recognition can refer to speaker recognition or speech recognition. Speaker verification (also called speaker authentication) contrasts with identification, and speaker recognition differs from speaker diarisation (recognizing when the same speaker is speaking).

Speech recognition

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.

Deep learning

Deep learning is part of a broader family of machine learning methods, which is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.

Pattern recognition

Pattern recognition is the automated recognition of patterns and regularities in data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess (PR) capabilities but their primary function is to distinguish and create emergent pattern. PR has applications in statistical data analysis, signal processing, , information retrieval, bioinformatics, data compression, computer graphics and machine learning.

Leslie speaker

The Leslie speaker is a combined amplifier and loudspeaker that projects the signal from an electric or electronic instrument and modifies the sound by rotating a baffle chamber ("drum") in front of the loudspeakers. A similar effect is provided by a rotating system of horns in front of the treble driver. It is most commonly associated with the Hammond organ, though it was later used for the electric guitar and other instruments. A typical Leslie speaker contains an amplifier, a treble horn and a bass speaker—though specific components depend upon the model.

Feedforward neural network

A feedforward neural network (FNN) is one of the two broad types of artificial neural network, characterized by direction of the flow of information between its layers. Its flow is uni-directional, meaning that the information in the model flows in only one direction—forward—from the input nodes, through the hidden nodes (if any) and to the output nodes, without any cycles or loops, in contrast to recurrent neural networks, which have a bi-directional flow.

Perceptron

In machine learning, the perceptron (or McCulloch-Pitts neuron) is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

Wireless speaker

Wireless speakers are loudspeakers that receive audio signals using radio frequency (RF) waves rather than over audio cables. The two most popular RF frequencies that support audio transmission to wireless loudspeakers include a variation of WiFi IEEE 802.11, while others depend on Bluetooth to transmit audio data to the receiving speaker. Apart from the employed RF standard, such speakers can basically be distinguished by their dedicated field of use.

Loudspeaker

A loudspeaker (commonly referred to as a speaker or speaker driver) is an electroacoustic transducer that converts an electrical audio signal into a corresponding sound. A speaker system, also often simply referred to as a speaker or loudspeaker, comprises one or more such speaker drivers, an enclosure, and electrical connections possibly including a crossover network. The speaker driver can be viewed as a linear motor attached to a diaphragm which couples that motor's movement to motion of air, that is, sound.

Hidden Markov model

A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process — call it — with unobservable ("hidden") states. As part of the definition, HMM requires that there be an observable process whose outcomes are "influenced" by the outcomes of in a known way.