Vocoder

A vocoder (ˈvoʊkoʊdər, a portmanteau of voice and encoder) is a category of speech coding that analyzes and synthesizes the human voice signal for audio data compression, multiplexing, voice encryption, or voice transformation.

The vocoder was invented in 1938 by Homer Dudley at Bell Labs as a means of synthesizing human speech. This work was developed into the channel vocoder, which was used as a voice codec in telecommunications to conserve transmission bandwidth.

Encrypting the control signals secures voice transmission against interception, and the primary use of this technique is secure radio communication. The advantage of this method of encryption is that none of the original signal is sent, only the envelopes of the bandpass filters. The receiving unit must be set up with the same filter configuration to re-synthesize a version of the original signal spectrum.

The vocoder has also been used extensively as an electronic musical instrument. The decoder portion of the vocoder, called a voder, can be used independently for speech synthesis.

The human voice consists of sounds generated by the opening and closing of the glottis by the vocal cords, which produces a periodic waveform with many harmonics. This basic sound is then filtered by the nose and throat (a complicated resonant piping system) to produce differences in harmonic content (formants) in a controlled way, creating the wide variety of sounds used in speech. Another set of sounds, the unvoiced and plosive sounds, is created or modified by the mouth in different ways.

The vocoder examines speech by measuring how its spectral characteristics change over time. This results in a series of signals representing the energy at these frequencies at any particular time as the user speaks. In simple terms, the signal is split into a number of frequency bands (the larger this number, the more accurate the analysis), and the level of signal present in each band gives an instantaneous representation of the spectral energy content.
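The band-splitting analysis described above can be illustrated with a minimal Python sketch. It is not Dudley's design: a classic channel vocoder uses a bank of analog bandpass filters followed by envelope detectors, whereas this sketch substitutes a naive discrete Fourier transform over short frames and sums the power in equal-width bands. The function names (`band_levels`, `analyze`), the frame length, the band count, and the test tone are all assumptions chosen for illustration.

```python
import cmath
import math


def band_levels(frame, num_bands):
    """Return the spectral power of one frame in `num_bands` equal-width bands."""
    n = len(frame)
    half = n // 2  # bins 0..half-1 span 0 Hz up to just below the Nyquist frequency
    # Naive DFT, O(n^2): acceptable for a short illustrative frame.
    power = []
    for k in range(half):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    # Group adjacent DFT bins into contiguous bands and sum the power in each.
    levels = []
    for b in range(num_bands):
        lo = b * half // num_bands
        hi = (b + 1) * half // num_bands
        levels.append(sum(power[lo:hi]))
    return levels


def analyze(signal, frame_len, num_bands):
    """Split `signal` into non-overlapping frames; return one list of band levels per frame."""
    return [
        band_levels(signal[i:i + frame_len], num_bands)
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


# Hypothetical input: a pure 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
frame_len = 64
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(4 * frame_len)]

# One row per frame, one column per band: the "series of signals" the text describes.
envelopes = analyze(tone, frame_len, num_bands=8)
```

With these assumed settings each band is 500 Hz wide, so the 1000 Hz tone lands in the third band (1000 to 1500 Hz) in every frame. For speech, the rows of `envelopes` would vary from frame to frame, and those slowly varying levels are what a vocoder transmits in place of the waveform itself.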

On Compressibility of Neural Network phonological Features for Low Bit Rate Speech Coding

Hervé Bourlard, Afsaneh Asaei, Milos Cernak

Phonological features extracted by neural network have shown interesting potential for low bit rate speech vocoding. The span of phonological features is wider than the span of phonetic features, and thus fewer frames need to be transmitted. Moreover, the ...

2015

Source/Filter Factorial Hidden Markov Model, with Application to Pitch and Formant Tracking

Jean-Philippe Thiran

Tracking vocal tract formant frequencies (f_p) and estimating the fundamental frequency (f_0) are two tracking problems that have been tackled in many speech processing works, often independently, with applications to articulatory parameters estimation ...

Institute of Electrical and Electronics Engineers, 2013

Towards a breakthrough Speaker Identification approach for Law Enforcement Agencies: SIIP
