Speech coding

Speech coding is an application of data compression to digital audio signals containing speech. It models the speech signal by estimating speech-specific parameters with audio signal processing techniques, then uses generic data compression algorithms to represent those parameters in a compact bitstream. Common applications of speech coding are mobile telephony and voice over IP (VoIP). The most widely used speech coding technique in mobile telephony is linear predictive coding (LPC), while VoIP applications most widely use LPC and modified discrete cosine transform (MDCT) techniques.

The techniques employed in speech coding are similar to those used in audio data compression and audio coding, where knowledge of psychoacoustics is used to transmit only data that is relevant to the human auditory system. For example, in voiceband speech coding, only information in the frequency band 400 to 3500 Hz is transmitted, yet the reconstructed signal retains adequate intelligibility.

Speech coding differs from other forms of audio coding in that speech is a simpler signal than other audio signals, and statistical information is available about the properties of speech. As a result, some auditory information that is relevant in general audio coding can be unnecessary in the speech coding context. Speech coding stresses the preservation of intelligibility and pleasantness of speech while using a constrained amount of transmitted data. In addition, most speech applications require low coding delay, as latency interferes with speech interaction.

Speech coders are of two classes:

Waveform coders
    Time-domain: PCM, ADPCM
    Frequency-domain: sub-band coding, ATRAC
Vocoders
    Linear predictive coding (LPC)
    Formant coding
    Machine learning, i.e. neural vocoder

The A-law and μ-law algorithms used in G.711 PCM digital telephony can be seen as an early precursor of speech encoding, requiring only 8 bits per sample but giving effectively 12 bits of resolution.
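
To illustrate how companding lets 8 bits per sample cover a wider effective range, here is a minimal Python sketch of the continuous μ-law curve with μ = 255. It is not the bit-exact segmented encoding specified in G.711. The function names (`mulaw_compress`, `mulaw_expand`, `encode_8bit`, `decode_8bit`) and the uniform 256-level quantizer are assumptions made for illustration only.

```python
import math

MU = 255  # companding parameter commonly used for mu-law


def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] through the continuous mu-law curve."""
    x = max(-1.0, min(1.0, x))  # clip to the valid input range
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress for y in [-1, 1]."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x: float) -> int:
    """Compress, then quantize uniformly to one of 256 codes (0..255)."""
    y = mulaw_compress(x)
    return int(round((y + 1.0) / 2.0 * 255))


def decode_8bit(code: int) -> float:
    """Map an 8-bit code back to an approximate sample in [-1, 1]."""
    y = code / 255 * 2.0 - 1.0
    return mulaw_expand(y)


if __name__ == "__main__":
    # Quiet samples are reconstructed with a smaller absolute error than
    # loud ones, which is the point of logarithmic companding.
    for sample in (0.001, 0.01, 0.1, 0.9):
        restored = decode_8bit(encode_8bit(sample))
        print(f"{sample:>6} -> {restored:.6f} (error {abs(restored - sample):.6f})")
```

Because the quantization steps are finest near zero, quiet speech samples are reconstructed with much smaller absolute error than a uniform 8-bit quantizer would give, at the cost of coarser steps for loud samples.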
