Publication

Source/Filter Factorial Hidden Markov Model, with Application to Pitch and Formant Tracking

Jean-Philippe Thiran
2013
Journal paper
Abstract

Tracking vocal tract formant frequencies (fpf_p) and estimating the fundamental frequency (f0f_0) are two tracking problems that have been tackled in many speech processing works, often independently, with applications to articulatory parameters estimations, speech analysis/synthesis or linguistics. Many works assume an auto-regressive (AR) model to fit the spectral envelope, hence indirectly estimating the formant tracks from the AR parameters. However, directly estimating the formant frequencies, or equivalently the poles of the AR filter, allows to further model the smoothness of the desired tracks. In this paper, we propose a Factorial Hidden Markov Model combined with a vocal source/filter model, with parameters naturally encoding the f0f_0 and fpf_p tracks. Two algorithms are proposed, with two different strategies: first, a simplification of the underlying model, with a parameter estimation based on variational methods, and second, a sparse decomposition of the signal, based on Non-negative Matrix Factorization methodology. The results are comparable to state-of-the-art formant tracking algorithms. With the use of a complete production model, the proposed systems provide robust formant tracks which can be used in various applications. The algorithms could also be extended to deal with multiple-speaker signals.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related concepts (35)
Speech synthesis
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database.
Additive synthesis
Additive synthesis is a sound synthesis technique that creates timbre by adding sine waves together. The timbre of musical instruments can be considered in the light of Fourier theory to consist of multiple harmonic or inharmonic partials or overtones. Each partial is a sine wave of different frequency and amplitude that swells and decays over time due to modulation from an ADSR envelope or low frequency oscillator. Additive synthesis most directly generates sound by adding the output of multiple sine wave generators.
Speech processing
Speech processing is the study of speech signals and the processing methods of signals. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing, applied to speech signals. Aspects of speech processing includes the acquisition, manipulation, storage, transfer and output of speech signals. Different speech processing tasks include speech recognition, speech synthesis, speaker diarization, speech enhancement, speaker recognition, etc.
Show more
Related publications (32)

On Breathing Pattern Information in Synthetic Speech

Mathew Magimai Doss, Zohreh Mostaani

The respiratory system is an integral part of human speech production. As a consequence, there is a close relation between respiration and speech signal, and the produced speech signal carries breathing pattern related information. Speech can also be gener ...
ISCA-INT SPEECH COMMUNICATION ASSOC2022

Automatic pathological speech assessment

Parvaneh Janbakhshi

Many pathologies cause impairments in the speech production mechanism resulting in reduced speech intelligibility and communicative ability. To assist the clinical diagnosis, treatment and management of speech disorders, automatic pathological speech asses ...
EPFL2022

Spectral Subspace Analysis for Automatic Assessment of Pathological Speech Intelligibility

Hervé Bourlard, Ina Kodrasi, Parvaneh Janbakhshi

Speech intelligibility is an important assessment criterion of the communicative performance of pathological speakers. To assist clinicians in their assessment, time- and cost-efficient automatic intelligibility measures offering a repeatable and reliable ...
2019
Show more
Related MOOCs (4)
Digital Signal Processing I
Basic signal processing concepts, Fourier analysis and filters. This module can be used as a starting point or a basic refresher in elementary DSP
Digital Signal Processing II
Adaptive signal processing, A/D and D/A. This module provides the basic tools for adaptive filtering and a solid mathematical framework for sampling and quantization
Digital Signal Processing III
Advanced topics: this module covers real-time audio processing (with examples on a hardware board), image processing and communication system design.
Show more

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.