**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.

Publication# On Variable-Scale Piecewise Stationary Spectral Analysis of Speech Signals for ASR

Abstract

It is often acknowledged that speech signals contain short-term and long-term temporal properties that are difficult to capture and model by using the usual fixed scale (typically 20ms) short time spectral analysis used in hidden Markov models (HMMs), based on piecewise stationarity and state conditional independence assumptions of acoustic vectors. For example, vowels are typically quasi-stationary over 40-80ms segments, while plosive typically require analysis below 20ms segments. Thus, fixed scale analysis is clearly sub-optimal for ``optimal'' time-frequency resolution and modeling of different stationary phones found in the speech signal. In the present paper, we investigate the potential advantages of using variable size analysis windows towards improving state-of-the-art speech recognition systems. Based on the usual assumption that the speech signal can be modeled by a time-varying autoregressive (AR) Gaussian process, we estimate the largest piecewise quasi-stationary speech segments, based on the likelihood that a segment was generated by the same AR process. This likelihood is estimated from the Linear Prediction (LP) residual error. Each of these quasi-stationary segments is then used as an analysis window from which spectral features are extracted. Such an approach thus results in a variable scale time spectral analysis, adaptively estimating the largest possible analysis window size such that the signal remains quasi-stationary, thus the best temporal/frequency resolution tradeoff. The speech recognition experiments on the OGI Numbers95 database, show that the proposed variable-scale piecewise stationary spectral analysis based features indeed yield improved recognition accuracy in clean conditions, compared to features based on minimum cross entropy spectrum as well as those based on fixed scale spectral analysis.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related publications (56)

Related MOOCs (6)

Related concepts (34)

Digital Signal Processing I

Basic signal processing concepts, Fourier analysis and filters. This module can
be used as a starting point or a basic refresher in elementary DSP

Digital Signal Processing II

Adaptive signal processing, A/D and D/A. This module provides the basic
tools for adaptive filtering and a solid mathematical framework for sampling and
quantization

Digital Signal Processing III

Advanced topics: this module covers real-time audio processing (with
examples on a hardware board), image processing and communication system design.

Time series

In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average. A time series is very frequently plotted via a run chart (which is a temporal line chart).

Window function

In signal processing and statistics, a window function (also known as an apodization function or tapering function) is a mathematical function that is zero-valued outside of some chosen interval, normally symmetric around the middle of the interval, usually approaching a maximum in the middle, and usually tapering away from the middle. Mathematically, when another function or waveform/data-sequence is "multiplied" by a window function, the product is also zero-valued outside the interval: all that is left is the part where they overlap, the "view through the window".

Time–frequency analysis

In signal processing, time–frequency analysis comprises those techniques that study a signal in both the time and frequency domains simultaneously, using various time–frequency representations. Rather than viewing a 1-dimensional signal (a function, real or complex-valued, whose domain is the real line) and some transform (another function whose domain is the real line, obtained from the original via some transform), time–frequency analysis studies a two-dimensional signal – a function whose domain is the two-dimensional real plane, obtained from the signal via a time–frequency transform.

Ontological neighbourhood

Majed Chergui, Thomas Charles Henry Rossi, Malte Oppermann, Lijie Wang

Disentangling electronic and thermal effects in photoexcited perovskite materials is crucial for photovoltaic and optoelectronic applications but remains a challenge due to their intertwined nature in both the time and energy domains. In this study, we emp ...

The multi-channel Wiener filter (MWF) is a well-known multi-microphone speech enhancement technique, aiming at improving the quality of the recorded speech signals in noisy and reverberant environments. Assuming that reverberation and ambient noise can be ...

2019Theo Lasser, Aleksandra Radenovic, Marcel Leutenegger, Tomas Lukes, Adrien Charles-François Raymond Descloux, Kristin Stefanie Grussmayer

Super-resolution opticalfluctuation imaging provides a resolution beyond the diffraction limitby analysing stochasticfluorescencefluctuations with higher-order statistics. Usingnthorderspatio-temporal cross-cumulants the spatial resolution and the sampling ...

2020