Publication

Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering

Mathew Magimai Doss, Eklavya Sarkar
2022
Conference paper
Abstract

Voice activity detection (VAD) is an important pre-processing step for speech technology applications. The task consists of deriving segment boundaries of audio signals which contain voicing information. In recent years, it has been shown that voice source and vocal tract system information can be extracted using zero-frequency filtering (ZFF) without making any explicit model assumptions about the speech signal. This paper investigates the potential of zero-frequency filtering for jointly modeling voice source and vocal tract system information, and proposes two approaches for VAD. The first approach demarcates voiced regions using a composite signal composed of different zero-frequency filtered signals. The second approach feeds the composite signal as input to the rVAD algorithm. These approaches are compared with other supervised and unsupervised VAD methods in the literature, and are evaluated on the Aurora-2 database across a range of SNRs (20 to -5 dB). Our studies show that the proposed ZFF-based methods perform comparably to state-of-the-art VAD methods and are more invariant to added degradation and different channel characteristics.
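For readers unfamiliar with zero-frequency filtering, the following is a minimal sketch of the generic ZFF operation as it is commonly described in the literature: difference the signal, pass it through a cascade of two zero-frequency resonators, then remove the resulting polynomial trend by repeatedly subtracting a local mean. It is not the composite-signal method or the rVAD pipeline proposed in this paper, whose details the abstract does not give. The function name `zero_frequency_filter`, the 15 ms trend-removal window, and the three trend-removal passes are assumptions chosen for illustration.

```python
import numpy as np


def zero_frequency_filter(signal, fs, window_ms=15.0, passes=3):
    """Generic zero-frequency filtering sketch (illustrative, not the paper's code).

    signal    : 1-D array of audio samples
    fs        : sampling rate in Hz
    window_ms : trend-removal window length in ms (assumed value; it is
                usually set to roughly one to two average pitch periods)
    passes    : number of local-mean subtractions (assumed value)

    The first and last `passes * half_window` samples of the output are
    affected by zero padding at the edges and should be discarded.
    """
    s = np.asarray(signal, dtype=np.float64)

    # Difference the signal to suppress any DC offset.
    x = np.diff(s, prepend=s[0])

    # One zero-frequency resonator, y[n] = 2*y[n-1] - y[n-2] + x[n], has the
    # transfer function 1 / (1 - z^-1)^2, i.e. two cumulative sums.
    # A cascade of two resonators is therefore four cumulative sums.
    y = x
    for _ in range(4):
        y = np.cumsum(y)

    # The integrations leave a large polynomial trend. Subtracting a centred
    # moving average a few times removes it and keeps the oscillating part.
    half = max(1, int(round(window_ms * 1e-3 * fs / 2)))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")

    return y
```

In this sketch the zero crossings of the output track impulse-like excitation in the input. For a synthetic train of positive unit impulses, for example, the negative-going zero crossings in the interior of the output should fall within a couple of samples of the impulse positions.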

Related concepts (33)
Analogue filter
Analogue filters are a basic building block of signal processing much used in electronics. Amongst their many applications are the separation of an audio signal before application to bass, mid-range, and tweeter loudspeakers; the combining and later separation of multiple telephone conversations onto a single channel; the selection of a chosen radio station in a radio receiver and rejection of others.
Band-stop filter
In signal processing, a band-stop filter or band-rejection filter is a filter that passes most frequencies unaltered, but attenuates those in a specific range to very low levels. It is the opposite of a band-pass filter. A notch filter is a band-stop filter with a narrow stopband (high Q factor). Narrow notch filters (optical) are used in Raman spectroscopy, live sound reproduction (public address systems, or PA systems) and in instrument amplifiers (especially amplifiers or preamplifiers for acoustic instruments such as acoustic guitar, mandolin, bass instrument amplifier, etc.).
Low-pass filter
A low-pass filter is a filter that passes signals with a frequency lower than a selected cutoff frequency and attenuates signals with frequencies higher than the cutoff frequency. The exact frequency response of the filter depends on the filter design. The filter is sometimes called a high-cut filter, or treble-cut filter in audio applications. A low-pass filter is the complement of a high-pass filter. In optics, high-pass and low-pass may have different meanings, depending on whether referring to the frequency or wavelength of light, since these variables are inversely related.
Related publications (56)

Penalized denoising of vehicle trajectories collected by a swarm of drones

Nikolaos Geroliminis, Emmanouil Barmpounakis, Georgios Anagnostopoulos

Vehicle trajectory datasets collected in urban traffic environments with drones pose unique challenges in terms of denoising due to extensive visual restrictions, perspective distortions and human-induced errors. This article taps into the unexplored po ...
2022

On Modeling Glottal Source Information for Phonation Assessment in Parkinson's Disease

Mathew Magimai Doss, Julian David Fritsch

Parkinson's disease produces several motor symptoms, including different speech impairments that are known as hypokinetic dysarthria. Symptoms associated with dysarthria affect different dimensions of speech such as phonation, articulation, prosody, and inte ...
ISCA-INT SPEECH COMMUNICATION ASSOC, 2021

Detection of S1 and S2 locations in phonocardiogram signals using zero frequency filter

Gürkan Yilmaz

Heart auscultation is a widely used technique for diagnosing cardiac abnormalities. In that context, capturing phonocardiogram (PCG) signals and automatically monitoring the heart by identifying S1 and S2 complexes is an emerging field. One of the fi ...
2020
Related MOOCs (6)
Digital Signal Processing I
Basic signal processing concepts, Fourier analysis and filters. This module can be used as a starting point or a basic refresher in elementary DSP.
Digital Signal Processing II
Adaptive signal processing, A/D and D/A. This module provides the basic tools for adaptive filtering and a solid mathematical framework for sampling and quantization.
Digital Signal Processing III
Advanced topics: this module covers real-time audio processing (with examples on a hardware board), image processing and communication system design.
