Unsupervised Speech/Non-speech Detection for Automatic Speech Recognition in Meeting Rooms

The goal of this work is to provide robust and accurate speech detection for automatic speech recognition (ASR) in meeting room settings. The proposed method computes the long-term modulation spectrum of an audio signal and examines a specific frequency range for dominant speech components in order to classify segments as speech or non-speech. For comparison, four alternatives are tested: manual segmentation, segmentation based on short-term energy, segmentation based on short-term energy combined with zero-crossing rate, and a recently proposed Multi-Layer Perceptron (MLP) classifier. The segmentation methods are evaluated through speech recognition experiments on a standard database, under conditions where the signal-to-noise ratio (SNR) varies considerably: close-talking headset, lapel microphone, distant microphone array output, and single distant microphone. The results reveal that the proposed method is more reliable and less sensitive to the mode of signal acquisition and to unforeseen conditions.
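The idea of classifying audio by its long-term modulation spectrum can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the frequency range, window lengths, or decision rule, so the 2-8 Hz band (chosen because speech envelopes are known to peak near the syllabic rate of about 4 Hz), the 1-second analysis window, the 0.5 threshold, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def frame_energy(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Short-term energy envelope: one energy value per hop."""
    frame = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame) // hop
    # Index matrix of shape (n_frames, frame) selecting each analysis frame.
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.sum(signal[idx] ** 2, axis=1)

def modulation_band_ratio(envelope, envelope_rate, band=(2.0, 8.0)):
    """Fraction of envelope modulation energy (DC excluded) inside `band`.

    The 2-8 Hz default is an assumption, not a value from the abstract.
    """
    env = envelope - np.mean(envelope)
    spectrum = np.abs(np.fft.rfft(env * np.hanning(len(env)))) ** 2
    freqs = np.fft.rfftfreq(len(env), d=1.0 / envelope_rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = np.sum(spectrum[1:])  # skip the DC bin
    return float(np.sum(spectrum[in_band]) / total) if total > 0 else 0.0

def detect_speech(signal, sample_rate, window_s=1.0, step_s=0.5, threshold=0.5):
    """Label each long-term window True (speech-like) or False.

    Window length, step, and threshold are illustrative assumptions.
    """
    hop_ms = 10.0
    envelope = frame_energy(signal, sample_rate, hop_ms=hop_ms)
    envelope_rate = 1000.0 / hop_ms  # envelope samples per second
    win = int(window_s * envelope_rate)
    step = int(step_s * envelope_rate)
    labels = []
    for start in range(0, len(envelope) - win + 1, step):
        ratio = modulation_band_ratio(envelope[start:start + win], envelope_rate)
        labels.append(ratio > threshold)
    return labels
```

A fixed threshold is used here only to keep the sketch short; since the method in the abstract is unsupervised, a data-driven decision rule would be the more natural choice in practice.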

