This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.
A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process — call it — with unobservable ("hidden") states. As part of the definition, HMM requires that there be an observable process whose outcomes are "influenced" by the outcomes of in a known way.
Windows Speech Recognition (WSR) is speech recognition developed by Microsoft for Windows Vista that enables voice commands to control the desktop user interface, dictate text in electronic documents and email, navigate websites, perform keyboard shortcuts, and operate the mouse cursor. It supports custom macros to perform additional or supplementary tasks. Hotkeys for this action is Windows logo key+Ctrl+S (windows 10, windows 11) WSR is a locally processed speech recognition platform; it does not rely on cloud computing for accuracy, dictation, or recognition, but adapts based on contexts, grammars, speech samples, training sessions, and vocabularies.
The goal of this thesis is to develop and design new feature representations that can improve the automatic speech recognition (ASR) performance in clean as well noisy conditions. One of the main shor
It is often acknowledged that speech signals contain short-term and long-term temporal properties that are difficult to capture and model by using the usual fixed scale (typically 20ms) short time spe
It is often acknowledged that speech signals contain short-term and long-term temporal properties that are difficult to capture and model by using the usual fixed scale (typically 20ms) short time spe