Publication

Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models

Pinar Akyazi, Samuel Thomas
2010
Conference paper
Abstract

Although research has previously been done on multilingual speech recognition, it has been found to be very difficult to improve over separately trained systems. The usual approach has been to use some kind of “universal phone set” that covers multiple languages. We report experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the model has parameters not tied to specific states that are shared across languages. We use a model called a “Subspace Gaussian Mixture Model” where states' distributions are Gaussian Mixture Models with a common structure, constrained to lie in a subspace of the total parameter space. The parameters that define this subspace can be shared across languages. We obtain substantial WER improvements with this approach, especially with very small amounts of in-language training data.
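
The abstract does not give the model equations, so the following is only a minimal sketch of one common SGMM formulation. It assumes each state j has a low-dimensional vector v_j, that the mean of Gaussian i in that state is M_i v_j, and that the mixture weights are a softmax over w_i · v_j. The names `M`, `w`, `covs`, `v_j` and all dimensions are illustrative assumptions, not taken from the paper. In a multilingual setting under these assumptions, `M`, `w` and `covs` would be the shared parameters, while each language keeps its own state vectors.

```python
import numpy as np

def sgmm_state_params(v_j, M, w):
    """Expand one state vector into GMM means and weights (assumed SGMM form).

    v_j: (S,)      state-specific vector (not shared)
    M:   (I, D, S) mean-projection matrices, one per Gaussian (shared)
    w:   (I, S)    weight-projection vectors, one per Gaussian (shared)
    """
    means = M @ v_j                 # (I, D): mean of Gaussian i is M[i] @ v_j
    logits = w @ v_j                # (I,)
    logits = logits - logits.max()  # stabilise the softmax
    weights = np.exp(logits)
    weights /= weights.sum()
    return means, weights

def sgmm_state_loglike(x, v_j, M, w, covs):
    """Log-likelihood of a feature vector x under one state's GMM.

    covs: (I, D, D) full covariance matrices, one per Gaussian (shared)
    """
    means, weights = sgmm_state_params(v_j, M, w)
    D = x.shape[0]
    log_terms = []
    for i in range(means.shape[0]):
        diff = x - means[i]
        _, logdet = np.linalg.slogdet(covs[i])
        maha = diff @ np.linalg.solve(covs[i], diff)
        log_gauss = -0.5 * (D * np.log(2 * np.pi) + logdet + maha)
        log_terms.append(np.log(weights[i]) + log_gauss)
    return np.logaddexp.reduce(log_terms)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    I, D, S = 8, 13, 5  # Gaussians, feature dim, subspace dim (illustrative)
    M = rng.normal(size=(I, D, S))
    w = rng.normal(size=(I, S))
    covs = np.stack([np.eye(D)] * I)
    # Two hypothetical languages with distinct states but shared M, w, covs.
    states = {"lang_a": rng.normal(size=(3, S)), "lang_b": rng.normal(size=(2, S))}
    x = rng.normal(size=D)
    for lang, vs in states.items():
        print(lang, [float(sgmm_state_loglike(x, v, M, w, covs)) for v in vs])
```

In this sketch a new state costs only S numbers, which is one way to see why sharing the subspace could help when in-language training data is scarce.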

Related concepts (32)
Mixture model
In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observation belongs. Formally a mixture model corresponds to the mixture distribution that represents the probability distribution of observations in the overall population.
Cluster analysis
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, information retrieval, bioinformatics, data compression, computer graphics and machine learning.
K-means clustering
k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances.
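
The assignment-and-update idea in the k-means description can be sketched as follows. This is a minimal illustration of Lloyd's iteration, one standard heuristic for k-means, and not something taken from the page. The function name `kmeans`, its parameters, and the toy data are assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's iteration: assign each point to its nearest mean, then recompute means."""
    rng = np.random.default_rng(seed)
    # Initialise centers with k distinct points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: (n, k).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # The mean of each cluster minimises its within-cluster squared error;
        # an empty cluster keeps its previous center.
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

if __name__ == "__main__":
    pts = np.array([[0., 0.], [0., 1.], [1., 0.],
                    [10., 10.], [10., 11.], [11., 10.]])
    centers, labels = kmeans(pts, k=2)
    print(centers)
    print(labels)
```

Lloyd's iteration only finds a local optimum of the within-cluster variance, so results can depend on the initial centers.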
Related publications (33)

Model-Based Clustering of Trends and Cycles of Nitrate Concentrations in Rivers Across France

Camille Roland Marie Minaudo

Elevated nitrate from human activity causes ecosystem and economic harm globally. The factors that control the spatiotemporal dynamics of riverine nitrate concentration remain difficult to describe and predict. We analyzed nitrate concentration from 4450 s ...
Springer, 2022

Model-Class Selection Using Clustering and Classification for Structural Identification and Prediction

Ian Smith, Sai Ganesh Sarvotham Pai, Masoud Sanayei

Structural identification using physics-based models and subsequent prediction have much potential to enhance civil infrastructure asset-management decision-making. Interpreting monitoring information in the presence of multiple uncertainty sources and sys ...
2020

Concrete Dam Displacement Prediction Based on an ISODATA-GMM Clustering and Random Coefficient Model

Zhenzhu Meng, Yating Hu, Chenfei Shao

Displacement data modelling is of great importance for the safety control of concrete dams. The commonly used artificial intelligence method modelled the displacement data at each monitoring point individually, i.e., the data correlations between the monit ...
2019