An Information Theoretic Approach to Speaker Diarization of Meeting Recordings

In this thesis we investigate a non-parametric approach to speaker diarization of meeting recordings based on an information theoretic framework. The problem is formulated using the Information Bottleneck (IB) principle. Unlike other approaches, where the distance between speaker segments is introduced arbitrarily, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant to the problem while minimizing the distortion between observations. The distance between speech segments is the Jensen-Shannon divergence, which arises directly from the optimization of the IB objective function. In the first part of the thesis, we explore IB based diarization with Mel frequency cepstral coefficients (MFCC) as input features. We study issues related to IB based speaker diarization, such as optimizing the IB objective function and criteria for inferring the number of speakers. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (Rich Transcription) meeting data for speaker diarization. The IB based system achieves a speaker error rate (16.8%) similar to that of a baseline HMM/GMM system (17.0%). Because it relies on non-parametric clustering, the approach performs diarization six times faster than real time, while the baseline is slower than real time.

The second part of the thesis proposes a novel feature combination system in the context of IB diarization. Both the speaker clustering and speaker realignment steps are discussed. In contrast to current systems, the proposed method avoids combining features by averaging log-likelihood scores. Two different sets of features are considered: (a) a combination of MFCC features with time delay of arrival (TDOA) features, and (b) a four-feature-stream combination of MFCC, TDOA, modulation spectrum, and frequency domain linear prediction features. Experiments show that the proposed system achieves a 5% absolute improvement over the baseline with the two-feature combination and 7% with the four-feature combination. The algorithmic complexity of the IB system increases only minimally as features are added. The system with four feature streams runs in real time, which is ten times faster than the GMM based system.
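
To illustrate how the Jensen-Shannon divergence can act as a distance between speech segments, the sketch below computes the merge cost used in the agglomerative form of IB clustering. It assumes an agglomerative variant; the abstract does not say which optimization the thesis uses. The function names (`kl_divergence`, `js_divergence`, `merge_cost`) and the toy distributions are hypothetical and chosen for illustration only.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) in nats; terms where p == 0 contribute zero."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence between distributions p and q.

    The weights w_p and w_q must be positive and sum to one.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mixture = w_p * p + w_q * q
    return w_p * kl_divergence(p, mixture) + w_q * kl_divergence(q, mixture)


def merge_cost(prior_i, prior_j, p_y_given_i, p_y_given_j):
    """Loss in relevant information when clusters i and j are merged.

    prior_i, prior_j: cluster probabilities p(c_i), p(c_j).
    p_y_given_i, p_y_given_j: distributions over the relevance variables
    conditioned on each cluster.
    """
    total = prior_i + prior_j
    return total * js_divergence(
        p_y_given_i, p_y_given_j, prior_i / total, prior_j / total
    )


if __name__ == "__main__":
    # Toy example (made-up numbers): two segments described by their
    # distributions over three relevance variables.
    seg_a = [0.7, 0.2, 0.1]
    seg_b = [0.1, 0.3, 0.6]
    print(merge_cost(0.4, 0.6, seg_a, seg_b))
```

In a greedy agglomerative scheme, the pair of clusters with the smallest merge cost would be merged at each step, so segments with similar distributions over the relevance variables are grouped first.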

Lower-bounds on the Bayesian Risk in Estimation Procedures via f–Divergences

Michael Christoph Gastpar, Adrien Vandenbroucque, Amedeo Roberto Esposito

We consider the problem of parameter estimation in a Bayesian setting and propose a general lower-bound that includes part of the family of f-Divergences. The results are then applied to specific settings of interest and compared to other notable results i ...

2022

A Wasserstein-based measure of conditional dependence

Negar Kiyavash, Seyed Jalal Etesami, Kun Zhang

Measuring conditional dependencies among the variables of a network is of great interest to many disciplines. This paper studies some shortcomings of the existing dependency measures in detecting direct causal influences or their lack of ability for group ...

2022

From Generalisation Error to Transportation-cost Inequalities and Back

Michael Christoph Gastpar, Amedeo Roberto Esposito

In this work, we connect the problem of bounding the expected generalisation error with transportation-cost inequalities. Exposing the underlying pattern behind both approaches, we are able to generalise them and go beyond Kullback-Leibler Divergences/Mutu ...

2022
