Publication

Scalable Probabilistic Models for Face and Speaker Recognition

Laurent El Shafey
2014
EPFL thesis
Abstract

In the biometrics community, face and speaker recognition are mature fields in which several systems have been proposed over the past twenty years. While existing systems perform well under controlled recording conditions, mismatch caused by the use of different sensors or a lack of cooperation from the subject still significantly affects performance, especially in challenging scenarios such as forensics. Furthermore, existing methods suffer from scalability issues, which prevent them from taking advantage of increasingly large amounts of training data. Exploiting such data is otherwise a promising way to improve accuracy in these challenging scenarios. In this thesis we address these problems of mismatch and complexity by developing scalable probabilistic models that we apply to face, speaker and bimodal recognition. Our contributions are four-fold. First, we propose a unified framework for session variability modeling techniques based on Gaussian mixture models (GMM) that encompasses inter-session variability (ISV) modeling, joint factor analysis (JFA) and total variability (TV) modeling. Second, we propose a novel exact and scalable formulation of probabilistic linear discriminant analysis (PLDA), which is a probabilistic and generative framework that models between-class and within-class variations. This formulation solves a major scalability issue by improving both the time complexity of the training procedure, from cubic to linear with respect to the number of samples per class, and the complexity of the scoring procedure. Third, the implementations of all the proposed techniques are integrated into a novel collaborative open-source software library called Bob that enforces fair evaluations and encourages reproducible research. Fourth and finally, large-scale experiments are conducted with all of the above machine learning algorithms on several databases such as FRGC for face recognition, NIST SRE12 for speaker recognition and MOBIO for bimodal recognition, showing competitive performance.
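
To make the idea of a generative model with between-class and within-class variation concrete, here is a minimal sketch of a scalar, PLDA-style sampler. It is not the thesis's formulation or Bob's API: the function name `sample_plda_1d`, the parameters `mu`, `f`, `g`, `sigma` and their defaults are all assumptions chosen for illustration. Each sample is drawn as x_ij = mu + f*h_i + g*w_ij + eps_ij, where h_i is shared by every sample of class i.

```python
import random


def sample_plda_1d(n_classes, n_per_class, mu=0.0, f=2.0, g=0.5, sigma=0.1, seed=0):
    """Draw scalar samples x_ij = mu + f*h_i + g*w_ij + eps_ij.

    h_i is shared by all samples of class i (between-class variation);
    w_ij and eps_ij are drawn per sample (within-class variation and noise).
    All parameter names and default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_classes):
        h = rng.gauss(0.0, 1.0)  # latent identity variable, one per class
        samples = []
        for _ in range(n_per_class):
            w = rng.gauss(0.0, 1.0)      # latent within-class variable
            eps = rng.gauss(0.0, sigma)  # residual noise
            samples.append(mu + f * h + g * w + eps)
        data.append(samples)
    return data


if __name__ == "__main__":
    for class_id, samples in enumerate(sample_plda_1d(3, 4)):
        print(class_id, [round(x, 2) for x in samples])
```

Setting `g=0.0` and `sigma=0.0` removes all within-class variation, so every sample of a class collapses to the same value; this is a quick way to see which term carries the identity information.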

Related concepts (32)
Pattern recognition
Pattern recognition is the automated recognition of patterns and regularities in data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM), which may possess PR capabilities but whose primary function is to distinguish and create emergent patterns. PR has applications in statistical data analysis, signal processing, information retrieval, bioinformatics, data compression, computer graphics and machine learning.
Complexity class
In computational complexity theory, a complexity class is a set of computational problems "of related resource-based complexity". The two most commonly analyzed resources are time and memory. In general, a complexity class is defined in terms of a type of computational problem, a model of computation, and a bounded resource like time or memory. In particular, most complexity classes consist of decision problems that are solvable with a Turing machine, and are differentiated by their time or space (memory) requirements.
Time complexity
In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to be related by a constant factor.
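
As a small illustration of the operation-counting view of time complexity, the sketch below counts the elementary operations performed by a single loop and by three nested loops over the same input size. The function names `count_linear` and `count_cubic` are hypothetical and the loops are stand-ins, not the training procedure of any particular model; they only show how linear and cubic costs diverge as the input grows.

```python
def count_linear(n):
    """Count the elementary operations of a single pass over n items."""
    ops = 0
    for _ in range(n):
        ops += 1  # one elementary operation per item
    return ops


def count_cubic(n):
    """Count the elementary operations of three nested passes over n items."""
    ops = 0
    for _ in range(n):
        for _ in range(n):
            for _ in range(n):
                ops += 1  # one elementary operation per triple of items
    return ops


if __name__ == "__main__":
    # Doubling n doubles the linear count but multiplies the cubic count by 8.
    for n in (10, 20, 40):
        print(n, count_linear(n), count_cubic(n))
```

Under the constant-cost assumption stated above, running time is proportional to these counts, so doubling the input size costs twice as much for the linear procedure and eight times as much for the cubic one.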
Related publications (70)

Robustness of Local Predictions in Atomistic Machine Learning Models

Michele Ceriotti, Federico Grasselli, Sanggyu Chong, Chiheb Ben Mahmoud

Machine learning (ML) models for molecules and materials commonly rely on a decomposition of the global target quantity into local, atom-centered contributions. This approach is convenient from a computational perspective, enabling large-scale ML-driven si ...
Washington, 2023

FANOK: Knockoffs in Linear Time

Quentin Rebjock

We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large-scale feature selection problems. Identifying the knockoff distribution requires solving a large-scale semidefinite progra ...
SIAM Publications, 2021

Contextually Plausible and Diverse 3D Human Motion Prediction

Mathieu Salzmann, Fatemehsadat Saleh

We tackle the task of diverse 3D human motion prediction, that is, forecasting multiple plausible future 3D poses given a sequence of observed 3D poses. In this context, a popular approach consists of using a Conditional Variational Autoencoder (CVAE). How ...
IEEE, 2021