Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
Multi-session training conditions are becoming increasingly common in recent benchmark datasets for both text-independent and text-dependent speaker verification. In the state-of-the-art i-vector framework for speaker verification, such conditions are addressed by simple techniques such as averaging the individual i-vectors, averaging scores, or modifying the Probabilistic Linear Discriminant Analysis (PLDA) scoring hypothesis for multi-session enrollment. The aforementioned techniques fail to exploit the speaker variabilities observed in the enrollment data for target speakers. In this paper, we propose to exploit the multi-session training data by estimating a speaker-dependent covariance matrix and updating the intra-speaker covariance during PLDA scoring for each target speaker. The proposed method is further extended by combining covariance adaptation and score averaging. In this method, the individual examples of the target speaker are compared against the test data as opposed to an averaged i-vector, and the scores obtained are then averaged. The proposed methods are evaluated on the NIST SRE 2012 dataset. Relative improvements of up to 29% in equal error rate are obtained.
Hervé Bourlard, Selen Hande Kabil