In this paper, we explore Deep Neural Network (DNN) based speaker embeddings for the random-digit task using content information. To this end, we apply a technique that automatically selects the phonetic units common to the enrollment and test data and uses them to produce speaker verification scores. Furthermore, we propose a novel approach that incorporates content information directly into the DNN. We hypothesize that features extracted using this DNN will be helpful for the task. Experiments on the RSR dataset show that the proposed method outperforms the baseline i-vector system by 43% relative in terms of equal error rate.