In this paper, we are interested in exploring Deep Neural Network (DNN) based speaker embedding for Random-digit task using content information. To this end, a technique is applied to automatically select common phonetic units between the enrollment and test data to produce speaker verification scores. Furthermore, a novel approach is proposed to incorporate content information in the DNN directly. It is hypothesized that features extracted using this DNN will be helpful for the task. Experiments on the RSR dataset show that the proposed method outperforms the baseline i-vector system by 43% relative equal error rate.
The capabilities of deep learning systems have advanced much faster than our ability to understand them. Whilst the gains from deep neural networks (DNNs) are significant, they are accompanied by a growing risk and gravity of a bad outcome. This is tr ...
Milos Cernak, Paolo Prandoni, Natalia Nessler