Feature extraction is a key step in many machine learning and signal processing applications. For speech signals in particular, it is important to derive features that capture both the vocal characteristics of the speaker and the content of the speech. In this paper, we introduce a convolutional auto-encoder (CAE) that extracts features from speech represented via a proposed short-time discrete cosine transform (STDCT). We then introduce a deep neural mapping at the encoding bottleneck that converts a source speaker’s speech to a target speaker’s speech while preserving the content of the source speech. We further compare this approach to clustering-based and linear mappings.
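The abstract does not define the STDCT, so the following is only a minimal sketch of one plausible reading: slice the signal into overlapping frames and apply an orthonormal DCT-II to each frame. The function names `dct_ii` and `stdct`, the frame length, the hop size, and the absence of a window function are all assumptions made for illustration, not the authors' definition.

```python
import math


def dct_ii(frame):
    """Orthonormal DCT-II of a real-valued sequence (direct O(n^2) form)."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(frame)
        )
        # Orthonormal scaling, so each frame's energy is preserved.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def stdct(signal, frame_len=8, hop=4):
    """Hypothetical short-time DCT: one DCT-II per overlapping frame.

    frame_len and hop are illustrative values; no window is applied.
    """
    return [
        dct_ii(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


# Toy input: 32 samples of a sinusoid standing in for a speech signal.
signal = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
features = stdct(signal)  # list of frames, each a list of DCT coefficients
```

The resulting two-dimensional array (frames by coefficients) is the kind of real-valued time-frequency representation that a convolutional auto-encoder could take as input. In practice a library routine such as a fast DCT would replace the direct summation used here.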
Vinitra Swamy, Paola Mejia Domenzain, Julian Thomas Blackwell, Isadora Alves de Salles