SYSTEM FUSION AND SPEAKER LINKING FOR LONGITUDINAL DIARIZATION OF TV SHOWS
Publications associées (36)
Graph Chatbot
Chattez avec Graph Search
Posez n’importe quelle question sur les cours, conférences, exercices, recherches, actualités, etc. de l’EPFL ou essayez les exemples de questions ci-dessous.
AVERTISSEMENT : Le chatbot Graph n'est pas programmé pour fournir des réponses explicites ou catégoriques à vos questions. Il transforme plutôt vos questions en demandes API qui sont distribuées aux différents services informatiques officiellement administrés par l'EPFL. Son but est uniquement de collecter et de recommander des références pertinentes à des contenus que vous pouvez explorer pour vous aider à répondre à vos questions.
As audio reproduction systems evolve to incorporate many loudspeakers, the potential to evoke faithful auditory spatial impressions increases. Psychoacoustic principles of auditory spatial perception, and methods for production and reproduction of acoustic ...
In this thesis, methods and models are developed and presented aiming at the estimation, restoration and transformation of the characteristics of human speech. During a first period of the thesis, a concept was developed that allows restoring prosodic voic ...
The integration of audio and visual information improves speech recognition performance, specially in the presence of noise. In these circumstances it is necessary to introduce audio and visual weights to control the contribution of each modality to the re ...
Background work in various levels of speech coding is reviewed, including unconstrained coding and recognition-synthesis approaches that assume the signal is speech. A pilot project in HMM-TTS based speech coding is then described, in which a comparison wi ...
The speaker diarization system developed at the International Computer Science Institute (ICSI) has played a prominent role in the speaker diarization community, and many researchers in the rich transcription community have adopted methods and techniques d ...
Person identification using audio (speech) and visual (facial appearance, static or dynamic) modalities, either independently or jointly, is a thoroughly investigated problem in pattern recognition. In this work, we explore a novel task : person identifica ...
This paper proposes a joint verification-localization structure based on split-band analysis of speech signal and the mixed voicing level. To address the problems in reverberant acoustic environments, a new fundamental frequency estimation algorithm is pro ...
This paper addresses the task of automatically detecting outcomes of social interaction patterns, using non-verbal audio cues in competi- tive role-playing games (RPGs). For our experiments, we introduce a new data set which features 3 hours of audio-visua ...
With the increase in cheap commercially available sensors, recording meetings is becoming an increasingly practical option. With this trend comes the need to summarize the recorded data in semantically meaningful ways. Here, we investigate the task of auto ...
Person identification using audio (speech) and visual (facial appearance, static or dynamic) modalities, either independently or jointly, is a thoroughly investigated problem in pattern recognition. In this work, we explore a novel task : person identifica ...