Person discovery in the absence of prior identity knowledge requires accurate association of visual and auditory cues. In broadcast data, multimodal analysis faces additional challenges due to narrated voices over muted scenes or dubbing in different languages. To address these challenges, we define and analyze the problem of dubbing detection in broadcast data, which has not been explored before. We propose a method to represent the temporal relationship between the auditory and visual streams. This method consists of canonical correlation analysis (CCA) to learn a joint multimodal space and long short-term memory (LSTM) networks to model cross-modality temporal dependencies. Our contributions also include the introduction of a newly acquired dataset of face-speech segments from TV data, which we have made publicly available. The proposed method achieves promising performance on this real-world dataset compared to several baselines.
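A minimal sketch of the described pipeline, assuming scikit-learn's CCA and a PyTorch LSTM stand in for the paper's implementation. All feature dimensions, sequence lengths, and the binary dubbed/original labelling are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.cross_decomposition import CCA

# Toy data: N segments of T frames, with per-frame audio features
# (e.g. MFCCs) and visual features (e.g. mouth-region descriptors).
# Dimensions are assumptions for illustration only.
N, T, d_audio, d_visual, d_joint = 64, 20, 13, 50, 8
rng = np.random.default_rng(0)
X_audio = rng.standard_normal((N, T, d_audio))
X_visual = rng.standard_normal((N, T, d_visual))
y = rng.integers(0, 2, size=N)  # assumed labels: 1 = dubbed, 0 = original

# Step 1: fit CCA on frames pooled across segments to learn the
# joint multimodal space, then project both streams into it.
cca = CCA(n_components=d_joint)
cca.fit(X_audio.reshape(-1, d_audio), X_visual.reshape(-1, d_visual))
A, V = cca.transform(X_audio.reshape(-1, d_audio),
                     X_visual.reshape(-1, d_visual))
# Concatenate the two projected views frame by frame as the LSTM input.
seq = np.concatenate([A, V], axis=1).reshape(N, T, 2 * d_joint)

# Step 2: an LSTM over the joint sequence models cross-modality
# temporal dependencies; a linear head gives a binary dubbing score.
class DubbingLSTM(nn.Module):
    def __init__(self, d_in, d_hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(d_in, d_hidden, batch_first=True)
        self.head = nn.Linear(d_hidden, 1)

    def forward(self, x):
        _, (h, _) = self.lstm(x)  # final hidden state summarizes the segment
        return self.head(h[-1]).squeeze(-1)

model = DubbingLSTM(2 * d_joint)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
xb = torch.tensor(seq, dtype=torch.float32)
yb = torch.tensor(y, dtype=torch.float32)
for epoch in range(5):  # brief training loop for illustration
    opt.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```

Fitting CCA on pooled frames before sequence modelling keeps the two stages decoupled: the joint space captures per-frame audio-visual correlation, while the LSTM captures how that correlation evolves over time within a segment.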