We present the results of the EUMSSI team's participation in the Multimodal Person Discovery task at the MediaEval 2015 challenge. The goal is to identify all people who simultaneously appear and speak in a video corpus, which implicitly involves both the audio stream and the visual stream. We focus on improving each modality separately and on benchmarking them to analyze their strengths and weaknesses.
Michael Herzog, David Pascucci, Maëlan Quentin Menétrey, Maya Roinishvili