We present the results of the EUMSSI team’s participation in the Multimodal Person Discovery task. The goal is to identify all people who simultaneously appear and speak in a video corpus. In the proposed system, besides improving each modality, we emphasize on the ranking of multiple results from both audio stream and visual stream.
Michael Herzog, David Pascucci, Maëlan Quentin Menétrey, Maya Roinishvili