This lecture examines how to build a good joint image-text embedding for remote sensing visual question answering. It discusses several fusion methods, including element-wise multiplication, Multimodal Compact Bilinear pooling (MCB), and Multimodal Tucker Fusion (MUTAN). The presentation covers the baseline system, related work, and the results obtained on low-resolution and very-high-resolution image sets.
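To make the difference between the first two fusion methods concrete, here is a minimal pure-Python sketch. It is not the lecture's implementation: the function names, the toy feature vectors, and the sketch dimension `d` are all assumptions chosen for illustration. Element-wise multiplication needs both features to have the same length, while compact bilinear pooling approximates the full outer product by count-sketching each vector and circularly convolving the two sketches. Real implementations typically compute that convolution with an FFT; the direct double loop below is used only for readability. Multimodal Tucker Fusion is left out because it depends on learned projection and core tensors.

```python
import random


def elementwise_fusion(img_feat, txt_feat):
    """Fuse two equal-length feature vectors with an element-wise product."""
    if len(img_feat) != len(txt_feat):
        raise ValueError("features must have the same dimension")
    return [a * b for a, b in zip(img_feat, txt_feat)]


def make_sketch_params(n, d, rng):
    """Draw a random bucket in [0, d) and a random sign for each of n inputs."""
    buckets = [rng.randrange(d) for _ in range(n)]
    signs = [rng.choice((-1, 1)) for _ in range(n)]
    return buckets, signs


def count_sketch(x, buckets, signs, d):
    """Project x to d dimensions: add signs[i] * x[i] into bucket buckets[i]."""
    out = [0] * d
    for value, b, s in zip(x, buckets, signs):
        out[b] += s * value
    return out


def circular_convolution(a, b):
    """Direct O(d^2) circular convolution of two length-d vectors."""
    d = len(a)
    out = [0] * d
    for i in range(d):
        for j in range(d):
            out[(i + j) % d] += a[i] * b[j]
    return out


def mcb_fusion(img_feat, txt_feat, img_params, txt_params, d):
    """Compact bilinear fusion: convolve the count sketches of both inputs.

    The result equals a count sketch of the outer product of the two
    vectors, without ever forming that outer product explicitly.
    """
    s_img = count_sketch(img_feat, img_params[0], img_params[1], d)
    s_txt = count_sketch(txt_feat, txt_params[0], txt_params[1], d)
    return circular_convolution(s_img, s_txt)


# Toy example (hypothetical values, not taken from the lecture).
rng = random.Random(0)
img = [1, 2, 3]
txt = [4, 5, 6]
d = 5
img_params = make_sketch_params(len(img), d, rng)
txt_params = make_sketch_params(len(txt), d, rng)

hadamard = elementwise_fusion(img, txt)
compact = mcb_fusion(img, txt, img_params, txt_params, d)
```

In this sketch the element-wise product captures only interactions between matching dimensions, whereas the compact bilinear output mixes every pair of image and text dimensions into a fixed-size vector of length `d`, which is the trade-off the fusion methods above are exploring.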