A de facto standard approach in solving computer vision tasks is to use a common high-resolution camera and choose its placement on an agent based on human intuition. On the other hand, extremely simple and well-designed visual sensors found throughout nat ...
Springer Science and Business Media Deutschland GmbH2025
We present a method to control a text-to-image generative model to produce training data useful for supervised learning. Unlike previous works that employ an open-loop approach via pre-defined prompts to generate new data using either a language model or h ...
We introduce a set of image transformations that can be used as corruptions to evaluate the robustness of models as well as data augmentation mechanisms for training neural networks. The primary distinction of the proposed transformations is that, unlike e ...
We propose a pre-training strategy called Multi-modal Multi-task Masked Autoencoders (MultiMAE). It differs from standard Masked Autoencoding in two key aspects: I) it can optionally accept additional modalities of information in the input besides the RGB ...