Visual processing can be seen as the integration and segmentation of features. Objects are composed of contours, which are integrated into shapes and segmented from other contours. Information also needs to be integrated to solve the ill-posed problems of vision. For example, in the perception of an object's color, the illuminant needs to be discounted, which requires large-scale integration of luminance values. Whereas there is little controversy about the crucial role of integration, very little is known about how it actually works. In this thesis, I focused on large-scale spatiotemporal integration using two paradigms: first, the Ternus-Pikler display (TPD) to understand non-retinotopic, temporal integration, and second, crowding to understand spatial integration across, more or less, the entire visual field.

The motions of object parts are perceived relative to the object itself. For example, a reflector on a bicycle wheel appears to rotate even though its trajectory is cycloidal in retinotopic coordinates; the rotation is perceived because the bike's horizontal motion is subtracted from the reflector's retinotopic motion (a worked derivation is sketched below). Instead of bike motion, I used the TPD, which is perfectly suited to studying non-retinotopic processing. There are two possibilities for how information may be integrated non-retinotopically: either based on attentional tracking, e.g., of the reflector's motion, or relying on inbuilt, automated mechanisms. First, I showed that attentional tracking does not play a major role in non-retinotopic processing in the TPD. Second, I showed that invisible retinotopic information can strongly modulate the visible, non-retinotopic percept, further supporting automated integration processes.

Crowding occurs when the perception of a target deteriorates because of surrounding elements. It is the standard situation in everyday vision, since elements are rarely encountered in isolation. The classic model of vision integrates information from low-level to high-level feature detectors; by adding flankers, such a model can only predict deteriorating performance (see the sketch below). This prediction was proven wrong: flankers far from the target can even lead to a release of crowding. Hence, integration across the entire visual field is crucial. Here, I systematically investigated the characteristics of this large-scale integration. First, I dissected complex multi-flanker configurations and showed that low-level aspects play only a minor role; configural aspects and the Gestalt principle of Prägnanz seem to be involved instead. However, as I showed second, the basic Gestalt principles fail to explain our results. Lastly, I tested several computational models, including one-stage feedforward models that integrate information within a local area or across the whole visual field, and two-stage recursive models that integrate global information and then explicitly segment elements. I showed that all models fail unless they take explicit grouping and segmentation processing into account, as capsule networks and the Laminart model do.

Overall, spatial and temporal integration relies on complex, inbuilt, automated mechanisms, and integration occurs across the whole visual field, contrary to most classic and recent models of vision. Moreover, global integration can only be reproduced by two-stage models, which process grouping and segmentation. To better understand perception, we need to consider models that group elements by multiple processes and explicitly, recursively segment other groups.
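The reflector example can be made concrete with a standard kinematic derivation; this sketch is illustrative and not taken from the thesis itself. A point on the rim of a wheel of radius $r$ rolling along the ground traces a cycloid in retinotopic coordinates as the wheel turns through angle $\theta$:

\[ x(\theta) = r(\theta - \sin\theta), \qquad y(\theta) = r(1 - \cos\theta). \]

Subtracting the hub's translation $(r\theta,\; r)$, i.e., the bike's motion, leaves

\[ x'(\theta) = -r\sin\theta, \qquad y'(\theta) = -r\cos\theta, \qquad x'^2 + y'^2 = r^2, \]

which is pure rotation about the hub: exactly the motion observers perceive once the bike's translation has been discounted.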
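Likewise, the claim that a purely feedforward pooling model can only predict deterioration when flankers are added can be illustrated with a minimal sketch. The model and values below are hypothetical, chosen for illustration, and are not the thesis's actual models: a target feature is pooled (averaged) with whatever flanker features fall inside a fixed window, so the predicted error can only grow, never shrink, as flankers are added; a release of crowding from far flankers is impossible in this scheme.

# Minimal sketch of a one-stage local pooling model of crowding
# (hypothetical values; not the thesis's actual models).
import numpy as np

def pooled_error(target_signal, flanker_signals):
    """Predicted error of a local pooling model: the target estimate is
    the average of the target and flanker signals inside the pooling
    window; error is that estimate's distance from the true target."""
    signals = np.array([target_signal] + list(flanker_signals))
    estimate = signals.mean()
    return abs(estimate - target_signal)

target = 1.0    # hypothetical target feature value (e.g., a vernier offset)
flanker = -1.0  # hypothetical flanker feature value

for n in range(5):  # 0 to 4 flankers inside the pooling window
    print(n, pooled_error(target, [flanker] * n))
# Predicted error grows monotonically with n: 0.0, 1.0, 1.33, 1.5, 1.6,
# so no flanker configuration can ever produce a release of crowding.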