Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?
Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur Graph Search.
Classically, vision is seen as a cascade of local, feedforward computations. This framework has been tremendously successful, inspiring a wide range of ground-breaking findings in neuroscience and computer vision. Recently, feedforward Convolutional Neural Networks (ffCNNs), a kind of deep neural network inspired by this classic framework, have revolutionized computer vision and been adopted as tools in neuroscience. However, despite these successes, there is much more to vision. First, there are flagrant architectural differences between biological systems and the classic framework. For example, recurrence is abundant in the brain but absent from the classic framework and ffCNNs. Although there is widespread agreement about the importance of these recurrent connections, their computational role is still poorly understood. Second, these architectural differences lead to behavioural differences too, highlighted by psychophysical evidence. Relatedly, ffCNNs are extremely vulnerable to small changes to their inputs and do not generalize well beyond the dataset used to train them. Human vision, in contrast, is much more robust. New insights are needed to face up to these challenges. In this thesis, I use visual crowding and related psychophysical effects as probes into visual processes that go beyond the classic framework. In crowding, perception of a target deteriorates in clutter. I focus on global aspects of crowding, in which perception of a small target is strongly modulated by the global configuration of elements across the visual field. I show that models based on the classic framework, including ffCNNs, cannot explain these effects for principled reasons and identify recurrent grouping and segmentation as a key missing ingredient. Then, I show that capsule networks, a recent kind of deep learning architecture combining the power of ffCNNs with recurrent grouping and segmentation, naturally explain these effects. I provide psychophysical evidence that humans indeed use a similar recurrent grouping and segmentation strategy in global crowding effects. In crowding, visual elements interfere across space. To study how elements interfere over time, I use the Sequential Metacontrast psychophysical paradigm, in which perception of visual elements depends on elements presented hundreds of milliseconds later. I psychophysically characterize the temporal structure of this interference and propose a simple computational model. My results support the idea that perception is a discrete process. I lay out theoretical implications of these findings. Together, the results presented here provide stepping-stones towards a fuller understanding of the visual system by suggesting architectural changes needed for more human-like neural computations.
Silvestro Micera, Daniela De Luca
The capabilities of deep learning systems have advanced much faster than our ability to understand them. Whilst the gains from deep neural networks (DNNs) are significant, they are accompanied by a growing risk and gravity of a bad outcome. This is tr ...