Crowding and the Architecture of the Visual System

Classically, vision is seen as a cascade of local, feedforward computations. This framework has been tremendously successful, inspiring a wide range of ground-breaking findings in neuroscience and computer vision. Recently, feedforward Convolutional Neural Networks (ffCNNs), a kind of deep neural network inspired by this classic framework, have revolutionized computer vision and been adopted as tools in neuroscience. However, despite these successes, there is much more to vision. First, there are flagrant architectural differences between biological systems and the classic framework. For example, recurrence is abundant in the brain but absent from the classic framework and ffCNNs. Although there is widespread agreement about the importance of these recurrent connections, their computational role is still poorly understood. Second, these architectural differences lead to behavioural differences too, highlighted by psychophysical evidence. Relatedly, ffCNNs are extremely vulnerable to small changes to their inputs and do not generalize well beyond the dataset used to train them. Human vision, in contrast, is much more robust. New insights are needed to face up to these challenges. In this thesis, I use visual crowding and related psychophysical effects as probes into visual processes that go beyond the classic framework. In crowding, perception of a target deteriorates in clutter. I focus on global aspects of crowding, in which perception of a small target is strongly modulated by the global configuration of elements across the visual field. I show that models based on the classic framework, including ffCNNs, cannot explain these effects for principled reasons and identify recurrent grouping and segmentation as a key missing ingredient. Then, I show that capsule networks, a recent kind of deep learning architecture combining the power of ffCNNs with recurrent grouping and segmentation, naturally explain these effects. 
I provide psychophysical evidence that humans indeed use a similar recurrent grouping and segmentation strategy in global crowding effects. In crowding, visual elements interfere across space. To study how elements interfere over time, I use the Sequential Metacontrast psychophysical paradigm, in which perception of visual elements depends on elements presented hundreds of milliseconds later. I psychophysically characterize the temporal structure of this interference and propose a simple computational model. My results support the idea that perception is a discrete process. I lay out theoretical implications of these findings. Together, the results presented here provide stepping-stones towards a fuller understanding of the visual system by suggesting architectural changes needed for more human-like neural computations.


