Capsule networks, but not convolutional networks, explain global configurational visual effects

In human vision, the perception of local features depends on all elements in the visual field and on their exact configuration. For example, when observers performed a vernier discrimination task, adding a surrounding square to the vernier made the task much more difficult: a classic crowding effect. Crucially, adding more flanking squares improved performance (uncrowding). In addition, in displays of squares and stars, small changes in the configuration strongly changed performance. Here, we show that convolutional neural networks fail to capture these global aspects of configuration for two reasons. First, the representations of the target and the flankers at a given layer are pooled within the receptive fields of the subsequent layer, which leads to poor performance. Second, distant elements cannot interact with the vernier to produce uncrowding. We show that capsule networks, a new kind of neural network that explicitly takes configuration into account, capture the experimental results well.
