Publication: Learning Trajectory Dependencies for Human Motion Prediction

Abstract

Human motion prediction, i.e., forecasting future body poses given an observed pose sequence, has typically been tackled with recurrent neural networks (RNNs). However, as evidenced by prior work, the resulting RNN models suffer from the accumulation of prediction errors, leading to undesired discontinuities in motion prediction. In this paper, we propose a simple feed-forward deep network for motion prediction, which takes into account both temporal smoothness and spatial dependencies among human body joints. In this context, we then propose to encode temporal information by working in trajectory space, instead of the traditionally used pose space. This relieves us from manually defining the range of temporal dependencies (or the temporal convolutional filter size, as done in previous work). Moreover, the spatial dependency of human pose is encoded by treating a human pose as a generic graph (rather than a human skeletal kinematic tree) formed by links between every pair of body joints. Instead of using a pre-defined graph structure, we design a new graph convolutional network to learn graph connectivity automatically. This allows the network to capture long-range dependencies beyond those of the human kinematic tree. We evaluate our approach on several standard benchmark datasets for motion prediction, including Human3.6M, the CMU motion capture dataset and 3DPW. Our experiments clearly demonstrate that the proposed approach achieves state-of-the-art performance and is applicable to both angle-based and position-based pose representations. The code is available at https://github.com/wei-mao-2019/LearnTrajDep
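
The two ideas in the abstract, a trajectory-space encoding and a graph convolution over a fully connected joint graph with learnable connectivity, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say how trajectories are encoded, so the discrete cosine transform is an assumption here, as are the helper names `dct_matrix` and `graph_conv`, the toy data, and the random stand-ins for weights that training would learn.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return basis

def graph_conv(h, adjacency, weight):
    """One graph-convolution step: mix nodes with A, mix features with W."""
    return np.tanh(adjacency @ h @ weight)

rng = np.random.default_rng(0)
n_frames, n_nodes, n_hidden = 20, 6, 8

# Toy data (assumption): each row is one joint coordinate observed over time.
trajectories = np.cumsum(rng.normal(size=(n_nodes, n_frames)), axis=1)

# Trajectory-space representation: one coefficient vector per joint coordinate.
C = dct_matrix(n_frames)
coeffs = trajectories @ C.T
reconstructed = coeffs @ C  # inverse transform, valid because C is orthonormal

# Dense adjacency over all node pairs. In a trained model A and W would be
# learned parameters; here they are random placeholders.
A = rng.normal(scale=0.1, size=(n_nodes, n_nodes))
W = rng.normal(scale=0.1, size=(n_frames, n_hidden))
features = graph_conv(coeffs, A, W)
```

Because `A` is a dense matrix rather than a fixed skeleton, every node can exchange information with every other node in a single layer, which is the property the abstract refers to as capturing dependencies beyond the kinematic tree.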

Related publications (84)

Related concepts (15)

Feedforward neural network

A feedforward neural network is an acyclic artificial neural network, which distinguishes it from recurrent neural networks. The most …

Recurrent neural network

A recurrent neural network (RNN) is an artificial neural network with recurrent connections. A recurrent neural network consists …

Deep learning

Deep learning (also called deep structured learning or hierarchical learning) is a subfield of artificial intelligence …

With ever greater computational resources and more accessible software, deep neural networks have become ubiquitous across industry and academia.
Their remarkable ability to generalize to new samples defies the conventional view, which holds that complex, over-parameterized networks would be prone to overfitting.
This apparent discrepancy is exacerbated by our inability to inspect and interpret the high-dimensional, non-linear, latent representations they learn, which has led many to refer to neural networks as "black boxes". The Law of Parsimony states that "simpler solutions are more likely to be correct than complex ones". Since they perform quite well in practice, a natural question to ask, then, is in what way are neural networks simple?
We propose that compression is the answer. Since good generalization requires invariance to irrelevant variations in the input, it is necessary for a network to discard this irrelevant information. As a result, semantically similar samples are mapped to similar representations in neural network deep feature space, where they form simple, low-dimensional structures.
Conversely, a network that overfits relies on memorizing individual samples. Such a network cannot discard information as easily.
In this thesis we characterize the difference between such networks using the non-negative rank of activation matrices. Because rectified-linear units produce non-negative activations, these matrices have a well-defined non-negative rank: the smallest inner dimension for which an exact non-negative matrix factorization exists.
We derive an upper bound on the amount of memorization in terms of the non-negative rank, and show it is a natural complexity measure for rectified-linear units.
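
As a rough illustration of the factorization behind the non-negative rank, the sketch below approximates a small non-negative matrix as a product of two non-negative factors using the standard multiplicative updates of Lee and Seung. It is not the thesis's method. The function name `nmf`, the toy `activations` matrix, and the iteration count are assumptions, and the procedure only finds an approximate factorization for a chosen inner dimension `k`; it does not compute the non-negative rank itself.

```python
import numpy as np

def nmf(a, k, n_iter=500, seed=0, eps=1e-12):
    """Approximate a >= 0 as w @ h with w, h >= 0 (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.1, 1.0, size=(a.shape[0], k))
    h = rng.uniform(0.1, 1.0, size=(k, a.shape[1]))
    errors = [np.linalg.norm(a - w @ h)]
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio, so the
        # factors stay non-negative throughout.
        h *= (w.T @ a) / (w.T @ w @ h + eps)
        w *= (a @ h.T) / (w @ h @ h.T + eps)
        errors.append(np.linalg.norm(a - w @ h))
    return w, h, errors

# Toy "activation matrix" built from two non-negative factors, so an exact
# non-negative factorization with inner dimension 2 exists by construction.
rng = np.random.default_rng(1)
activations = rng.uniform(size=(6, 2)) @ rng.uniform(size=(2, 5))

w, h, errors = nmf(activations, k=2)
```

The ordinary matrix rank is a lower bound on the non-negative rank, so `np.linalg.matrix_rank(activations)` gives a quick sanity check on the smallest `k` worth trying.
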
With a focus on deep convolutional neural networks trained to perform object recognition, we show that the two non-negative factors derived from deep network layers decompose the information held therein in an interpretable way. The first of these factors provides heatmaps which highlight similarly encoded regions within an input image or image set. We find that these networks learn to detect semantic parts and form a hierarchy, such that parts are further broken down into sub-parts.
We quantitatively evaluate the semantic quality of these heatmaps by using them to perform semantic co-segmentation and co-localization. Although the convolutional network we use is trained solely with image-level labels, we achieve results comparable to or better than domain-specific state-of-the-art methods for these tasks.
The second non-negative factor provides a bag-of-concepts representation for an image or image set. We use this representation to derive global image descriptors for images in a large collection. With these descriptors in hand, we perform two variations of content-based image retrieval, i.e., reverse image search. Using information from one of the non-negative matrix factors, we obtain descriptors that are suitable for finding semantically related images, i.e., images belonging to the same semantic category as the query image. Combining information from both non-negative factors, however, yields descriptors that are suitable for finding other images of the specific instance depicted in the query image, where we again achieve state-of-the-art performance.

Humans and some other animals are able to perform tasks that require coordination of movements across multiple temporal scales, ranging from hundreds of milliseconds to several seconds. The fast timescale at which neurons naturally operate, on the order of tens of milliseconds, is well-suited to support motor control of rapid movements. In contrast, to coordinate movements on the order of seconds, a neural network should produce reliable dynamics on a similarly "slow" timescale. Neurons and synapses exhibit biophysical mechanisms whose timescales range from tens of milliseconds to hours, which suggests a possible role of these mechanisms in producing slow reliable dynamics. However, how such mechanisms influence network dynamics is not yet understood. An alternative approach to achieve slow dynamics in a neural network consists in modifying its connectivity structure. Still, the limitations of this approach, and in particular the degree to which the weights require fine-tuning, remain unclear. Understanding how both the single-neuron mechanisms and the connectivity structure might influence the network dynamics to produce slow timescales is the main goal of this thesis.
We first consider the possibility of obtaining slow dynamics in binary networks by tuning their connectivity. It is known that binary networks can produce sequential dynamics. However, if the sequences consist of random patterns, the typical length of the longest sequence that can be produced grows linearly with the number of units. Here, we show that we can overcome this limitation by carefully designing the sequence structure. More precisely, we obtain a constructive proof that allows one to obtain sequences whose length scales exponentially with the number of units. To achieve this, however, one needs to exponentially fine-tune the connectivity matrix.
Next, we focus on the interaction between single-neuron mechanisms and recurrent dynamics. Particular attention is dedicated to adaptation, which is known to have a broad range of timescales and is therefore particularly interesting for the subject of this thesis. We study the dynamics of a random network with adaptation using mean-field techniques, and we show that the network can enter a state of resonant chaos. Interestingly, the resonance frequency of this state is independent of the connectivity strength and depends only on the properties of the single-neuron model. The approach used to study networks with adaptation can also be applied when considering linear rate units with an arbitrary number of auxiliary variables. Based on a qualitative analysis of the mean-field theory for a random network whose neurons are described by a D-dimensional rate model, we conclude that the statistics of the chaotic dynamics are strongly influenced by the single-neuron model under investigation.
Using a reservoir computing approach, we show preliminary evidence that slow adaptation can be beneficial when performing tasks that require slow timescales. The positive impact of adaptation on the network performance is particularly strong in the presence of noise. Finally, we propose a network architecture in which the slowing-down effect due to adaptation is combined with a hierarchical structure, with the purpose of efficiently generating sequences that require multiple, hierarchically organized timescales.
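
The statement that binary networks can produce sequential dynamics can be made concrete with a small sketch. It is a textbook-style construction, not the exponential-length construction from the thesis: an asymmetric Hebbian rule wires each stored pattern to trigger its successor. To keep the example deterministic, the patterns are mutually orthogonal rows of a Hadamard matrix rather than random patterns. The helper `hadamard`, the network size, and the sequence length are all assumptions made for illustration.

```python
import numpy as np

def hadamard(order):
    """Sylvester construction: 2**order x 2**order matrix of +/-1, orthogonal rows."""
    h = np.array([[1]])
    for _ in range(order):
        h = np.block([[h, h], [h, -h]])
    return h

n_units, n_patterns = 16, 4
patterns = hadamard(4)[1:1 + n_patterns]  # mutually orthogonal +/-1 patterns

# Asymmetric Hebbian rule: pattern mu drives the network toward pattern mu + 1
# (cyclically), so the stored sequence forms a loop.
weights = sum(
    np.outer(patterns[(mu + 1) % n_patterns], patterns[mu])
    for mu in range(n_patterns)
) / n_units

state = patterns[0].copy()
visited = [state]
for _ in range(n_patterns):
    state = np.sign(weights @ state).astype(int)  # synchronous binary update
    visited.append(state)
```

Starting from the first pattern, the network steps through the stored sequence and returns to its starting point after `n_patterns` updates. With random instead of orthogonal patterns, crosstalk between patterns limits how long such a sequence can be, which is the linear scaling the abstract mentions.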

Joao Emanuel Felipe Gerhard, Wulfram Gerstner

The simultaneous recording of the activity of many neurons poses challenges for multivariate data analysis. Here, we propose a general scheme for reconstructing the functional network from spike train recordings. Effective, causal interactions are estimated by fitting generalized linear models to the neural responses, incorporating effects of the neurons' self-history, of input from other neurons in the recorded network and of modulation by an external stimulus. The coupling terms arising from synaptic input can be transformed by thresholding into a directed binary connectivity matrix. Each link between two neurons represents a causal influence from one neuron to the other, given the observation of all other neurons from the population. The resulting graph is analyzed with respect to small-world and scale-free properties using quantitative measures for directed networks. Such graph-theoretic analyses have been performed on many complex dynamic networks, including the connectivity structure between different brain areas. Only a few studies have attempted to look at the structure of cortical neural networks on the level of individual neurons. Here, using multi-electrode recordings from the visual system of the awake monkey, we find that cortical networks lack scale-free behavior but show a small but significant small-world structure. Assuming a simple distance-dependent probabilistic wiring between neurons, we find that this connectivity structure can account for all of the networks' observed small-world-ness. Moreover, for multi-electrode recordings the sampling of neurons is not uniform across the population. We show that the small-world-ness obtained by such a localized sub-sampling overestimates the strength of the true small-world structure of the network. This bias is likely to be present in all previous experiments based on multi-electrode recordings.
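
The thresholding step and one ingredient of a small-world analysis can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the generalized linear model fit is skipped, the `coupling` matrix is hand-made toy data, and the convention that `coupling[i, j]` holds the estimated influence of neuron `i` on neuron `j` is hypothetical, as are the function names and the threshold value.

```python
import numpy as np
from collections import deque

def binarize(coupling, threshold):
    """Directed binary graph: link i -> j where |coupling[i, j]| > threshold."""
    adj = (np.abs(coupling) > threshold).astype(int)
    np.fill_diagonal(adj, 0)  # self-history terms are not network links
    return adj

def mean_shortest_path(adj):
    """Mean directed shortest-path length over all reachable ordered pairs."""
    total, count = 0, 0
    for src in range(adj.shape[0]):
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search along outgoing links
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                v = int(v)
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count if count else float("inf")

# Toy coupling matrix (assumption): a strong directed ring 0 -> 1 -> 2 -> 3 -> 0
# on top of weak background couplings that fall below the threshold.
coupling = np.full((4, 4), 0.05)
for i in range(4):
    coupling[i, (i + 1) % 4] = 0.9

adjacency = binarize(coupling, threshold=0.5)
path_length = mean_shortest_path(adjacency)  # 2.0 for the directed 4-ring
```

A full small-world measure would also need a clustering coefficient for directed graphs and a comparison against randomized reference networks, both of which are omitted here.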