Pose Transformers (POTR): Human Motion Prediction with Non-Autoregressive Transformers

We propose to leverage Transformer architectures for non-autoregressive human motion prediction. Our approach decodes elements in parallel from a query sequence, instead of conditioning on previous predictions as state-of-the-art RNN-based approaches do. As a result, our approach is less computationally intensive and potentially avoids the accumulation of errors in long-term elements of the sequence. In that context, our contributions are fourfold: (i) we frame human motion prediction as a sequence-to-sequence problem and propose a non-autoregressive Transformer to infer the sequences of poses in parallel; (ii) we propose to decode sequences of 3D poses from a query sequence generated in advance with elements from the input sequence; (iii) we propose to perform skeleton-based activity classification from the encoder memory, in the hope that identifying the activity can improve predictions; (iv) we show that despite its simplicity, our approach achieves competitive results on two public datasets, although, surprisingly, more so for short-term predictions than for long-term ones.
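The difference between the two decoding schemes can be illustrated with a minimal sketch. It is not the POTR implementation: the function names, the toy predictors, and the choice to fill the query sequence with copies of the last observed pose are all assumptions made for illustration, since the abstract only states that the queries are built from elements of the input sequence. In a real Transformer decoder the queries would also attend to the encoder memory; here each output is computed from its own query alone to keep the example small.

```python
from typing import Callable, List

Pose = List[float]  # flattened joint coordinates for one frame


def build_query_sequence(observed: List[Pose], horizon: int) -> List[Pose]:
    """Build the decoder queries in advance from the input sequence.

    Assumption: the last observed pose is repeated `horizon` times.
    """
    if not observed:
        raise ValueError("need at least one observed pose")
    last = observed[-1]
    return [list(last) for _ in range(horizon)]


def decode_parallel(queries: List[Pose],
                    decode_one: Callable[[Pose, int], Pose]) -> List[Pose]:
    """Non-autoregressive: no output is fed back, so all frames could be
    computed at once (the loop here is only for readability)."""
    return [decode_one(q, t) for t, q in enumerate(queries)]


def decode_autoregressive(seed: Pose, horizon: int,
                          step: Callable[[Pose], Pose]) -> List[Pose]:
    """Autoregressive baseline: each prediction becomes the next input,
    so an early error propagates to every later frame."""
    outputs, prev = [], seed
    for _ in range(horizon):
        prev = step(prev)
        outputs.append(prev)
    return outputs


# Toy stand-ins for learned models (hypothetical, for illustration only).
def toy_decode_one(query: Pose, t: int) -> Pose:
    return [x + (t + 1) for x in query]


def toy_step(pose: Pose) -> Pose:
    return [x + 1 for x in pose]


observed = [[0.0, 1.0], [1.0, 2.0]]
queries = build_query_sequence(observed, horizon=3)
parallel = decode_parallel(queries, toy_decode_one)
sequential = decode_autoregressive(observed[-1], 3, toy_step)
```

With these toy predictors both schemes produce the same trajectory; the point is only the data flow: the parallel decoder never reads its own earlier outputs, whereas the autoregressive loop does at every step.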

Coupling a recurrent neural network to SPAD TCSPC systems for real-time fluorescence lifetime imaging

Dual-frequency spectral radar retrieval of snowfall microphysics: a physics-driven deep-learning approach

Source-Free Open-Set Domain Adaptation for Histopathological Images via Distilling Self-Supervised Vision Transformer
