Author: 葉旭 (村永旭)
Thesis title: On Human Motion Prediction Using Bidirectional Encoder Representations from Transformers (雙向Transformers於骨架動作預測之應用)
Advisor: Wen-Hsien Fang (方文賢)
Committee members: Yie-Tarng Chen (陳郁堂)
Kuen-Tsair Lay
Chien-ching Chiu
Sheng-Luen Chung
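Degree: Master's
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: English
Number of pages: 74
Chinese keywords: attention mechanism, skeleton-based motion prediction (注意力機制, 骨架動作預測)
English keywords: transformer, human motion prediction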
Pose prediction has found applications in a variety of areas.
However, current methods that adopt recurrent neural networks suffer from error accumulation in the training stage. Furthermore, encoder-decoder architectures in general fail to predict continuous poses between the end of the encoder input and the beginning of the decoder output.
Benefiting from the recent successes of the attention mechanism, in this thesis we propose a novel method that combines the transformer encoder architecture with the universal transformer.
The new architecture is free of error accumulation because it processes all positions in parallel and the update weight for each position is equal. Moreover, the proposed attention map helps the attention mechanism keep the predicted poses from becoming discontinuous.
We also apply the adaptive computation time algorithm to optimize the number of iterations of the attention mechanism.
The mean absolute loss is used during training for the human motion prediction problem on the Human3.6M dataset.
Simulations show that the proposed method outperforms the main state-of-the-art approaches.
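As a rough illustration of two ingredients named in the abstract, the sketch below applies scaled dot-product self-attention to every frame of a pose sequence at once and then scores the output with a mean absolute error. It is a minimal NumPy sketch under stated assumptions, not the thesis implementation: the function names, the random input, and the sizes (6 frames, 8 features per frame) are hypothetical, and the learned projections, multi-head structure, proposed attention map, and adaptive computation time described in the thesis are all omitted.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Return softmax(q k^T / sqrt(d_k)) v and the attention weights.

    q, k, v are arrays of shape (frames, d). All frames are handled in one
    matrix product, so no frame waits on the output of a previous frame.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def mean_absolute_error(pred, target):
    """Mean absolute difference over all frames and features."""
    return float(np.mean(np.abs(pred - target)))


# Hypothetical toy data: 6 frames, 8 features per frame (assumed sizes).
rng = np.random.default_rng(0)
poses = rng.normal(size=(6, 8))

# Self-attention without learned projections (an assumption for brevity).
out, weights = scaled_dot_product_attention(poses, poses, poses)
loss = mean_absolute_error(out, poses)
```

Each row of `weights` sums to one, so every output frame is a weighted average of all input frames computed in a single step rather than recurrently.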

Table of contents
Abstract
Acknowledgment
Table of contents
List of Figures
List of Tables
1 Introduction
  1.1 Human Motion Prediction
  1.2 Motivations
  1.3 Contributions
  1.4 Thesis Outline
2 Related Work
  2.1 Modeling of Human Motion Prediction
  2.2 Loss Function
  2.3 Generative Adversarial Nets
  2.4 Transformers
3 Proposed Method
  3.1 Overall Methodology
  3.2 Data Pre-processing and Position Encoding
  3.3 Transformer Encoder Stack
    3.3.1 Scaled Dot-Product Attention
    3.3.2 Multi-Head Attention
    3.3.3 Position-wise Feed-Forward Networks
  3.4 Universal Transformers
  3.5 Loss Function
4 Experimental Result
  4.1 Evaluation Protocol and Experimental Setup
  4.2 Ablation Studies
    4.2.1 Data Pre-processing
    4.2.2 Transformer Configuration
    4.2.3 Loss Function
  4.3 Comparison With State-of-the-Art Methods
5 Conclusion and Future Works
  5.1 Conclusion
  5.2 Future Works
Appendix A: Class-wise ablation studies
Appendix B: Performance comparison of state-of-the-art method
Appendix C: Visualization of attention distributions
References

