Author: 黃冠汯
Kuan-Hung Huang
Thesis Title: 基於自學習特徵對齊之半監督式動作辨識
STRA: Self-Training Representation Alignment for Semi-Supervised Action Recognition
Advisor: 花凱龍
Kai-Lung Hua
Committee: 項天瑞
Tien-Ruey Hsiang
鐘國亮
Kuo-Liang Chung
郭景明
Jing-Ming Guo
陳永耀
Yung-Yao Chen
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Thesis Publication Year: 2021
Graduation Academic Year: 109
Language: English
Pages: 47
Keywords (in Chinese): 半監督式學習、動作辨識、自學習、圖卷積
Keywords (in other languages): Semi-supervised Learning, Action Recognition, Self-Training, Graph Convolutional Network
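  • In recent years, skeleton-based action recognition has advanced steadily,
    and most methods owe their strong results to well-annotated datasets. In
    the real world, however, building such large datasets is expensive. In
    addition, data collection may yield incomplete data, for example skeletons
    with missing joint coordinates or with several missing frames. Because
    complete and incomplete skeleton data differ considerably, a trained model
    deployed in a real application can perform differently from how it did
    during training. We therefore propose two methods: (1) a new self-training
    framework that reduces the use of labeled data by generating pseudo-labels,
    and that further ensures the pseudo-labeled data converge sufficiently
    during training; (2) a representation alignment module that uses
    consistency regularization to minimize the effect of missing joints and
    missing frames on the model in practical use. The proposed method, STRA,
    not only improves model performance with a small amount of labeled data but
    also achieves comparable results when joints or frames are missing. We also
    validate it on the NTU and NUCLA datasets and compare it with several
    recent methods.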


    Most existing skeleton-based action recognition models rely on large
    labeled datasets to achieve strong results. However, procuring a large
    amount of labeled skeleton data in real-world scenarios is costly.
    Furthermore, missing joints and missing frames commonly occur during data
    collection, and they cause problems at test time because of the
    representational differences between complete and incomplete skeletal
    data. To address these problems, we propose two components: (1) a new
    self-training framework that reduces labeled skeleton data usage by
    generating pseudo-labels; it takes a small amount of labeled data and
    generates enough pseudo-labels to guarantee model convergence; (2) a
    representation alignment module that adopts consistency regularization to
    minimize the effect of missing joints and frames. Our proposed method,
    STRA, not only improves the performance of GCN models with only a minimal
    amount of labeled data but also achieves similar performance under
    conditions with missing joints and frames. We evaluate our method on the
    NTU and N-UCLA datasets against state-of-the-art works.
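    The two components named in the abstract can be illustrated with a small
    sketch. It is not the thesis implementation: the confidence threshold for
    keeping pseudo-labels, the zero-filling used to simulate missing joints and
    frames, and the mean-squared-error consistency term are all assumptions
    chosen for illustration, and every function name below is hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(logits_batch, threshold=0.9):
    """Keep (sample index, predicted class) for confident unlabeled samples.

    Assumption: a fixed confidence threshold decides which predictions
    become pseudo-labels; the thesis may use a different criterion.
    """
    selected = []
    for i, logits in enumerate(logits_batch):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((i, label))
    return selected


def mask_skeleton(sequence, joint_drop=0.2, frame_drop=0.2, rng=None):
    """Simulate incomplete skeleton data on a copy of the sequence.

    sequence: list of frames, each frame a list of (x, y, z) joints.
    Assumption: a missing joint or frame is represented by zeros.
    """
    rng = rng or random.Random()
    zero = (0.0, 0.0, 0.0)
    masked = []
    for frame in sequence:
        if rng.random() < frame_drop:
            # The whole frame is missing.
            masked.append([zero] * len(frame))
        else:
            # Individual joints may be missing.
            masked.append(
                [zero if rng.random() < joint_drop else tuple(j) for j in frame]
            )
    return masked


def consistency_loss(feat_complete, feat_incomplete):
    """Mean squared error between features of the two views (an assumption)."""
    diffs = [(a - b) ** 2 for a, b in zip(feat_complete, feat_incomplete)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    unlabeled_logits = [[4.0, 0.0, 0.0], [1.0, 0.9, 0.8]]
    print(select_pseudo_labels(unlabeled_logits, threshold=0.9))
```

    In this sketch only the first unlabeled sample is confident enough to
    receive a pseudo-label, so the script prints [(0, 0)]. In training, a model
    would encode both the complete sequence and its masked copy, and
    consistency_loss would pull the two feature vectors together.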

    Recommendation Letter
    Approval Letter
    Abstract in Chinese
    Abstract in English
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    List of Algorithms
    1 Introduction
    2 Related Work
      2.1 Graph Convolution Network
      2.2 Consistency Regularization
      2.3 Self-training
    3 Methodology
      3.1 Problem Formulation
      3.2 Incomplete Skeleton Data
      3.3 Consistency Regularization
      3.4 Pseudo-Labeling Semi-Supervised Learning
    4 Experiment
      4.1 Datasets
      4.2 Implement Detailed
        4.2.1 Network Setting
        4.2.2 Data Preprocessing
      4.3 Comparison with Semi-Supervised Methods
      4.4 Ablation Study
        4.4.1 Effective in Self-Training
        4.4.2 Effective of our STRA
        4.4.3 Effective in incomplete skeleton data
    5 Conclusion
    References
    Letter of Authority


    Full text public date 2026/08/23 (Intranet public)
    Full text public date 2026/08/23 (Internet public)
    Full text public date 2026/08/23 (National library)