
Graduate Student: Julius Sintara (范俊新)
Thesis Title: Vision-Based Worker Activity Recognition Framework using Skeleton-Based Approach on Small Datasets
(用於小數據集以骨架為基礎影像辨識勞工活動框架)
Advisors: Shuo-Yan Chou (周碩彥), Po-Hsun Kuo (郭伯勳)
Oral Examination Committee: Shuo-Yan Chou (周碩彥), Po-Hsun Kuo (郭伯勳), Shih-Che Lo (羅士哲)
Degree: Master
Department: Department of Industrial Management, School of Management
Year of Publication: 2023
Graduation Academic Year: 111 (2022/2023)
Language: English
Number of Pages: 52
Keywords (English): Work-activity Recognition, Skeleton-based, Real-time Inference, Multi-person Activity Recognition, Small Dataset

Abstract:
Human activity recognition has gained significant attention in recent years, especially in industry, where it is considered a key enabler of better human-machine interaction. Developing a well-performing activity recognition algorithm requires an adequate dataset to train the model. In industrial settings, however, many actions and activities are task-specific, so datasets for such cases are very limited in quantity and in the variety of scenarios, environments, and conditions they cover, and are typically available only as small datasets. In deployed activity recognition systems, the domain of the training data is also likely to differ from that of the real-time application, since the system will be applied under varying environments and conditions. A small dataset cannot adequately represent all possible cases, so standard deep-learning methods are not robust. This research therefore proposes an activity recognition framework using a skeleton-based approach to address these challenges.
The proposed framework is designed to be robust when applied in domains that differ from the training dataset, flexible enough to handle any specific industrial activity or action even with a small and homogeneous dataset, scalable to multi-person recognition despite being trained on a single-person dataset, and deployable for real-time inference in industrial settings. The framework is built on three key steps: human detection, human pose estimation, and action detection (a minimal sketch of this pipeline follows the abstract). The proposed framework was experimentally evaluated on test data from varying domains, including changes in background, lighting, outfit color, and anthropometric factors. This research also contributes a method for assessing the reliability of an activity recognition model under domain shift.
This research applies the proposed framework to worker activity recognition in industry. The results demonstrate that the proposed framework achieves high recognition accuracy even in the presence of these variations, making it robust to domain shift. Additionally, the framework runs in real time, enabling online video recognition.
Keywords: worker activity recognition, skeleton-based, domain shift, real-time inference, multi-person activity recognition, small dataset
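
The abstract above describes a three-step pipeline (human detection, human pose estimation, action detection) combined with object tracking and a sliding window so that recognition can run in real time on multiple people. The Python sketch below only illustrates how such a pipeline could be wired together; it is not the thesis' actual implementation, and the stand-in detector, pose estimator, classifier, class and variable names (SkeletonActionPipeline, WINDOW_SIZE), and the example label "assembling" are hypothetical placeholders.

# Minimal sketch of a skeleton-based activity recognition pipeline with a
# per-person sliding window, assuming hypothetical detector / pose / classifier
# callables rather than the thesis' trained models.
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Sequence, Tuple

BBox = Tuple[float, float, float, float]          # (x1, y1, x2, y2)
Keypoints = List[Tuple[float, float, float]]      # (x, y, confidence) per joint
WINDOW_SIZE = 48                                  # assumed clip length in frames


class SkeletonActionPipeline:
    """Chains detection, pose estimation, and windowed action classification."""

    def __init__(
        self,
        detect_people: Callable[[object], List[Tuple[int, BBox]]],
        estimate_pose: Callable[[object, BBox], Keypoints],
        classify_clip: Callable[[Sequence[Keypoints]], str],
        window_size: int = WINDOW_SIZE,
    ) -> None:
        self.detect_people = detect_people        # person detector with tracking IDs
        self.estimate_pose = estimate_pose        # top-down 2D pose estimator
        self.classify_clip = classify_clip        # skeleton-based action classifier
        # One sliding window of keypoint frames per tracked person ID.
        self.windows: Dict[int, Deque[Keypoints]] = defaultdict(
            lambda: deque(maxlen=window_size)
        )

    def process_frame(self, frame: object) -> Dict[int, str]:
        """Update each person's window with this frame; classify full windows."""
        labels: Dict[int, str] = {}
        for person_id, bbox in self.detect_people(frame):
            keypoints = self.estimate_pose(frame, bbox)
            window = self.windows[person_id]
            window.append(keypoints)
            if len(window) == window.maxlen:      # enough frames for one clip
                labels[person_id] = self.classify_clip(list(window))
        return labels


if __name__ == "__main__":
    # Dummy components so the sketch runs without any trained models.
    pipeline = SkeletonActionPipeline(
        detect_people=lambda frame: [(0, (0.0, 0.0, 1.0, 1.0))],
        estimate_pose=lambda frame, bbox: [(0.5, 0.5, 1.0)] * 17,
        classify_clip=lambda clip: "assembling",
        window_size=4,
    )
    for t in range(6):
        print(t, pipeline.process_frame(frame=None))

In practice, the three callables would be replaced by trained models, while the deque-based window keeps only the most recent keypoint frames for each tracked person, so clip classification can run continuously on an online video stream for several people at once.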

TABLE OF CONTENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
  1.1 Background
  1.2 Challenge and Issue
  1.3 Research Objective and Contribution
  1.4 Organization of the Research
CHAPTER 2 LITERATURE REVIEW
  2.1 Worker Activity Recognition Overview
  2.2 Vision-based Activity Recognition
    2.2.1 Skeleton-based
      2.2.1.1 Temporal Convolutional Network (TCN) for skeleton-based
      2.2.1.2 Graph Convolutional Network (GCN) for skeleton-based
      2.2.1.3 Convolutional Neural Network (CNN) for skeleton-based
    2.2.2 Object detection
    2.2.3 Human pose estimator
  2.3 Action datasets
    2.3.1 UCF101
    2.3.2 HMDB51
    2.3.3 ActivityNet
    2.3.4 Kinetics 400
    2.3.5 Atomic Visual Actions
    2.3.6 Small Dataset
CHAPTER 3 METHODOLOGY
  3.1 Training phase
    3.1.1 Dataset preprocessing
    3.1.2 Human Detection
    3.1.3 Human Pose Estimation
    3.1.4 Heatmap reconstruction
    3.1.5 Action detection
  3.2 Inference phase
    3.2.1 Object tracking
    3.2.2 Sliding window
    3.2.3 Concurrent Processing
CHAPTER 4 EXPERIMENTS AND DISCUSSION
  4.1 Hardware configuration
  4.2 Dataset description
  4.3 Model Selection
  4.4 Training parameters
  4.5 Experimental results
    4.5.1 Comparison of different pre-trained models
    4.5.2 Comparison to other models
  4.6 Multi-object real-time inference
CHAPTER 5 CONCLUSION AND FUTURE RESEARCH
  5.1 Conclusion
  5.2 Future Research
REFERENCES


Full text available to the public from 2026/02/14 (campus network)
Full text available to the public from 2026/02/14 (off-campus network)
Full text available to the public from 2026/02/14 (National Central Library: Taiwan NDLTD system)