
Graduate Student: Yang-Hsiu Tung (童暘修)
Thesis Title: Combination of Semantic Segmentation and Pose Estimation for Human Hands Detection (結合語意分割和姿態預估技術應用於手部偵測之研究)
Advisors: Kai-Lung Hua (花凱龍), Chao-Lung Yang (楊朝龍)
Committee Members: Chao-Lung Yang (楊朝龍), Kai-Lung Hua (花凱龍), Shan-Hsiang Shen (沈上翔)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2023
Graduation Academic Year: 111 (2022–2023)
Language: English
Number of Pages: 31
Keywords (Chinese): Human-robot collaboration, semantic segmentation, human pose estimation, human hand detection
Keywords (English): Human-robot collaboration, Semantic segmentation, Human pose estimation, Human part detection
Hits: 183; Downloads: 7
  • Recognizing the positions of human hands during human-robot collaboration is essential, since a machine may need to deliver objects to the operator or avoid collisions. The goal of this study is to accurately locate human hand regions in data collected while an operator performs production tasks. Pose estimation models such as MediaPipe or Detectron2 can infer the human skeleton, but they cannot delineate the complete contours of body parts, particularly the palms and fingers. Moreover, existing human part segmentation models (e.g., SCHP) require substantial computational resources to reach high accuracy, while lightweight models (e.g., BodyPix) cannot meet accuracy requirements in resource-constrained environments. To address this problem, this study proposes a hybrid framework that combines semantic segmentation with pose estimation, obtaining pixel-level information and detail within regions of interest to achieve accuracy in an efficient manner. Experimental results show that the proposed hybrid model outperforms existing lightweight models on the experimental data while maintaining stable resource usage. For future research, it is suggested to extend the application of various hybrid models and to conduct deeper analyses of different human body parts; such extended investigations could further improve the accuracy and efficiency of human body part localization. (Translated from Chinese.)


    Recognizing the positions of human hands is crucial during human-robot collaboration because the robot may need to deliver objects to humans or avoid a collision. The aim of this study is to accurately locate human hands in video collected while a human operator performs manufacturing tasks. Although pose estimation methods such as MediaPipe or Detectron2 can recognize the skeleton of the human body, they cannot detect the contours of body parts, particularly palms and fingers. Besides, existing human part parsing models such as SCHP require substantial computational resources to achieve high accuracy, whereas lightweight models such as BodyPix cannot meet accuracy requirements in resource-constrained environments. To tackle this issue, a hybrid framework is proposed that combines semantic segmentation and pose estimation, obtaining pixel-level information in regions of interest (ROIs) to achieve accuracy in a cost-efficient way. The experimental results show that the proposed hybrid model outperforms the existing lightweight models on the simulated data while maintaining stable resource usage. For future research, it is suggested to expand the application of various hybrid models and to conduct further in-depth analysis of different human body parts. Through these extended investigations, it is anticipated that the accuracy and efficiency of human body part positioning can be further improved.
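    As a rough illustration of the hybrid idea described in the abstract, the sketch below (plain Python with a toy binary mask and hypothetical hand keypoints; the actual framework uses MediaPipe/Detectron2 and a segmentation network, not this code) shows how pose keypoints can define a hand ROI that is then intersected with a segmentation mask to isolate hand pixels:

```python
def hand_roi(keypoints, margin=1):
    """Bounding box around hand keypoints, padded by `margin` pixels."""
    xs = [x for x, y in keypoints]
    ys = [y for x, y in keypoints]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)

def mask_in_roi(mask, roi):
    """Keep only segmentation-mask pixels that fall inside the ROI."""
    x0, y0, x1, y1 = roi
    return [
        [mask[r][c] if (x0 <= c <= x1 and y0 <= r <= y1) else 0
         for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]

# Toy example: a 6x6 person mask and two hypothetical hand keypoints (x, y).
mask = [[1 if 1 <= r <= 4 and 1 <= c <= 4 else 0 for c in range(6)]
        for r in range(6)]
keypoints = [(3, 3), (4, 4)]
hand_mask = mask_in_roi(mask, hand_roi(keypoints))
```

    The design point is the division of labor: pose estimation supplies a cheap, coarse localization (the ROI), and the segmentation mask supplies pixel-level detail only where it matters, which is what keeps the hybrid approach accurate yet lightweight.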

    Table of Contents
    Chinese Abstract . . . 1
    Abstract . . . 2
    Table of Contents . . . 4
    List of Tables . . . 6
    List of Illustrations . . . 7
    1 Introduction . . . 9
    2 Related Work . . . 11
    2.1 Related Techniques . . . 11
    2.1.1 Image Segmentation . . . 11
    2.1.2 Semantic Segmentation . . . 12
    2.1.3 Pose Estimation . . . 13
    2.2 Review Paper . . . 13
    2.3 Human Parsing . . . 14
    2.4 Region of Interest (ROI) . . . 15
    3 Methodology . . . 16
    3.1 Framework . . . 16
    3.1.1 Pose Estimation . . . 17
    3.1.2 Semantic Segmentation . . . 19
    3.1.3 Region of Interest (ROI) . . . 19
    3.1.4 Filter . . . 22
    3.1.5 Overlap . . . 26
    4 Experiment . . . 28
    4.1 Experimental Datasets . . . 28
    4.2 Metric / Performance . . . 29
    4.3 Models . . . 30
    4.4 Ablation Experiment . . . 32
    4.5 Experimental Summary . . . 33
    5 Conclusion . . . 35
    5.1 Summary . . . 35
    5.2 Discussion . . . 35
    5.3 Future Research . . . 36
    References . . . 37

    References
    [1] International Organization for Standardization, “ISO/TS 15066: Robots and robotic devices — Collaborative robots,” ISO, 2016.
    [2] “ISO 10218: Robots and robotic devices — Safety requirements for industrial robots,” ISO, 2021.
    [3] R. Yang and Y. Yu, “Artificial convolutional neural network in object detection and semantic segmentation for medical imaging analysis,” Frontiers in Oncology, vol. 11, p. 638182, 2021.
    [4] J. Dong, Q. Chen, S. Yan, and A. Yuille, “Towards unified object detection and semantic segmentation,” in Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V. Springer, 2014, pp. 299–314.
    [5] X. Li, M. Kan, S. Shan, and X. Chen, “Weakly supervised object detection with segmentation collaboration,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019.
    [6] S. Hao, Y. Zhou, and Y. Guo, “A brief survey on semantic segmentation with deep learning,” Neurocomputing, vol. 406, pp. 302–321, 2020.
    [7] A. Kirillov, K. He, R. Girshick, C. Rother, and P. Dollár, “Panoptic segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9404–9413.
    [8] S. Minaee, Y. Boykov, F. Porikli, A. Plaza, N. Kehtarnavaz, and D. Terzopoulos, “Image segmentation using deep learning: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 7, pp. 3523–3542, 2021.
    [9] Y. Chen, Y. Tian, and M. He, “Monocular human pose estimation: A survey of deep learning-based methods,” Computer Vision and Image Understanding, vol. 192, p. 102897, 2020.
    [10] L. Yang, W. Jia, S. Li, and Q. Song, “Deep learning technique for human parsing: A survey and outlook,” arXiv preprint arXiv:2301.00394, 2023.
    [11] F. Xia, P. Wang, X. Chen, and A. L. Yuille, “Joint multi-person pose estimation and semantic part segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6769–6778.
    [12] G. Papandreou, T. Zhu, L.-C. Chen, S. Gidaris, J. Tompson, and K. Murphy, “PersonLab: Person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 269–286.
    [13] W. Wang, T. Zhou, S. Qi, J. Shen, and S.-C. Zhu, “Hierarchical human semantic parsing with comprehensive part-relation modeling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 7, pp. 3508–3522, 2021.
    [14] H.-S. Fang, G. Lu, X. Fang, J. Xie, Y.-W. Tai, and C. Lu, “Weakly and semi supervised human body part parsing via pose-guided knowledge transfer,” arXiv preprint arXiv:1805.04310, 2018.
    [15] K. Gong, X. Liang, D. Zhang, X. Shen, and L. Lin, “Look into person: Self-supervised structure-sensitive learning and a new benchmark for human parsing,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 932–940.
    [16] X. Nie, J. Feng, and S. Yan, “Mutual learning to adapt for joint human parsing and pose estimation,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 502–517.
    [17] S. Zhang, X. Cao, G.-J. Qi, Z. Song, and J. Zhou, “AIParsing: Anchor-free instance-level human parsing,” IEEE Transactions on Image Processing, vol. 31, pp. 5599–5612, 2022.
    [18] K. Liu, O. Choi, J. Wang, and W. Hwang, “CDGNet: Class distribution guided network for human parsing,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 4473–4482.
    [19] L. Rossi, A. Karimi, and A. Prati, “A novel region of interest extraction layer for instance segmentation,” in 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021, pp. 2203–2209.
    [20] C. Lugaresi, J. Tang, H. Nash, C. McClanahan, E. Uboweja, M. Hays, F. Zhang, C.-L. Chang, M. G. Yong, J. Lee et al., “MediaPipe: A framework for building perception pipelines,” arXiv preprint arXiv:1906.08172, 2019.
    [21] Y. Wu, A. Kirillov, F. Massa, W.-Y. Lo, and R. Girshick, “Detectron2,” https://github.com/facebookresearch/detectron2, 2019.
    [22] D. Montes, P. Peerapatanapokin, J. Schultz, C. Guo, W. Jiang, and J. C. Davis, “Discrepancies among pre-trained deep neural networks: a new threat to model zoo reliability,” in Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022, pp. 1605–1609.
    [23] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V. Springer, 2014, pp. 740–755.
    [24] M. Rhif, A. Ben Abbes, I. R. Farah, B. Martínez, and Y. Sang, “Wavelet transform application for/in non-stationary time-series analysis: a review,” Applied Sciences, vol. 9, no. 7, p. 1345, 2019.
    [25] T. Zhu and D. Oved, “BodyPix: Person segmentation in the browser,” 2019.

    Full text available from 2043/07/11 (off-campus access)
    Full text available from 2043/07/11 (National Central Library: Taiwan NDLTD system)