
Author: Guan-Ting Liu (劉冠廷)
Thesis Title: Lightweight and Transferrable Hand-detection for Industry 4.0 (應用於工業4.0之輕量化及可轉移手部偵測)
Advisor: Ching-Hu Lu (陸敬互)
Committee: Chung-Hsien Kuo (郭重顯), Ching-Hu Lu (陸敬互), Sheng-Luen Chung (鍾聖倫), Gee-Sern Hsu (徐繼聖), Shun-Feng Su (蘇順豐)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2021
Graduation Academic Year: 109 (2020-2021)
Language: Chinese
Pages: 90
Chinese Keywords: Industry 4.0, smart factory, hand detection, anchor-free lightweight neural model, edge computing, Internet of Things, knowledge transfer
English Keywords: industry 4.0, smart factory, deep learning, anchor-free lightweight deep neural network, edge computing, Internet of Things, knowledge transfer
In recent years, Industry 4.0 has combined technologies such as 5G and artificial intelligence to address manufacturing problems. In an Industry 4.0 smart factory, business opportunities change frequently, so production lines must speed up the manufacture of personalized products; this requires accommodating different assembly processes for different products, together with rigorous follow-up inspection to ensure that every product is assembled correctly. Smart cameras that leverage edge computing (hereinafter referred to as edge cameras) combine deep networks and related technologies to realize the Artificial Intelligence of Things (AIoT) and have been widely applied in many smart domains, including smart manufacturing. However, because the computing power of an edge camera is limited, it cannot run the large, complex deep networks that a smart factory needs for flexible, real-time detection tasks. This study therefore proposes a "lightweight anchor-free hand-detection model" for flexible-assembly settings in smart factories; the model runs directly on an edge camera to detect an assembler's hand positions, and a user-interface program checks the correctness of the assembler's steps. In addition, because smart factories face small-batch production of diverse, personalized products, and existing hand-detection research has not addressed the need to quickly train and deploy deep networks, this study further proposes "knowledge transfer of hand-detection models across assembly lines" to transfer previously learned knowledge to new target domains more quickly. Experimental results show that, on the Oxford dataset, the proposed model's inference speed is about 42.15 times faster than existing studies, with mAP higher by 2.16% and AP75 higher by 8.81%; on the Egohands dataset, inference speed improves by about 40.47 times, mAP by 17.53%, and AP75 by 27.41%. On an edge camera, inference speed on the Oxford dataset improves by about 2.68 times over existing studies. The cross-assembly-line knowledge transfer improves prediction accuracy by 8% over a model trained without transfer learning while saving about 80% of training time. The proposed lightweight hand-detection model thus runs effectively on resource-limited edge cameras while retaining excellent accuracy, and the knowledge transfer yields a better detection model while substantially reducing training time, enabling rapid deployment across production domains.
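As a rough illustration of the building blocks the abstract and outline name (depthwise separable convolution, channel attention, and an anchor-free FCOS-style head), the following minimal PyTorch sketch shows how such a lightweight detection head could be assembled. The full text is embargoed, so every class name, channel count, and layer arrangement below is an illustrative assumption, not the thesis's actual architecture.

```python
# Hedged sketch of a lightweight anchor-free hand-detection head.
# All names and hyper-parameters are hypothetical illustrations.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv followed by a 1x1 pointwise conv (MobileNet-style)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel re-weighting."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)  # broadcast per-channel weights over H x W

class AnchorFreeHandHead(nn.Module):
    """Per-pixel outputs: hand score, 4 box distances (l, t, r, b), center-ness."""
    def __init__(self, ch=64):
        super().__init__()
        self.tower = nn.Sequential(
            DepthwiseSeparableConv(ch, ch),
            ChannelAttention(ch),
            DepthwiseSeparableConv(ch, ch),
        )
        self.cls = nn.Conv2d(ch, 1, 3, padding=1)  # hand vs. background
        self.reg = nn.Conv2d(ch, 4, 3, padding=1)  # distances to the 4 box sides
        self.ctr = nn.Conv2d(ch, 1, 3, padding=1)  # center-ness branch

    def forward(self, feat):
        t = self.tower(feat)
        # exp() keeps regressed distances positive, as in FCOS-style heads.
        return self.cls(t), torch.exp(self.reg(t)), self.ctr(t)

if __name__ == "__main__":
    head = AnchorFreeHandHead()
    scores, boxes, centerness = head(torch.randn(1, 64, 40, 40))
    print(scores.shape, boxes.shape, centerness.shape)
```

Because each location on the feature map directly predicts a score and four side distances, no anchor boxes need to be enumerated, which is one common reason anchor-free heads are cheaper on edge devices.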


An Industry 4.0 smart factory often needs to produce a variety of products, so its assemblers must learn different assembly processes, which in turn require rigorous post-inspection to ensure product quality. Smart cameras that leverage edge computing (hereinafter referred to as edge cameras) can now incorporate technologies such as deep neural networks to realize the Artificial Intelligence of Things (AIoT) and have been widely used in smart factories. Due to its limited computing power, however, an edge camera cannot run the complex deep neural networks that smart factories often rely on for real-time detection tasks. Therefore, for assembly-process verification in a smart factory, we propose a "Lightweight Anchor-Free Hand Detection Model (LAFHDM)," whose deep network fits directly on an edge camera to detect the hand positions of an assembler so that the correctness of the assembler's assembly steps can be readily verified. In addition, in response to the demand for rapid learning and deployment of deep neural networks across different assembly lines, which existing studies have not addressed, we propose "Transferrable Knowledge across Multiple Assembly Lines (TKaMAL)" to transfer learned knowledge across domains. The experimental results show that, on the Oxford dataset, the inference speed of LAFHDM is 42.15 times faster than existing studies, mAP is 2.16% higher, and AP75 is 8.81% higher; on the Egohands dataset, inference speed is 40.47 times faster, mAP is 17.53% higher, and AP75 is 27.41% higher. On an edge camera, inference speed on the Oxford dataset improves by 2.68 times over existing studies. Finally, TKaMAL improves prediction accuracy by 8% compared with non-transfer-learning approaches while reducing training time by about 80%.
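The abstract reports that transferring learned knowledge across assembly lines saves about 80% of training time. A common way to realize this kind of transfer is to initialize from source-line weights, freeze the shared feature extractor, and fine-tune only the detection head on target-line data. The sketch below illustrates that generic recipe, not the thesis's actual TKaMAL procedure; `compute_loss`, the parameter-name prefixes, and the hyper-parameters are all hypothetical.

```python
# Hedged sketch of cross-assembly-line transfer by partial fine-tuning.
import torch

def transfer_to_new_line(model, source_ckpt, target_loader, epochs=10, lr=1e-3):
    # Start from knowledge learned on the source assembly line.
    model.load_state_dict(torch.load(source_ckpt), strict=False)

    # Freeze everything except the detection-head branches so that only a
    # small fraction of the parameters is re-learned; this is what saves
    # most of the training time on the new line.
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith(("cls", "reg", "ctr"))

    opt = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=lr
    )
    for _ in range(epochs):
        for images, targets in target_loader:
            loss = model.compute_loss(images, targets)  # hypothetical loss API
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

The design trade-off is standard: freezing the backbone assumes the low-level hand features transfer across lines, while the small head adapts to the new camera angle, lighting, and workbench layout.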

Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
  1.1 Research Motivation
  1.2 Literature Review
    1.2.1 Running Deep Hand-Position Detection Models Smoothly on Edge Devices
    1.2.2 Knowledge Transfer of Hand-Detection Models across Assembly Lines
  1.3 Contributions and Thesis Organization
Chapter 2: System Design Rationale and Architecture Overview
  2.1 System Application Scenarios
  2.2 System Architecture
Chapter 3: Lightweight Anchor-Free Hand-Detection Model
  3.1 Depthwise Separable Convolution
  3.2 Channel Attention Mechanism
  3.3 Anchor-Free Hand Detection
  3.4 Optimization Choices
    3.4.1 Center-ness
    3.4.2 Ground-Truth Label Construction
    3.4.3 Data Augmentation
  3.5 Loss Function
  3.6 Offline and Online Model-Optimization Modules
    3.6.1 Offline Model-Optimization Module
    3.6.2 Online Model-Optimization Module
Chapter 4: Knowledge Transfer of Hand-Detection Models across Assembly Lines and the User Interface
  4.1 Cross-Assembly-Line Knowledge-Transfer Workflow
  4.2 User Interface Program
Chapter 5: Experimental Results and Discussion
  5.1 Experimental Platform
  5.2 Datasets and Evaluation Metrics
    5.2.1 Datasets
    5.2.2 Evaluation Metrics
  5.3 Lightweight Anchor-Free Hand-Detection Model
    5.3.1 Training Parameter Settings
    5.3.2 Network Module Comparison Experiments
    5.3.3 Module Count Experiments
    5.3.4 Comparison with Related Studies
  5.4 Knowledge Transfer of Hand-Detection Models across Assembly Lines
Chapter 6: Conclusions and Future Work
References
List of Publications and Works
Committee Comments and Responses
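For readers skimming the outline: Section 3.4.1 concerns center-ness, a quality target introduced by FCOS, the anchor-free detector this design builds on. The standard FCOS center-ness target for a location whose distances to the four sides of its ground-truth box are l*, t*, r*, b* is shown below; the thesis presumably adapts this or a similar form.

```latex
\text{centerness}^{*} =
  \sqrt{\frac{\min(l^{*},\, r^{*})}{\max(l^{*},\, r^{*})}
        \times
        \frac{\min(t^{*},\, b^{*})}{\max(t^{*},\, b^{*})}}
```

Locations far from an object's center receive a small target, so multiplying the classification score by the predicted center-ness suppresses low-quality boxes at inference time.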


Full-text release date: 2026/01/25 (campus network)
Full-text release date: 2026/01/25 (off-campus network)
Full-text release date: 2026/01/25 (National Central Library: Taiwan NDLTD system)