
Graduate student: Ming-Hua Tsai (蔡明樺)
Thesis title: Application DTW method with combine meta-heuristic to solve motion labeling problem in production line (應用動態時間校正結合啟發式演算法解決生產線動作影像貼標問題)
Advisor: Chao-Lung Yang (楊朝龍)
Oral defense committee: Chao Ou-Yang (歐陽超), Kwei-Long Huang (黃奎隆)
Degree: Master
Department: School of Management, Department of Industrial Management
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: Chinese
Number of pages: 51
Keywords (Chinese): 自動貼標, 動作辨識, 啟發式演算法
Keywords (English): Auto-labeling, Motion recognition, Meta-heuristic

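Human motion or action recognition (HAR) in video using deep neural networks has become an important topic in computer vision in recent years. Because such models can be trained in advance on high-compute devices and the trained result handed to edge devices, HAR applications are now widely used in daily life. Within the machine learning framework, training an HAR model depends heavily on sample data and label information. Annotating image or video data is expensive, however, and in practice large amounts of data remain unlabeled while only a small portion has been annotated. To make video labeling more efficient, this study proposes an auto-labeling method for factory production lines, built on skeleton-based action recognition. Video is collected with ordinary cameras, and a skeleton detection model converts each frame into time series data of the joint positions of the human skeleton. Labeled action videos of operations performed according to the standard operating procedure (SOP) are defined as the reference data. By computing the dynamic time warping (DTW) distance between the reference data and the unlabeled data, the labeling problem is formulated as an optimization problem that minimizes the DTW distance. Because this problem is NP-hard, meta-heuristic algorithms such as the genetic algorithm (GA) and particle swarm optimization (PSO) are used to optimize the segmentation points of the unlabeled time series. During optimization, the parameters of an early stopping mechanism are designed and the population size is controlled to improve the segmentation results. The segmentation points of 67 videos were validated against manual labeling results; accuracy reached up to 97%, and the method proved more efficient than manual annotation.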


In recent years, human motion or action recognition (HAR) with deep neural networks has been an important topic in computer vision, because a model can be trained in advance and then deployed on edge computing devices for further applications. As in any machine learning framework, the training process of HAR depends heavily on data samples and label information. However, labeling image or video data is very costly, so a large amount of data remains unlabeled and only a small amount is labeled. To improve the efficiency of video labeling, this research proposes an auto-labeling method aimed at applying skeleton-based action recognition on factory production lines. Ordinary cameras were used to collect video data, and a skeleton detection model converted each video frame into time series data for each joint of the human skeleton. In this work, labeled videos of human actions performed under the standard operating procedure (SOP) were defined as the reference time series data. By calculating the dynamic time warping (DTW) distance between the reference sequence and an unlabeled sequence, labeling is defined as the optimization problem of minimizing the DTW distance. Because this optimization problem is NP-hard, meta-heuristic algorithms such as the genetic algorithm (GA) and particle swarm optimization (PSO) were applied to optimize the segmentation points of the unlabeled time series data. The work also designed an early stopping mechanism and controlled the population size to improve the segmentation results. The segmentation points of 67 videos were verified against manual labeling results. In the experiments, labeling accuracy reached up to 97%, and the method was more efficient than manual labeling.
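The formulation in the abstract (choose segmentation points so that each segment of the unlabeled series is close, in DTW distance, to the matching reference segment) can be illustrated with a minimal Python sketch. This is not the thesis implementation: it assumes 1-D sequences instead of multi-joint skeleton data, an absolute-difference local cost, and a simple stochastic hill climber with an early-stopping patience counter standing in for GA or PSO. All function names, parameters, and the toy data are hypothetical.

```python
import random

def dtw_distance(a, b):
    """DTW distance between two 1-D sequences (absolute-difference local cost)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the dynamic-programming table
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Extend the cheapest of the three allowed predecessor cells.
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[-1]

def segmentation_cost(cuts, series, reference_segments):
    """Sum of DTW distances between each cut-defined segment and its reference.

    `cuts` must be strictly increasing and lie strictly inside the series.
    """
    bounds = [0, *cuts, len(series)]
    return sum(
        dtw_distance(series[start:end], ref)
        for start, end, ref in zip(bounds, bounds[1:], reference_segments)
    )

def optimize_cuts(series, reference_segments, patience=200, max_iters=5000, seed=0):
    """Stochastic hill climber over cut positions (a stand-in for GA/PSO).

    Assumes at least two reference segments. Stops early once `patience`
    consecutive candidates fail to improve the best cost.
    """
    rng = random.Random(seed)
    k = len(reference_segments) - 1  # number of cut points
    best = sorted(rng.sample(range(1, len(series)), k))
    best_cost = segmentation_cost(best, series, reference_segments)
    stale = 0
    for _ in range(max_iters):
        cand = list(best)
        cand[rng.randrange(k)] += rng.choice([-2, -1, 1, 2])  # nudge one cut
        bounds = [0, *cand, len(series)]
        valid = all(lo < hi for lo, hi in zip(bounds, bounds[1:]))
        cost = segmentation_cost(cand, series, reference_segments) if valid else float("inf")
        if cost < best_cost:
            best, best_cost, stale = cand, cost, 0
        else:
            stale += 1
            if stale >= patience:  # early stopping
                break
    return best, best_cost

# Toy data (hypothetical): three reference "actions" and one unlabeled series
# whose true action boundaries are at indices 4 and 9.
reference_segments = [[0, 0, 0], [5, 5, 5], [1, 1]]
series = [0, 0, 0, 0, 5, 5, 5, 5, 5, 1, 1, 1]
best_cuts, best_cost = optimize_cuts(series, reference_segments)
print(best_cuts, best_cost)
```

A population-based method such as GA or PSO would maintain many candidate cut vectors at once rather than a single one; the cost function and the early-stopping idea carry over unchanged.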

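Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
  1.1 Importance of sample labels in deep neural network models
  1.2 Difficulties of labeling sample data
  1.3 Auto-labeling systems
  1.4 Research direction
    1.4.1 Reducing human error with an auto-labeling framework
    1.4.2 Improving labeling system performance with meta-heuristic algorithms
  1.5 Research structure
Chapter 2. Literature Review
  2.1 Video-based action recognition
  2.2 The manual labeling problem
  2.3 Dynamic time warping
  2.4 Action video segmentation algorithms
Chapter 3. Methodology
  3.1 Experimental framework
  3.2 Data collection and preprocessing
  3.3 Meta-heuristic efficiency experiment
  3.4 Auto-labeling accuracy experiment
Chapter 4. Experiments and Results
  4.1 Experiment description
  4.2 Results of the meta-heuristic efficiency experiment
  4.3 Results of the auto-labeling accuracy experiment
  4.4 Analysis of experimental results
Chapter 5. Conclusions and Future Work
  5.1 Conclusions
  5.2 Future work
References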


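Full-text release date: 2024/09/10 (campus network)
Full-text release: not authorized for public release (off-campus network)
Full-text release: not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)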