
Student: Chia-Yu Lee (李家瑜)
Thesis Title: A study of optimized data pre-processing for CNN-based air-writing recognition (基於卷積神經網路的浮空手寫辨識預處理優化研究)
Advisor: Pei-Li Sun (孫沛立)
Committee Members: Tzung-Han Lin (林宗翰), Yi-Yung Chen (陳怡永), Kuo-Jui Hu (胡國瑞)
Degree: Master
Department: College of Applied Sciences - Graduate Institute of Color and Illumination Technology
Year of Publication: 2022
Academic Year of Graduation: 110
Language: Chinese
Number of Pages: 79
Keywords: air-writing recognition, gesture recognition, image processing, deep learning, convolutional neural network (CNN), depth camera


    Abstract: Air-writing recognition is an active topic in human-computer interaction (HCI) research. The technology can be used in augmented reality (AR) and virtual reality (VR), and it is one of the most natural ways to interact with a computer. In recent years, advances in hardware and algorithms have made image recognition that combines image processing with deep learning perform very well. On the capture side, in addition to the commonly used web cameras, small portable depth cameras are becoming increasingly popular.
    In this study, we used a LiDAR-type depth camera to collect RGB images and depth information, and used the MediaPipe Hands module to track fingertip trajectories. We propose an image preprocessing method that enhances air-writing trajectory features, together with a dual-CNN classifier, and apply them to the 43 classes of uppercase and lowercase English letter trajectories collected in this study; overall recognition accuracy increased from 67% to 81%. Applying the proposed preprocessing to existing gesture recognition models also raised their accuracy significantly, from roughly 40-50% to more than 70%.
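    The record itself contains no code, but the fingertip-tracking step can be illustrated with the public MediaPipe Hands API. The sketch below is a minimal, assumption-laden example in Python: it reads from an ordinary webcam through OpenCV and records the index-fingertip position per frame, whereas the study used a LiDAR depth camera and its depth stream (omitted here) to decide when the fingertip is actually writing. The 0.5 confidence thresholds and all variable names are illustrative.

        # Sketch: collect an index-fingertip trajectory with MediaPipe Hands.
        # Assumes a plain webcam; the thesis pairs this with a LiDAR depth
        # stream (not shown) to gate when the finger is "writing".
        import cv2
        import mediapipe as mp

        mp_hands = mp.solutions.hands
        trajectory = []                  # fingertip (x, y) pixels, one per frame

        cap = cv2.VideoCapture(0)
        with mp_hands.Hands(max_num_hands=1,
                            min_detection_confidence=0.5,
                            min_tracking_confidence=0.5) as hands:
            while cap.isOpened():
                ok, frame = cap.read()
                if not ok:
                    break
                # MediaPipe expects RGB; OpenCV frames are BGR.
                results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                if results.multi_hand_landmarks:
                    tip = results.multi_hand_landmarks[0].landmark[
                        mp_hands.HandLandmark.INDEX_FINGER_TIP]
                    h, w = frame.shape[:2]
                    trajectory.append((int(tip.x * w), int(tip.y * h)))
                if cv2.waitKey(1) & 0xFF == 27:  # Esc ends the character
                    break
        cap.release()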
    This research achieves three main goals in air-writing recognition, as follows:
    First, the proposed preprocessing of fingertip trajectory data uses temporal splitting together with dilation and Gaussian-blur image processing to effectively enhance trajectory features. The preprocessing is computationally cheap and can be applied to other motion-trajectory recognition tasks (see the first sketch after this list).
    Second, the proposed dual-CNN model combines two neural networks with different convolution schemes and significantly improves air-writing recognition accuracy; it can be extended to more complex air-writing trajectory recognition in the future (see the second sketch after this list).
    Third, by using the depth information of the LiDAR camera together with MediaPipe Hands, the complicated and indirect interaction of existing air-writing recognition systems is simplified.
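    As a minimal sketch of the trajectory-enhancement idea, the following Python code (assuming OpenCV and NumPy) renders a normalized trajectory to a small canvas, thickens it with dilation, softens it with a Gaussian blur, and splits the point sequence into temporal segments stacked as channels. The 64x64 canvas, kernel sizes, and three-way split are illustrative assumptions, not the thesis's exact parameters.

        # Sketch: render a fingertip trajectory to an image, enhance it with
        # dilation and Gaussian blurring, and split it into temporal segments.
        # Canvas size, kernel sizes, and the 3-way split are assumed values.
        import numpy as np
        import cv2

        def normalize(points, size=64, margin=4):
            """Scale (x, y) points into a size x size canvas."""
            pts = np.asarray(points, dtype=np.float32)
            pts = pts - pts.min(axis=0)
            scale = (size - 2 * margin) / max(float(pts.max()), 1e-6)
            return (pts * scale + margin).astype(np.int32)

        def render(points, size=64):
            """Draw the trajectory as connected line segments."""
            img = np.zeros((size, size), dtype=np.uint8)
            cv2.polylines(img, [points.reshape(-1, 1, 2)], False, 255, 1)
            return img

        def enhance(img):
            """Thicken strokes (dilation), then soften them (Gaussian blur)."""
            img = cv2.dilate(img, np.ones((3, 3), np.uint8), iterations=1)
            return cv2.GaussianBlur(img, (5, 5), 0)

        def temporal_split(points, n_segments=3):
            """Split the point sequence into equal segments, one channel each."""
            chunks = np.array_split(points, n_segments)
            return np.stack([enhance(render(c)) for c in chunks], axis=-1)

        pts = normalize(trajectory)   # trajectory from the tracking sketch above
        x = temporal_split(pts)       # (64, 64, 3) network input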
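    The record does not spell out the dual-CNN architecture, so the following Keras sketch only illustrates the general idea as stated: two convolutional branches with different convolution settings (here, different kernel sizes) process the same preprocessed trajectory image and are merged before a shared classifier over the 43 letter classes. Every layer width, kernel size, and the optimizer choice are assumptions, not the thesis's actual design.

        # Sketch: a dual-branch CNN in Keras. Two branches with different
        # convolution settings share one input and are concatenated before
        # the classifier. All layer sizes are assumed.
        from tensorflow.keras import layers, models

        def branch(x, kernel_size):
            for filters in (32, 64):
                x = layers.Conv2D(filters, kernel_size, padding="same",
                                  activation="relu")(x)
                x = layers.MaxPooling2D()(x)
            return layers.Flatten()(x)

        inputs = layers.Input(shape=(64, 64, 3))  # preprocessed trajectory image
        merged = layers.concatenate([branch(inputs, 3),   # fine-detail branch
                                     branch(inputs, 5)])  # wider-context branch
        x = layers.Dense(128, activation="relu")(merged)
        outputs = layers.Dense(43, activation="softmax")(x)  # 43 letter classes

        model = models.Model(inputs, outputs)
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])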

    Table of Contents:
    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Motivation and Objectives
      1.3 Thesis Organization
    Chapter 2 Literature Review
      2.1 Image Recognition
      2.2 Machine-Learning-Based Image Recognition
        2.2.1 Overview of Machine Learning
        2.2.2 Deep Learning
        2.2.3 Deep-Learning Models for Image Recognition
      2.3 Human-Computer Interaction
    Chapter 3 Research Methods
      3.1 Hardware and Environment
      3.2 Character Data Collection
        3.2.1 Fingertip Detection
        3.2.2 Character Trajectory Collection
      3.3 Trajectory Preprocessing
        3.3.1 Data Normalization
        3.3.2 Trajectory Feature Enhancement
        3.3.3 Temporal Data Splitting
      3.4 Experimental Data Collection
      3.5 Experimental Design
    Chapter 4 Experiment 1: Image Preprocessing
      4.1 Network Architecture and Training Parameters
      4.2 Depth Information Experiment
      4.3 Image Processing Experiment
      4.4 Temporal Resolution Experiment
      4.5 Summary
    Chapter 5 Experiment 2: Network Architecture
      5.1 Network Architecture and Training Parameters
      5.2 Training Data
      5.3 Experimental Results
      5.4 Comparison with Existing Models
      5.5 Validation on the RTC Dataset
      5.6 Summary
    Chapter 6 System Implementation
      6.1 System Environment and Workflow
      6.2 System Demonstration
    Chapter 7 Conclusions and Suggestions
      7.1 Conclusions
      7.2 Suggestions
    References
    Appendix A: Trajectory Data Splitting Experiment Results

