| Field | Value |
|---|---|
| Graduate Student | 陳俊億 Jyun-Yi Chen |
| Thesis Title | 基於深度學習之3D蚱蜢運動研究 (Study of 3D Grasshopper Motion with Deep Learning) |
| Advisors | 蘇順豐 Shun-Feng Su, 郭重顯 Chung-Hsien Kuo |
| Oral Defense Committee | 林惠勇 Huei-Yung Lin, 鍾聖倫 Sheng-Luen Chung, 蘇順豐 Shun-Feng Su, 林峻永 Chun-Yeon Lin, 郭重顯 Chung-Hsien Kuo |
| Degree | 碩士 Master |
| Department | 電資學院 - 電機工程系 (Department of Electrical Engineering) |
| Year of Publication | 2022 |
| Academic Year | 110 |
| Language | English |
| Pages | 73 |
| Keywords (Chinese) | 仿生物機器人、高速相機、影像處理、影像前景/背景分離、深度學習、電腦視覺、多視角立體視覺 |
| Keywords (English) | Biorobotics, High-Speed Camera, Image Processing, Image Matting, Deep Learning, Computer Vision, Multi-View Stereo (MVS) |
| Access Counts | Views: 224, Downloads: 0 |
Biomimetic robots have long been a popular research topic, but before designing a mechanism, the researcher must understand the target creature, including how it walks, how it jumps, and its body structure. Most existing grasshopper imagery targets species recognition or single-view high-speed photography; multi-view grasshopper datasets are scarce, and filming grasshopper behavior in detail places heavy demands on image quality and frame rate, which is one of the focuses of this thesis. This thesis presents a multi-view Taiwanese grasshopper dataset collected with high-speed cameras, comprising two parts, grasshopper walking and katydid jumping, with three high-speed cameras capturing the trajectory of the insect during motion. In the walking dataset, grasshoppers were filmed at 60 FPS, with each camera capturing roughly 30 seconds of continuous walking. In the jumping dataset, katydids were filmed at 200 FPS; because a jump is very brief, each camera captured 20 to 40 frames per jump. The collected images are first preprocessed: image matting extracts the grasshopper from the background, and image processing removes noise and compensates missing feature points. DeepLabCut [1] is then trained on the processed dataset to predict each body part, and multi-view stereo (MVS) [2] reconstruction rebuilds the grasshopper's 3D skeleton, so that motion posture and the joint angles of each part can be analyzed on the 3D model to support future mechanism verification and simulation of a grasshopper-inspired biomimetic robot. Finally, binocular and trinocular stereo vision are compared: the prediction difference is 1.21 pixels for walking and 1.37 pixels for jumping; the binocular setup is more accurate, while the trinocular setup performs better for behavior recognition.
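As a rough illustration of the multi-view reconstruction step described in the abstract (this is not the author's actual pipeline; the camera matrices, intrinsics, and 3D point below are synthetic), a 2D keypoint detected in two or three calibrated views can be triangulated by linear DLT, the standard method behind MVS toolchains:

```python
import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """Linear (DLT) triangulation of one 3D point from >= 2 views.

    proj_mats: list of 3x4 camera projection matrices
    points_2d: list of (x, y) pixel observations, one per camera
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each observation contributes two linear constraints on X.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.asarray(rows)
    # Solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Synthetic rig: three cameras translated along x, sharing intrinsics K.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])

def cam(tx):
    Rt = np.hstack([np.eye(3), np.array([[tx], [0.0], [0.0]])])
    return K @ Rt

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

P1, P2, P3 = cam(0.0), cam(-0.1), cam(-0.2)
X_true = np.array([0.05, -0.02, 1.0])  # hypothetical keypoint position
obs = [project(P, X_true) for P in (P1, P2, P3)]

X_bino = triangulate_dlt([P1, P2], obs[:2])   # binocular (two views)
X_trino = triangulate_dlt([P1, P2, P3], obs)  # trinocular (three views)
print(X_bino, X_trino)
```

With noiseless observations both setups recover the same point; the thesis's binocular-vs-trinocular comparison concerns how the two configurations behave once real detection noise and occlusion enter, where the extra view adds redundancy at some cost in per-point accuracy.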
[1] A. Mathis, P. Mamidanna, K. M. Cury, T. Abe, V. N. Murthy, M. W. Mathis, and M. Bethge, "DeepLabCut: markerless pose estimation of user-defined body parts with deep learning," Nature Neuroscience, vol. 21, no. 9, pp. 1281-1289, 2018.
[2] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[3] C.-Y. Wang, I.-H. Yeh, and H.-Y. M. Liao, "You only learn one representation: Unified network for multiple tasks," arXiv preprint arXiv:2105.04206, 2021.
[4] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," arXiv preprint arXiv:2004.10934, 2020.
[5] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961-2969.
[6] A. Kirillov, Y. Wu, K. He, and R. Girshick, "PointRend: Image segmentation as rendering," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 9799-9808.
[7] N. Xu, B. Price, S. Cohen, and T. Huang, "Deep image matting," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2970-2979.
[8] S. Lin, A. Ryabtsev, S. Sengupta, B. L. Curless, S. M. Seitz, and I. Kemelmacher-Shlizerman, "Real-time high-resolution background matting," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8762-8771.
[9] S. Lin, L. Yang, I. Saleemi, and S. Sengupta, "Robust high-resolution video matting with temporal guidance," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 238-247.
[10] A. Konez, A. Erden, and M. Akkök, "Preliminary design analysis of like-grasshopper jumping mechanism," in The 12th International Conference on Machine Design and Production, Turkey, 2006.
[11] D. Hawlena, H. Kress, E. R. Dufresne, and O. J. Schmitz, "Grasshoppers alter jumping biomechanics to enhance escape performance under chronic risk of spider predation," Functional Ecology, vol. 25, no. 1, pp. 279-288, 2011.
[12] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, "Realtime multi-person 2d pose estimation using part affinity fields," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7291-7299.
[13] J. Lauer, M. Zhou, S. Ye, W. Menegas, S. Schneider, T. Nath, M. M. Rahman, V. Di Santo, D. Soberanes, and G. Feng, "Multi-animal pose estimation, identification and tracking with DeepLabCut," Nature Methods, vol. 19, no. 4, pp. 496-504, 2022.
[14] C. Li and G. H. Lee, "From synthetic to real: Unsupervised domain adaptation for animal pose estimation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1482-1491.
[15] T. Jiang, H. Cui, and X. Cheng, "Accurate calibration for large-scale tracking-based visual measurement system," IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1-11, 2020.
[16] Y. Wei and Y. Xi, "Optimization of 3-D Pose Measurement Method Based on Binocular Vision," IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1-12, 2022.
[17] M. Jiang, R. Sogabe, K. Shimasaki, S. Hu, T. Senoo, and I. Ishii, "500-fps omnidirectional visual tracking using three-axis active vision system," IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1-11, 2021.
[18] X. Wu, C. Zhan, Y.-K. Lai, M.-M. Cheng, and J. Yang, "IP102: A large-scale benchmark dataset for insect pest recognition," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 8787-8796.
[19] L. B. Symes, S. J. Martinson, L.-O. Hoeger, R. A. Page, and H. M. Ter Hofstede, "From understory to canopy: In situ behavior of Neotropical forest katydids in response to bat echolocation calls," Frontiers in Ecology and Evolution, vol. 6, p. 227, 2018.
[20] C. Szegedy, A. Toshev, and D. Erhan, "Deep neural networks for object detection," Advances in Neural Information Processing Systems, vol. 26, 2013.
[21] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European conference on computer vision, 2014: Springer, pp. 740-755.
[22] A. Mathis, S. Schneider, J. Lauer, and M. W. Mathis, "A primer on motion capture with deep learning: principles, pitfalls, and perspectives," Neuron, vol. 108, no. 1, pp. 44-65, 2020.
[23] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE conference on computer vision and pattern recognition, 2009: IEEE, pp. 248-255.
[24] B. Q. Geuther, S. P. Deats, K. J. Fox, S. A. Murray, R. E. Braun, J. K. White, E. J. Chesler, C. M. Lutz, and V. Kumar, "Robust mouse tracking in complex environments using neural networks," Communications Biology, vol. 2, no. 1, pp. 1-11, 2019.
[25] S. Raman, R. Maskeliūnas, and R. Damaševičius, "Markerless dog pose recognition in the wild using ResNet deep learning model," Computers, vol. 11, no. 1, p. 2, 2021.
[26] B. Li, L. Heng, K. Koser, and M. Pollefeys, "A multiple-camera system calibration toolbox using a feature descriptor-based calibration pattern," in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013: IEEE, pp. 1301-1307.
[27] X. Mei, X. Sun, M. Zhou, S. Jiao, H. Wang, and X. Zhang, "On building an accurate stereo matching system on graphics hardware," in 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011: IEEE, pp. 467-474.
[28] T. Nath, A. Mathis, A. C. Chen, A. Patel, M. Bethge, and M. W. Mathis, "Using DeepLabCut for 3D markerless pose estimation across species and behaviors," Nature Protocols, vol. 14, no. 7, pp. 2152-2176, 2019.
[29] P. Karashchuk, K. L. Rupp, E. S. Dickinson, S. Walling-Bell, E. Sanders, E. Azim, B. W. Brunton, and J. C. Tuthill, "Anipose: a toolkit for robust markerless 3D pose estimation," Cell Reports, vol. 36, no. 13, p. 109730, 2021.
[30] M. Tan and Q. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in International conference on machine learning, 2019: PMLR, pp. 6105-6114.
[31] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.