| Graduate student: | 陳品匡 Pin-Kuang Chen |
|---|---|
| Thesis title: | 以深度攝影機影像合成物品點雲模型並生成物件辨識資料集 Using a Depth Camera to Synthesize Object Point Clouds and Generate Object Detection Datasets |
| Advisor: | 李維楨 Wei-Chen Lee |
| Oral defense committee: | 林宗翰 Tzung-Han Lin, 徐繼聖 Gee-Sern Hsu |
| Degree: | 碩士 Master |
| Department: | College of Engineering, Department of Mechanical Engineering (機械工程系) |
| Year of publication: | 2020 |
| Academic year of graduation: | 108 |
| Language: | Chinese |
| Pages: | 74 |
| Keywords: | Computer Vision, Image Processing, Image Synthesis, 3D Point Cloud, ICP, Deep Learning, Object Detection, Faster R-CNN |
In recent years, deep-learning-based object detection has become a mature and popular technology, and large annotated public image datasets cover many objects from daily life. However, when the targets we want to detect are not available in public datasets, manually photographing and annotating them is a step that consumes considerable time and resources before the detection model can be trained.
The objective of this study is to augment a small set of real training images with synthetic images and to annotate the synthetic images automatically. A depth camera combined with a rotating platform captures multi-view RGB point clouds of an object; color support ICP registers and stitches the multi-view clouds into a single object point cloud, which is then transformed onto the image plane in a variety of poses to produce diverse synthetic object images for the training set. Because the object's position in the scene is known at the moment the synthetic image is composited with the scene, annotation is automated, reducing labor cost.
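The automatic-annotation idea above can be sketched in a few lines: once the fused object point cloud is posed and projected through a pinhole camera model onto the image plane, the axis-aligned bounding box of the projected pixels is the label, so no manual annotation is needed. This is a minimal NumPy illustration; the intrinsics, pose, and toy point cloud below are invented for the example, not taken from the thesis.

```python
import numpy as np

def project_points(pts, K, R, t):
    """Project 3-D points onto the image plane with a pinhole model.

    pts: (N, 3) object point cloud; K: 3x3 camera intrinsics;
    R, t: object pose expressed in the camera frame.
    """
    cam = pts @ R.T + t            # transform into camera coordinates
    uv = cam @ K.T                 # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]  # perspective divide -> pixel coordinates

def bounding_box(uv):
    """Axis-aligned box (x_min, y_min, x_max, y_max) of projected points."""
    x0, y0 = uv.min(axis=0)
    x1, y1 = uv.max(axis=0)
    return x0, y0, x1, y1

# Hypothetical intrinsics and pose, for illustration only.
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
R = np.eye(3)
t = np.array([0.0, 0.0, 1.0])      # object 1 m in front of the camera
cube = np.array([[x, y, z] for x in (-.05, .05)
                 for y in (-.05, .05) for z in (-.05, .05)])
box = bounding_box(project_points(cube, K, R, t))
print([round(v, 2) for v in box])
```

Rendering the same projected pose into a background scene at a known offset shifts this box by that offset, which is exactly the annotation written to the dataset.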
Four commodity products were used as detection targets, all placed in the same detection scene. The experimental results show that mixing 300 synthetic images with 100 real images yields a mean mAP of 0.47 over IoU thresholds [0.5:0.1:0.9], whereas 400 real images yield a mean mAP of 0.29 under the same IoU conditions; adding synthetic images in a variety of poses therefore does improve the average precision of the detection model. However, when synthetic images were continually added on top of a base of 100 real images, the mean mAP dropped to 0.1529 over the same IoU thresholds once 700 synthetic images were included, worse than the model trained on 400 real images. Lower-quality images can still help improve model accuracy, but when a large amount of unrealistic synthetic data enters the dataset, the model over-learns erroneous information, which harms detection performance.
In recent years, object detection with deep learning has become a mature and accessible technology, and many annotated object datasets are openly available. However, when the target object cannot be found in these datasets, manually photographing it and annotating it in the images is a step that consumes a great deal of time and resources before the detection model can be trained.
The objective of this research was to use synthetic images to augment a small number of real images as the training set and to annotate the synthetic images automatically. We used a depth camera and a rotating platform to capture multi-view RGB point clouds of an object. We then adopted the color support ICP algorithm to register the multi-view point clouds, converted the fused point cloud to 2D images at various angles, and composited them with the real scene to generate the training images.
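The color support ICP pipeline itself is not reproduced here, but the core rigid-alignment step that every ICP variant repeats — estimating the rotation and translation that best map one point set onto the other — has a closed-form SVD (Kabsch) solution. A minimal NumPy sketch, assuming correspondences are already known (real ICP re-estimates them each iteration, and the color-supported variant additionally uses color similarity when matching points):

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t such that dst ~ src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding points. This is the
    closed-form (Kabsch/SVD) step run inside each ICP iteration.
    """
    c_src = src.mean(axis=0)                 # centroids
    c_dst = dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Toy check: recover a known rotation about z and a translation.
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
t_true = np.array([0.1, -0.2, 0.05])
src = np.random.default_rng(0).random((50, 3))
dst = src @ R_true.T + t_true
R, t = best_rigid_transform(src, dst)
print(np.allclose(R, R_true), np.allclose(t, t_true))  # prints: True True
```

Chaining the per-view transforms estimated this way brings all views into a common frame, which is what stitches the multi-view clouds into one object point cloud.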
In this research, we used four different commodities as recognition targets. The experimental results show that when we trained on 300 synthetic images mixed with 100 real images, we obtained a mean mAP of 0.47 over IoU thresholds [0.5:0.1:0.9]. When 400 real images were used as the training set, the mean mAP under the same IoU conditions was 0.29, so synthetic images with multiple poses clearly improved the average precision of the recognition model. We then continued to add synthetic images to the training set: when 700 synthetic images had been added, the mean mAP fell to 0.1529 under the same IoU thresholds. This performance was worse than our previous results, because a large amount of unrealistic synthetic data can have a negative impact on the performance of the model.
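The metric used above — mAP averaged over IoU thresholds from 0.5 to 0.9 in steps of 0.1 — rests on box IoU. The sketch below shows the IoU computation and the threshold averaging; the per-threshold AP values are invented for illustration (real AP comes from the precision-recall curve at each threshold).

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

thresholds = np.arange(0.5, 1.0, 0.1)   # [0.5, 0.6, 0.7, 0.8, 0.9]
# Hypothetical AP at each threshold; a detection counts as correct only
# when its IoU with the ground-truth box reaches the threshold, so AP
# typically falls as the threshold rises.
ap_per_threshold = [0.80, 0.65, 0.45, 0.30, 0.15]
mean_ap = float(np.mean(ap_per_threshold))
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 -> 1/7
print(mean_ap)
```

Averaging over stricter and stricter thresholds is what makes the reported numbers (0.47, 0.29, 0.1529) sensitive to localization quality, not just to whether the object was found at all.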