
Author: Kuo-Wei Chan (詹國偉)
Thesis title: A Study on Using Synthetic Image Data and Mask RCNN to Identify Steel Structural Members (利用合成影像資料與Mask RCNN進行鋼結構構件辨識之研究)
Advisor: Yo-Ming Hsieh (謝佑明)
Committee members: Hung-Ming Chen (陳鴻銘), Albert Y. Chen (陳柏華)
Degree: Master
Department: College of Engineering, Department of Civil and Construction Engineering
Year published: 2021
Academic year of graduation: 109 (ROC calendar)
Language: Chinese
Pages: 116
Keywords: Mask RCNN, synthetic data, steel structure, object identification

During the hoisting of steel structural members, delivering a wrong member to the construction site forces changes to the hoisting sequence and can delay the work by several days, which in turn affects other trades. When steel members leave the fabrication plant, plant personnel normally check the member identifiers to avoid shipping the wrong pieces. Identifiers are mainly stamped into the steel, engraved on the flange plate. Because members sit in a storage yard after fabrication until the site calls for them, members without anti-rust treatment often corrode; rust and the small size of the stamps make the identifiers hard to read quickly, so fabricators often write them in chalk instead, which still leaves the problems of miswritten or erased marks. This study aims to provide an image-recognition-based method for identifying steel members and so reduce human error.
This study identifies steel members with Mask RCNN, a deep neural network. Neural networks need large amounts of training data to be accurate, yet real photographs are hard to obtain between fabrication and hoisting, so synthetic images are used for model training: member identifiers and geometry are extracted from the Tekla model built for fabrication, and Babylon.js renders the members to produce training images. Several domain adaptation and domain randomization methods are applied to improve the trained model's accuracy, and similar members are grouped to further improve the ability to identify real members.
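Chapter 5 of the thesis merges visually similar members before training, and section 2.5 lists mean squared error and the structural similarity index (SSIM) as its similarity measures. The sketch below shows one way such grouping could work, assuming equal-sized 8-bit grayscale renders of each member; the greedy `group_similar` routine and the 0.95 threshold are illustrative assumptions, not the thesis's actual difference algorithm.

```python
from skimage.metrics import structural_similarity as ssim

def group_similar(renders, names, threshold=0.95):
    """Greedily group members whose renders look alike under SSIM.

    renders: equal-sized uint8 grayscale arrays, one per member.
    names:   member identifiers, aligned with renders.
    Returns a list of (representative render, [member names]) groups.
    """
    groups = []
    for img, name in zip(renders, names):
        for rep, members in groups:
            # data_range=255 because the renders are 8-bit images
            if ssim(rep, img, data_range=255) >= threshold:
                members.append(name)
                break
        else:
            groups.append((img, [name]))
    return groups
```

Members that end up in the same group would then share one class label during training, reducing the number of classes Mask RCNN must separate at the cost of identifier granularity.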
Finally, steel members from an apartment building project are used to evaluate different domain adaptation and domain randomization strategies, generating several synthetic datasets, with mAP (mean average precision) as the performance metric. The results show that Mask RCNN's mAP differs markedly across the synthetic test sets, ranging from 20.39% to 94.24%, and that the domain adaptation and domain randomization strategies tried so far cannot reach satisfactory accuracy on real-world images.


During the hoisting of steel structural components, transporting the wrong components to the construction site disturbs the planned schedule and can even cause unnecessary delays of several days. When steel structural members leave the factory, factory personnel usually check the component identifiers to avoid shipping the wrong components. The identifying mark is typically stamped and engraved on the flange plate. Due to factors such as rust and the small size of the stamps, these identifiers are difficult to find and read quickly. This research aims to provide an automated method for steel component identification through image recognition technology to reduce human error.
This research uses Mask RCNN, a deep neural network, to identify steel components. Neural networks need large amounts of training data to be accurate; however, large quantities of real pictures are difficult to obtain because steel structural members are huge. This work therefore studies using synthetic pictures for model training. Tekla BIM models from real projects are used to extract the steel components, and Babylon.js is used to generate steel component images for training. Several domain adaptation strategies were used to improve the accuracy of the trained model, and similar components were grouped to further improve the ability to identify real components.
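The thesis renders Tekla geometry with Babylon.js; that rendering pipeline is not reproduced on this page, so the sketch below only illustrates the domain-randomization idea in Python: composite an alpha-masked member render onto varied backgrounds with random scale, position, and brightness, and derive the binary instance mask used as ground truth. The file paths, canvas size, and jitter ranges are illustrative assumptions, not the thesis's actual parameters.

```python
import random
import numpy as np
from PIL import Image, ImageEnhance

def composite(render_path, background_path, canvas=(1024, 768)):
    """Paste an alpha-masked member render onto a background with a
    random scale, position, and brightness; return the image plus the
    binary ground-truth mask an instance-segmentation model needs."""
    fg = Image.open(render_path).convert("RGBA")
    bg = Image.open(background_path).convert("RGB").resize(canvas)
    # Random scale (assumes the render stays smaller than the canvas).
    s = random.uniform(0.5, 1.0)
    fg = fg.resize((int(fg.width * s), int(fg.height * s)))
    # Jitter brightness on the colour channels only, keeping alpha intact.
    r, g, b, a = fg.split()
    rgb = ImageEnhance.Brightness(Image.merge("RGB", (r, g, b)))
    fg = Image.merge("RGBA", (*rgb.enhance(random.uniform(0.6, 1.4)).split(), a))
    # Random placement; the alpha channel doubles as the paste mask.
    x = random.randint(0, bg.width - fg.width)
    y = random.randint(0, bg.height - fg.height)
    bg.paste(fg, (x, y), fg)
    mask = np.zeros((bg.height, bg.width), dtype=np.uint8)
    mask[y:y + fg.height, x:x + fg.width] = (np.array(fg.split()[-1]) > 0)
    return bg, mask
```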
Apartment building projects are used to evaluate different domain adaptation strategies, generating several synthetic datasets, and mAP (mean average precision) is used to evaluate model performance. mAP varies greatly across the generated synthetic datasets, going as low as 20.39% and as high as 94.24%. The domain adaptation and domain randomization strategies tried so far cannot achieve satisfactory accuracy on real-world images.
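For context, mAP averages the per-class average precision (AP); a detection counts as a true positive when its predicted mask overlaps a not-yet-matched ground-truth mask above an IoU threshold, commonly 0.5. Below is a minimal sketch of the two computations, not the evaluation code actually used in the thesis.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def average_precision(scores, matches, n_gt):
    """Simplified AP for one class.

    scores:  confidence of each detection.
    matches: 1 if the detection matched an unclaimed ground truth
             at IoU >= 0.5, else 0.
    n_gt:    number of ground-truth instances of the class.
    """
    order = np.argsort(scores)[::-1]          # sweep by confidence
    hits = np.asarray(matches, dtype=float)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(1.0 - hits)
    recall = tp / n_gt
    precision = tp / (tp + fp)
    # Area under the precision-recall curve (rectangular rule).
    return float(np.sum(precision * np.diff(recall, prepend=0.0)))

# mAP is then the mean of the per-class APs.
```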

Table of contents:
Abstract (Chinese) I
Abstract II
Acknowledgements III
Contents IV
List of Figures VII
List of Tables X
Chapter 1 Introduction 1
  1.1 Motivation and objectives 1
  1.2 Research workflow 2
  1.3 Thesis organization 3
Chapter 2 Literature Review 5
  2.1 Deep learning 5
  2.2 Domain gap 5
    2.2.1 Domain adaptation 6
    2.2.2 Domain randomization 6
  2.3 Neural network evaluation methods 6
    2.3.1 Confusion matrix 7
    2.3.2 Accuracy 7
    2.3.3 Precision 7
    2.3.4 Recall 8
    2.3.5 Confidence score 8
    2.3.6 Intersection over union (IoU) 8
    2.3.7 Average precision and mAP 9
  2.4 Mask RCNN 10
  2.5 Image similarity 16
    2.5.1 Mean squared error 16
    2.5.2 Structural similarity index 17
Chapter 3 Research Methods and Tools 19
  3.1 Synthetic image production tools 19
    3.1.1 Babylon.js 19
    3.1.2 Node.js 21
    3.1.3 Python imaging modules 21
  3.2 Deep learning model training tools and methods 22
    3.2.1 Docker 22
    3.2.2 TensorFlow and Keras 22
    3.2.3 Mask RCNN open-source module 22
    3.2.4 Dataset configuration 23
  3.3 Computer hardware 24
Chapter 4 Dataset Parameter Analysis 25
  4.1 Image capture angle analysis 25
  4.2 Image resolution analysis 26
  4.3 Image scaling algorithm analysis 30
  4.4 Number-of-classes analysis 32
  4.5 Summary 33
Chapter 5 Merging Similar Steel Members 35
  5.1 Typical floors 35
  5.2 Difference algorithm 35
  5.3 Similarity verification 35
  5.4 Summary 42
Chapter 6 System Validation 43
  6.1 Case introduction 43
  6.2 Synthetic image production 43
  6.3 Time consumed 47
  6.4 Validation on synthetic images 48
  6.5 Validation on real images 54
  6.6 Summary 64
Chapter 7 Conclusions and Suggestions 65
  7.1 Conclusions 65
  7.2 Suggestions 65
References 67
Appendix 73

