Graduate Student: | 吳佳霖 Chia-Lin Wu
---|---
Thesis Title: | 自動化取像系統研製與深度學習之商品辨識應用 (Implementation of Automated Imaging System and Applications for Deep Learning Based Product Recognition)
Advisor: | 蔡明忠 Ming-Jong Tsai
Committee Members: | 李振豪 Chen-Hao Li, 郭永麟 Yong-Lin Kuo, 李敏凡 Min-Fan Lee
Degree: | 碩士 Master
Department: | 工程學院 College of Engineering - 自動化及控制研究所 Graduate Institute of Automation and Control
Year of Publication: | 2021
Academic Year of Graduation: | 109
Language: | Chinese
Number of Pages: | 109
Chinese Keywords: | 深度學習、影像處理、取像機構、物件辨識、YOLO
English Keywords: | Deep Learning, Image Processing, Imaging System, Object Recognition, YOLO
Access Statistics: | Views: 477; Downloads: 13
近年來,影像辨識與深度學習逐漸應用於物流業、零售業等等領域,尤其是隨著服務的需求提高,將其導入市場來協助店鋪營運與提升消費者消費的便利性,而這些可節省人力之影像辨識應用也是未來發展的趨勢。本研究以製作自動翻轉取像機構並將影像處理與深度學習應用於商品辨識。其分為三個階段:第一階段為自動翻轉取像系統,在此階段製作出可自動旋轉、翻轉商品並拍攝之機構,而取像商品則選用不同外觀特徵,如形狀、顏色等等進行取像,以5類19個商品的不同姿勢拍攝約可得到20,392張影像;第二階段為影像前處理,將第一階段取得之所有影像進行影像前處理,可自動得到含有商品在影像中位置資訊的標記檔,內容包括商品標籤名稱之索引及商品相對於影像之位置比例,每處理一張並產生標記檔約需6.5秒;第三階段為進行兩版本YOLO演算法來訓練與測試商品辨識結果,發現在前面階段得到的單一影像集中在加入少量手動標註的複雜影像,即可使YOLOv4訓練模型的測試集mAP達到95.44%。根據實驗結果可知,本論文所設計之自動翻轉取像機構可以節省收集訓練資料影像集的人力及減少時間成本,並且還可使訓練模型對商品的辨識精確度有良好的影像辨識效果。
In recent years, image recognition and deep learning have gradually been applied in fields such as logistics and retail. As the demand for services increases, these technologies are being introduced into the market to assist store operations and make shopping more convenient for consumers, and such labor-saving image recognition applications are a trend of future development. This research builds an automatic flip imaging mechanism and applies image processing and deep learning to product recognition. The work is divided into three stages. The first stage is an automatic flip imaging system: a mechanism that can automatically rotate, flip, and photograph a product. The sampled products were chosen with different appearance features, such as shape and color, and approximately 20,392 images were obtained by shooting 19 items from 5 product categories in different poses. The second stage is image pre-processing of the images obtained in the first stage: every image is automatically labelled with the product's position information, including the index of the product's class name and the position of the product expressed as ratios of the image size. Processing one image and generating its label file takes about 6.5 seconds. The third stage trains two versions of the YOLO algorithm and tests the product recognition results. It was found that adding a small number of manually annotated complex images to the single-product image set obtained in the earlier stages raises the mAP of the YOLOv4 model on the test set to 95.44%. According to the experimental results, the automatic flip imaging mechanism designed in this thesis saves labor and reduces the time cost of collecting training image sets, while still allowing the trained model to achieve good recognition accuracy on the products.
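The label files described in the second stage record a class index plus the product's position as ratios of the image size, which matches the standard YOLO annotation format (one line per object: class index, then normalized box center and size). The thesis abstract does not include code; the following is a minimal sketch of how such a label line could be computed from a pixel bounding box. The function and variable names are illustrative assumptions, not taken from the thesis.

```python
def to_yolo_label(class_index, box, img_w, img_h):
    """Convert a pixel bounding box (x_min, y_min, x_max, y_max) into a
    YOLO label line: "class cx cy w h", where cx, cy, w, h are ratios
    of the image width/height in [0, 1]."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0 / img_w   # box center x, relative to image width
    cy = (y_min + y_max) / 2.0 / img_h   # box center y, relative to image height
    w = (x_max - x_min) / img_w          # box width as a ratio of image width
    h = (y_max - y_min) / img_h          # box height as a ratio of image height
    return f"{class_index} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# Example: class 3, a 320x160-pixel box centered in a 640x480 image
print(to_yolo_label(3, (160, 160, 480, 320), 640, 480))
# → 3 0.500000 0.500000 0.500000 0.333333
```

Because the coordinates are stored as ratios rather than pixels, the same label file remains valid if the image is later resized for training, which is one reason this format is used by the YOLO family of detectors.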