
Graduate Student: Wei-Yu Chao (趙偉佑)
Thesis Title: An Automatic Object Classification Method for Cluttered Product Images Based on Deep Learning
Advisor: Chin-Shyurng Fahn (范欽雄)
Committee Members: Bin-Shyan Jong (鍾斌賢), Yen-Lin Chen (陳彥霖), Yi-Ling Chen (陳怡伶)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Graduation Academic Year: 108 (ROC calendar, 2019-2020)
Language: Chinese
Number of Pages: 50
Keywords: Multi-object classification, Deep learning, Cluttered product recognition, Self-checkout, Automatic checkout, Autoencoder

Most current checkout systems rely on barcodes, RFID tags, or QR codes attached to products to distinguish them. In this study, products are recognized from images of a checkout counter with a built-in single camera, using deep learning modules to learn product features. By replacing existing checkout methods with this computer vision technique, a variety of products can be detected without any physical labels. In automatic checkout applications, the main challenges come from the wide variety of products and their constant renewal, which make it difficult to collect real checkout images for training; in addition, customers place products arbitrarily at checkout, causing individual products to overlap and occlude one another.
To address these problems, our method consists of two parts. The first part uses salient object detection to extract product images from single-product exemplar images, synthesizes checkout images from the extracted products, and then refines the synthesized checkout images with our autoencoder. The second part uses the results of the first part to train our deep learning model, which adopts Cascade R-CNN as the base architecture and incorporates a feature pyramid network to strengthen the recognition of objects of different sizes, thereby performing multi-object classification.
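To make the first part concrete, the following is a minimal, hypothetical Python sketch of the cluttered-image synthesis step, under the assumption that the product cut-outs have already been segmented and stored as RGBA files; all file names, sizes, and placement rules are illustrative, and the autoencoder refinement stage is omitted.

    # Hypothetical sketch of the synthesis step only, assuming the product
    # cut-outs were already segmented (e.g., by a salient object detector) and
    # saved as RGBA images whose alpha channel carries the product mask.
    # Directory layout, canvas size, and placement rules are illustrative.
    import random
    from pathlib import Path
    from PIL import Image

    def synthesize_checkout_image(cutout_dir, background_path, n_products=5,
                                  canvas_size=(1024, 1024), seed=None):
        """Paste randomly chosen, randomly placed product cut-outs onto a
        checkout-table background to form one synthetic cluttered image."""
        rng = random.Random(seed)
        canvas = Image.open(background_path).convert("RGB").resize(canvas_size)
        labels = []  # (category name, loose bounding box) pairs for training

        cutouts = sorted(Path(cutout_dir).glob("*.png"))
        for _ in range(n_products):
            path = rng.choice(cutouts)
            product = Image.open(path).convert("RGBA")

            # Random scale and in-plane rotation to imitate arbitrary placement.
            scale = rng.uniform(0.5, 1.0)
            product = product.resize((max(1, int(product.width * scale)),
                                      max(1, int(product.height * scale))))
            product = product.rotate(rng.uniform(0, 360), expand=True)
            # Keep each cut-out no larger than half the canvas so it always fits.
            product.thumbnail((canvas_size[0] // 2, canvas_size[1] // 2))

            # Random position; overlaps are intentional, since occlusion between
            # products is exactly what the detector has to learn to handle.
            x = rng.randint(0, canvas_size[0] - product.width)
            y = rng.randint(0, canvas_size[1] - product.height)
            canvas.paste(product, (x, y), mask=product)  # alpha used as mask
            labels.append((path.stem, (x, y, x + product.width, y + product.height)))

        return canvas, labels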
In this study, we conduct experiments on the Retail Product Checkout (RPC) dataset. Compared with the 56.68% classification accuracy obtained by the baseline method provided with RPC, our method achieves a classification accuracy of 86.39%. Furthermore, when products with new flavors are released, we can use our method to generate cluttered product images to fine-tune our model, so that the average classification rate for these new products reaches 98.8%. Such classification performance shows that our proposed method is feasible for self-checkout applications.


Most current checkout systems rely on barcodes, RFID tags, or QR codes attached to products for recognition. In this study, products are identified from an image captured by a single camera mounted on top of a checkout table, with deep learning modules learning the visual characteristics of the products. By replacing existing checkout methods with this computer vision technique, a variety of products can be detected without any physical labels. In the application of automatic checkout, the main challenge comes from the wide variety of products and their frequent updates, which make it difficult to collect actual checkout images for training. In addition, customers place products arbitrarily during checkout, which causes products to overlap and occlude one another.
In order to solve the above problems, our proposed method consists of two phases. The first phase uses salient object detection to segment the product from a single-product exemplar image; a cluttered product image is then synthesized by pasting the segmented product images together and is further enhanced by our autoencoder. The second phase uses the outcome of the first phase to train our deep learning model, which adopts Cascade R-CNN as the basic architecture; in addition, a feature pyramid network is employed to strengthen the recognition of objects of different sizes, so as to perform multi-object classification.
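For reference, one possible way to realize the second phase is sketched below. The thesis does not state which detection toolbox it builds on, so this sketch assumes MMDetection (v2.x) and its off-the-shelf Cascade R-CNN with ResNet-50 FPN configuration; the config/checkpoint paths and the image file name are placeholders.

    # Hypothetical phase-2 inference sketch, assuming MMDetection's reference
    # Cascade R-CNN + ResNet-50 FPN configuration. Paths are placeholders, and
    # the thesis may have used a different toolbox, backbone, or settings.
    from mmdet.apis import init_detector, inference_detector

    config_file = 'configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.py'
    checkpoint_file = 'checkpoints/cascade_rcnn_checkout.pth'  # fine-tuned on synthesized images

    model = init_detector(config_file, checkpoint_file, device='cuda:0')

    # For a box-only detector, the result is one array of (x1, y1, x2, y2, score)
    # rows per product category; thresholding the scores gives the shopping list.
    result = inference_detector(model, 'checkout_image.jpg')
    shopping_list = [
        (model.CLASSES[class_id], box[:4])
        for class_id, boxes in enumerate(result)
        for box in boxes
        if box[4] >= 0.5
    ]
    print(shopping_list)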
In this study, we conduct experiments on the Retail Product Checkout (RPC) dataset. Compared with the 56.68% classification accuracy obtained by the baseline method provided with RPC, our proposed method achieves a classification accuracy of 86.39%. Additionally, when products with new flavors are released, we can use our method to generate cluttered product images for fine-tuning our model, and the average classification rate for these new products reaches 98.80%. Such performance shows that our method is feasible in the application of self-checkout.
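For context on how such figures can be computed, the short sketch below implements an RPC-style checkout accuracy, under the assumption that a test image counts as correct only when the predicted product categories and their counts match the ground-truth shopping list exactly; the data structures and sample values are invented for illustration.

    # A small sketch of an RPC-style checkout accuracy, assuming an image is
    # counted as correct only when the predicted product categories and their
    # counts match the ground-truth shopping list exactly.
    from collections import Counter

    def checkout_accuracy(predictions, ground_truths):
        """Both arguments: one list of product-category IDs per test image."""
        assert len(predictions) == len(ground_truths)
        correct = sum(Counter(pred) == Counter(gt)
                      for pred, gt in zip(predictions, ground_truths))
        return correct / len(ground_truths)

    # Two of the three images below have their full product list predicted
    # correctly, so the accuracy is 2/3.
    preds = [["cola", "chips"], ["soap"],         ["cola", "cola"]]
    truth = [["chips", "cola"], ["soap", "soap"], ["cola", "cola"]]
    print(f"checkout accuracy: {checkout_accuracy(preds, truth):.2%}")  # 66.67%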

Chinese Abstract i
Abstract ii
Acknowledgments iii
Contents iv
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
  1.1 Overview 1
  1.2 Motivation 2
  1.3 System Description 3
  1.4 Organization of Thesis 4
Chapter 2 Related Work 5
  2.1 Other Product Recognition 5
  2.2 Object Detection 6
Chapter 3 Image Preprocessing 9
  3.1 Product Segmentation 9
  3.2 Cluttered Product Image Synthesizing 11
    3.2.1 Cluttered product image generating from product partial images 12
    3.2.2 Shadow-working treatment 13
  3.3 Autoencoder 14
Chapter 4 Product Detection Methodology 19
  4.1 Image Augmentation 19
  4.2 Product Detection 23
  4.3 Overlapping Detection Removal 26
Chapter 5 Experimental Results and Discussions 28
  5.1 Experimental Setup 28
    5.1.1 Dataset Introduction 28
    5.1.2 Development Environment Setup 30
  5.2 Evaluation Protocol 30
  5.3 Experimental Results 32
Chapter 6 Conclusions and Future Work 38
  6.1 Conclusions 38
  6.2 Future Work 39
References 40

[1] X. S. Wei et al., “RPC: A large-scale retail product checkout dataset,” arXiv preprint arXiv:1901.07249, 2019.
[2] J. Y. Zhu et al., “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 2223-2232, 2017.
[3] S. Ren et al., “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proceedings of the Advances in Neural Information Processing Systems, Montreal, Quebec, Canada, pp. 91-99, 2015.
[4] C. Li et al., “Data priming network for automatic check-out,” in Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, pp. 2152-2160, 2019.
[5] B. F. Wu et al., “An intelligent self-checkout system for smart retail,” in Proceedings of the International Conference on System Science and Engineering, Nantou, Taiwan, pp. 1-4, 2016.
[6] B. T. Wu et al., “Image recognition approach for expediting Chinese cafeteria checkout process,” in Proceedings of the IEEE 2nd Global Conference on Life Sciences and Technologies, Kyoto, Japan, pp. 35-38, 2020.
[7] R. Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Ohio, pp. 580-587, 2014.
[8] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, pp. 1440-1448, 2015.
[9] Q. Hou et al., “Deeply supervised salient object detection with short connections,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, pp. 3203-3212, 2017.
[10] S. Xie and Z. Tu, “Holistically-nested edge detection,” in Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, pp. 1395-1403, 2015.
[11] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[12] K. He et al., “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, pp. 770-778, 2016.
[13] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proceedings of the International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234-241, 2015. doi: 10.1007/978-3-319-24574-4_28
[14] Z. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into high quality object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, pp. 6154-6162, 2018.
[15] T. Y. Lin et al., “Feature pyramid networks for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, pp. 2117-2125, 2017.

Full text release date: 2025/08/11 (campus network)
Full text release date: 2030/08/11 (off-campus network)
Full text release date: 2030/08/11 (National Central Library: Taiwan NDLTD system)