
Student: Shao-Yu Lin (林紹瑜)
Thesis Title: A Deep-Learning-Based Approach to Classifying Retail Products in the Top-View Images of a Checkout Counter (基於深度學習技術的結帳台俯視影像之零售商品分類)
Advisor: Chin-Shyurng Fahn (范欽雄)
Committee Members: 王榮華, 李建德, 謝仁偉
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110 (ROC calendar)
Language: English
Number of Pages: 77
Keywords: deep learning, data augmentation, image style transfer, object detection, retail product classification
Views: 176; Downloads: 0


    In recent years, continual improvements in image processors and cameras have driven rapid progress in computer vision. Through deep learning, computers can automatically identify the objects in an image, and research on automatic checkout based on computer vision has grown accordingly. Today's self-checkout methods mainly rely on consumers scanning the barcodes or RFID tags on products. This study instead aims to use synthetic images to improve the identification of the items and quantities a consumer purchases from a single top-view image. The main challenge in training such an object detection system is that capturing a large number of training images is burdensome, and varying shooting angles can degrade the training results. If training images can be generated automatically, the burden of shooting and labeling them is greatly reduced.
    To solve the above problems, we perform data augmentation in two steps before training the object detection model. In the first step, a salient object detection model extracts the product outline from each single-product image, and the extracted products are composited into multi-item images that imitate real checkout scenes. However, if these synthetic images are used directly for training, differences in lighting conditions and shadow effects from real photographs leave the detection accuracy insufficient. Therefore, in the second step, an image style transfer model converts the background and shadows of the synthetic images to resemble those of actually captured images. Finally, the object detection model is trained on these augmented images to improve its detection accuracy.
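    To make the two-step pipeline concrete, the following is a minimal Python sketch of the compositing stage, assuming the saliency masks have already been produced by a salient object detection model such as U2-Net [28]; the style transfer stage (e.g., CycleGAN [23]) would then be applied to the resulting composites. The function names, file names, and sampling ranges are illustrative assumptions, not code from the thesis.

    import random
    from PIL import Image

    def composite_products(background, products):
        """Paste masked single-product crops onto an empty-counter shot.

        background: PIL RGB image of the empty checkout counter.
        products:   list of (rgb_crop, saliency_mask) pairs; the mask is a
                    single-channel ("L") image with foreground near 255.
        Returns the synthetic image and loose bounding boxes for labeling.
        """
        canvas = background.copy()
        boxes = []
        for rgb, mask in products:
            # Random rotation and scaling imitate the variety of real placements.
            angle = random.uniform(0.0, 360.0)
            scale = random.uniform(0.8, 1.2)
            size = (int(rgb.width * scale), int(rgb.height * scale))
            rgb = rgb.resize(size).rotate(angle, expand=True)
            mask = mask.resize(size).rotate(angle, expand=True)
            # Random top-left position that keeps the product on the canvas.
            x = random.randint(0, max(0, canvas.width - rgb.width))
            y = random.randint(0, max(0, canvas.height - rgb.height))
            # Binarize the saliency mask so it can act as a paste alpha mask;
            # the expanded rotation corners are 0 in the mask, so only the
            # product pixels are copied.
            alpha = mask.point(lambda v: 255 if v > 127 else 0)
            canvas.paste(rgb, (x, y), alpha)
            boxes.append((x, y, x + rgb.width, y + rgb.height))
        return canvas, boxes

    # Hypothetical usage with placeholder file names:
    # bg = Image.open("counter.jpg").convert("RGB")
    # crops = [(Image.open("cola.png").convert("RGB"),
    #           Image.open("cola_mask.png").convert("L"))]
    # synthetic, boxes = composite_products(bg, crops)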
    In the experiments, this study uses the Retail Product Checkout (RPC) dataset to test the Cascade R-CNN product detection model we built. Across three training scenarios, a small number of real images combined with a large number of synthetic images yields a high detection accuracy of 95.7%. In addition, the current state-of-the-art image-to-image translation models do not produce better detection results than ours, reaching only 95.2% average precision. Finally, the experiments reveal that changing the scene greatly degrades detection performance, with the average precision dropping to 63.0%.
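    For reference, the two headline metrics in the evaluation (Sections 5.2.1 and 5.2.3 of the table of contents) have standard published definitions, restated below in our own notation as a sketch rather than the thesis's exact formulation: checkout accuracy as defined for the RPC benchmark [11], and the Fréchet inception distance of Heusel et al. [31].

    % Checkout accuracy over N test images: image i counts as correct only
    % when the predicted count p_{i,k} of every category k equals the
    % ground-truth count g_{i,k}; delta(a, b) = 1 if a = b, else 0.
    \mathrm{cAcc} = \frac{1}{N} \sum_{i=1}^{N}
        \delta\!\Big( \sum_{k} \lvert p_{i,k} - g_{i,k} \rvert,\ 0 \Big)

    % Frechet inception distance between the Inception feature statistics
    % (means \mu, covariances \Sigma) of real (r) and generated (g) images.
    \mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
        + \operatorname{Tr}\!\big( \Sigma_r + \Sigma_g - 2 (\Sigma_r \Sigma_g)^{1/2} \big)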

    Chapter 1 Introduction
      1.1 Overview
      1.2 Motivation
      1.3 System Description
      1.4 Thesis Organization
    Chapter 2 Related Work
      2.1 Unmanned Store
      2.2 Product Image Datasets
      2.3 Object Detection
        2.3.1 Two-Stage Detection Model
        2.3.2 One-Stage Detection Model
      2.4 Object Classification
      2.5 Unsupervised (Unpaired) Image-to-Image Translation
    Chapter 3 Dataset and Preprocessing
      3.1 Dataset
      3.2 Retail Product Image Synthesizing
    Chapter 4 Our Proposed Retail Product Detection Method
      4.1 Image-to-Image (I2I) Translation
      4.2 Data Augmentation
        4.2.1 Rotation, Scaling & Translation
        4.2.2 Color Shifting
      4.3 Choosing Detection Model
    Chapter 5 Experimental Results and Discussion
      5.1 Experimental Environment Setup
      5.2 Evaluation Metrics
        5.2.1 Checkout Accuracy (cAcc)
        5.2.2 Precision
        5.2.3 Fréchet Inception Distance (FID)
      5.3 Testing Results
        5.3.1 Experiment I
        5.3.2 Experiment II
        5.3.3 Experiment III
        5.3.4 Experiment IV
    Chapter 6 Conclusions & Future Work
      6.1 Conclusions
      6.2 Future Work
    References
    Appendix

    [1] D. Berthiaume, “Will other grocers follow Amazon Go?” [Online] Available: https://chainstoreage.com/will-other-grocers-follow-amazon-go (Accessed on June 29, 2022).
    [2] T. Sabanoglu, “Number of stores which offer autonomous checkouts worldwide from 2018 to 2024.” [Online] Available: https://www.statista.com/statistics/1033836/number-of-stores-with-autonomous-checkouts-worldwide/ (Accessed on June 29, 2022).
    [3] D. McCarthy, “The economics of self-service checkouts.” [Online] Available: https://www.oversixty.com.au/finance/money-banking/the-big-problem-with-self-serve-checkouts (Accessed on June 29, 2022).
    [4] G. Goodwin, “Self-checkout criminals on the rise in 2021,” [Online] Available: https://itretail.com/pos-blog/self-checkout-criminals-on-the-rise-in-2021 (Accessed on June 29, 2022).
    [5] Z. Cai and N. Vasconcelos, “Cascade R-CNN: High quality object detection and instance segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 5, pp. 1483-1498, 2021.
    [6] S. Xie et al., “Aggregated residual transformations for deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, 2017, pp. 1492-1500.
    [7] 程倚華, “OK便利商店的拓點利器,用智慧販賣機收服阿兵哥、穆斯林的心 (OK mart's expansion tool: winning over soldiers and Muslims with smart vending machines),” [Online] Available: https://www.bnext.com.tw/article/54660/ok-mart-ok-mini (Accessed on June 29, 2022).
    [8] D. Koubaroulis et al., “Evaluating colour-based object recognition algorithms using the soil-47 database,” in Proceedings of the Asian Conference on Computer Vision, Melbourne, Australia, 2002, vol. 2.
    [9] M. Merler et al., “Recognizing groceries in situ using in vitro training data,” in Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, Minnesota, 2007, pp. 1-8.
    [10] P. Jund et al., “The Freiburg groceries dataset,” Nov 2016. [Online] Available: https://arxiv.org/abs/1611.05799# (Accessed on June 29, 2022).
    [11] X. S. Wei et al., “RPC: A large-scale retail product checkout dataset.” [Online] Available: https://arxiv.org/abs/1901.07249 (Accessed on June 29, 2022).
    [12] R. Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Ohio, 2014, pp. 580-587.
    [13] J. R. R. Uijlings et al., “Selective search for object recognition,” International Journal of Computer Vision, vol. 104, no. 2, pp. 154-171, 2013.
    [14] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 2015, pp. 1440-1448.
    [15] S. Ren et al., “Faster R-CNN: Towards real-time object detection with region proposal networks,” Advances in Neural Information Processing Systems, vol. 28, 2015.
    [16] J. Redmon et al., “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, 2016, pp. 779-788.
    [17] W. Liu et al., “SSD: Single shot multibox detector,” in Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 2016, pp. 21-37.
    [18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, 2012.
    [19] C. Szegedy et al., “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, Massachusetts, 2015, pp. 1-9.
    [20] M. Lin, Q. Chen, and S. Yan, “Network in network,” Mar 2014. [Online] Available: https://arxiv.org/abs/1312.4400 (Accessed on June 29, 2022).
    [21] K. He et al., “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, 2016, pp. 770-778.
    [22] I. Goodfellow et al., “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 27, 2014.
    [23] J. Y. Zhu et al., “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017, pp. 2223-2232.
    [24] P. Isola et al., “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, 2017, pp. 1125-1134.
    [25] Z. Yi et al., “DualGAN: Unsupervised dual learning for image-to-image translation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, 2017, pp. 2849-2857.
    [26] O. Ronneberger et al., “U-Net: Convolutional networks for biomedical image segmentation,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 2015, pp. 234-241.
    [27] T. Park et al., “Contrastive learning for unpaired image-to-image translation,” in Proceedings of the European Conference on Computer Vision, Glasgow, United Kingdom, 2020, pp. 319-345.
    [28] X. Qin et al., “U2-Net: Going deeper with nested U-structure for salient object detection,” Pattern Recognition, vol. 106, 2020.
    [29] T. H. Cormen et al., Introduction to Algorithms, 2nd Ed., Cambridge, Massachusetts: The MIT Press, p. 955, 2001.
    [30] T. Y. Lin et al., “Feature pyramid networks for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, pp. 2117-2125, 2017.
    [31] M. Heusel et al., “GANs trained by a two time-scale update rule converge to a local Nash equilibrium,” Advances in Neural Information Processing Systems, vol. 30, 2017.

    Full text available: 2027/07/26 (campus network)
    Full text available: 2032/07/26 (off-campus network)
    Full text available: 2032/07/26 (National Central Library: Taiwan NDLTD system)