
Author: 黃品翔 (Pin-Siang Huang)
Thesis title: 基於 YOLOv4 與孿生網路之智慧無人商店辨識系統
Intelligent Unmanned Store Identification Systems Based on YOLOv4 and Siamese Network
Advisor: 洪西進 (Shi-Jin Hung)
Committee members: 楊竹星 (Ju-Shing Yang), 李正吉 (Jeng-Ji Li), 洪西進 (Shi-Jin Hung), 楊昌彪 (Chang-Biau Yang), 林韋宏 (Wei-Hung Lin)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science (電資學院 資訊工程系)
Year of publication: 2021
Academic year of graduation: 109
Language: Chinese
Number of pages: 45
Chinese keywords: 智慧無人商店、孿生網路、拿了就走、深度學習、防盜系統
English keywords: Smart Unmanned Store, Siamese Network, take it and go, Deep learning, Anti-theft system
Metrics: 152 views, 2 downloads
Abstract (translated from the Chinese):
    With rapid advances in technology, services that once relied on human labor have gradually moved toward unmanned, digital operation, and more and more companies are willing to invest in unmanned stores: Taiwan's 7-ELEVEn launched "XStore"; China's Alibaba Group runs the "Tao Café" Taobao member store; and Amazon in the United States promotes the grab-and-go "Amazon Go". Deep learning is widely used in the unmanned-store domain because it offers good recognition performance and high accuracy, and some models can identify products in real time.
    The unmanned-store architecture proposed in this thesis employs several deep learning models to solve problems that unmanned stores commonly face. The first is product expansion: a person can easily recognize a new product, but a deep learning model requires a large number of training samples and a complete retraining, which is very time-consuming. We therefore propose a method that combines a Siamese network with deep learning to recognize new products directly, without retraining the model. The second is operating cost: the goal is to obtain the greatest benefit at the least cost. This thesis uses only five cameras as sensors to build a low-cost, high-accuracy unmanned store. For space utilization, a method for counting stacked products is also proposed, so that shelf space is used as effectively and economically as possible. Finally, for theft prevention, an architecture is proposed that can detect possible theft from any angle of the store, preventing unnecessary financial losses.
    This thesis uses no expensive equipment, yet delivers the same functionality as unmanned stores built at much higher cost.


Abstract (English):
    Due to the fast pace of technological change, work that once relied on human labor has gradually moved toward unmanned, digital operation, and more and more companies are willing to invest in unmanned stores, such as Taiwan 7-ELEVEn's "XStore", the "Tao Café" Taobao member store of China's Alibaba Group, and "Amazon Go" in the United States. Because deep learning offers good recognition performance and high accuracy, and some models can identify products in real time, it is widely used in the field of unmanned stores. The architecture proposed in this thesis uses multiple deep learning models to solve problems often encountered in unmanned stores.
    The first is product expansion. A human shop assistant can easily identify a new product, but a deep learning model requires a large number of training samples and a complete retraining of the model, which is very time-consuming. Using a Siamese network combined with a deep learning model, we propose a method that can identify new products directly, without retraining. The next problem is operating cost: the goal is to achieve the greatest benefit at the least cost. Only five cameras are used as sensors to build a low-cost, high-accuracy unmanned store. For space utilization, a method for counting stacked goods is also proposed, so that the store's space is used as effectively and economically as possible.
    Finally, for anti-theft, a new architecture is proposed that can detect possible theft from any angle of the store and prevent unnecessary financial losses. This thesis does not use any expensive equipment, yet the store has the same functions as unmanned stores built at much higher cost.
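    The new-product mechanism described above amounts to a nearest-neighbor lookup in the Siamese network's embedding space: enrolling a new product only stores a reference embedding, and no retraining is needed. The following is a minimal sketch, not the thesis's implementation; the trained backbone that maps product images to feature vectors is replaced by hand-made toy vectors, and the names (`identify`, `catalog`) are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def identify(query_emb, catalog):
    """Return (label, score) of the catalog entry most similar to the query."""
    return max(
        ((label, cosine_similarity(query_emb, emb)) for label, emb in catalog.items()),
        key=lambda pair: pair[1],
    )

# Toy embeddings standing in for the backbone's output on product images.
catalog = {
    "cola":  [0.9, 0.1, 0.0],
    "chips": [0.1, 0.9, 0.2],
}

# A new product is enrolled by storing one reference embedding -- no retraining.
catalog["cookies"] = [0.0, 0.2, 0.9]

label, score = identify([0.05, 0.15, 0.95], catalog)
```

    In this toy run the query embedding lies closest to the newly enrolled "cookies" vector, so it is identified without the model ever having been trained on that product.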

    Table of Contents
    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Statement of Thesis Innovation
    Table of Contents
    List of Figures
    List of Tables
    1 Introduction
      1.1 Related Work
      1.2 Thesis Organization
    2 Environment Architecture
      2.1 Environment Architecture
      2.2 Introduction to the Unmanned Store Site
      2.3 Shopping Flow
    3 Deep Learning Models
      3.1 You Only Look Once Version 4
        3.1.1 Introduction to Object Detection
        3.1.2 Data Collection
        3.1.3 Data Augmentation
        3.1.4 Training Loss
      3.2 Siamese Network
        3.2.1 Introduction to Siamese Networks
        3.2.2 Data Collection
        3.2.3 Training Loss of the Siamese Network
        3.2.4 Siamese Network Training Results
      3.3 Person Re-Identification
        3.3.1 Introduction to Person Re-Identification
        3.3.2 Architecture
        3.3.3 Training Results
      3.4 OpenPose Human Pose Estimation Model
    4 Experimental Design
      4.1 Unmanned Store Trigger-Event Determination
      4.2 Unmanned Store Event Analysis
      4.3 Customer Identification and Anti-Theft System Based on OpenPose, YOLOv4, and the Siamese Network
        4.3.1 System Architecture
      4.4 New-Product Recognition System Based on YOLOv4 and the Siamese Network
        4.4.1 System Architecture
      4.5 Stacked and Height-Occluded Product Recognition System Based on YOLOv4
    5 Experimental Results and Analysis
    6 Conclusion
    References
    Authorization Letter

    [1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, 2016.
    [2] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271, 2017.
    [3] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
    [4] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4: Optimal speed and accuracy of object detection,” arXiv preprint arXiv:2004.10934, 2020.
    [5] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” in European Conference on Computer Vision, pp. 21–37, Springer, 2016.
    [6] S. Kido, Y. Hirano, and N. Hashimoto, “Detection and classification of lung abnormalities by use of convolutional neural network (CNN) and regions with CNN features (R-CNN),” in 2018 International Workshop on Advanced Image Technology (IWAIT), pp. 1–4, IEEE, 2018.
    [7] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448, 2015.
    [8] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” arXiv preprint arXiv:1506.01497, 2015.
    [9] G. Koch, R. Zemel, and R. Salakhutdinov, “Siamese neural networks for one-shot image recognition,” in ICML Deep Learning Workshop, vol. 2, Lille, 2015.
    [10] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh, “OpenPose: Realtime multi-person 2D pose estimation using part affinity fields,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 1, pp. 172–186, 2019.
    [11]T.Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2117–2125, 2017.
    [12]A. Sengupta, Y. Ye, R. Wang, C. Liu, and K. Roy, “Going deeper in spiking neural networks: Vgg and residual architectures,” Frontiers in neuroscience, vol. 13, p. 95, 2019.
    [13] J. Hosang, R. Benenson, and B. Schiele, “Learning non-maximum suppression,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
    [14] C.-Y. Wang, H.-Y. M. Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, “CSPNet: A new backbone that can enhance learning capability of CNN,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshop), 2020.
    [15] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
    [16] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.
    [17] Comparison of YOLOv4 with other models: https://www.kaggle.com/c/global-wheat-detection/discussion/165414
    [18] Z. Zhang, “Improved Adam optimizer for deep neural networks,” in 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), IEEE, 2018.
    [19] G. Wang, Y. Yuan, X. Chen, J. Li, and X. Zhou, “Learning discriminative features with multiple granularities for person re-identification,” in Proceedings of the 26th ACM International Conference on Multimedia, pp. 274–282, 2018.
    [20] H. Chen, B. Lagadec, and F. Bremond, “Learning discriminative and generalizable representations by spatial-channel partition for person re-identification,” in IEEE Winter Conference on Applications of Computer Vision, pp. 2483–2492, 2020.
    [21] F. Herzog, et al., “Lightweight multi-branch network for person re-identification,” arXiv preprint arXiv:2101.10774, 2021.
    [22] Kaiyang Zhou, Yongxin Yang, Andrea Cavallaro, and Tao Xiang, “Omni-scale feature learning for person re-identification,” in Proceedings of the IEEE Internation
    [23] R. Quispe and H. Pedrini, “Top-DB-Net: Top DropBlock for activation enhancement in person re-identification,” arXiv preprint arXiv:2010.05435, 2020.
    [24] X. Wang, X. Han, W. Huang, D. Dong, and M. R. Scott, “Multi-similarity loss with general pair weighting for deep metric learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5022–5030, 2019.
    [25] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
    [26] K. He, et al., “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, 2015.
