
Graduate Student: 施泓仰 (Hong-Yang Shih)
Thesis Title: One Model Fits All: Universal Object Detection for Real-time License Plate and Vehicle Detection (一個模型適用所有環境:應用於即時車牌與車輛偵測系統之通用型物件偵測)
Advisor: 吳怡樂 (Yi-Leh Wu)
Committee Members: 賈仲雍 (Zhong-Yong Jia), 戴文凱 (Wen-Kai Tai), 洪西進 (Shi-Jinn Horng), 吳怡樂 (Yi-Leh Wu)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2021
Graduation Academic Year: 109
Language: Chinese
Number of Pages: 43
Keywords: Object Detection, Deep Learning, License Plate Detection, Vehicle Detection, Edge Computing
Abstract (Chinese):
    Object detection has attracted considerable attention in recent years, and many strong research results have been published. However, little work addresses the domain shift caused by the diverse environments of the real world, so the applicability of object detection models to real environments remains limited.

    In this thesis, we propose a Universal Object Detection architecture that addresses not only the model's generality across diverse environments but also inference speed and accuracy. We use an automatic license plate and vehicle type recognition system as a practical deployment case to verify that our system offers three key properties: high generality, high accuracy, and fast inference. We propose a new universal adapter that applies clustering to feature maps to automatically analyze the number and distribution of the domains they contain, and then assigns different feature attention according to each domain distribution, thereby generalizing the model's feature representation and turning it into an efficient universal object detector.
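    As a rough illustration of this universal-adapter idea, the sketch below softly assigns globally pooled feature vectors to K latent domains (learnable centers standing in for the clustering step) and mixes per-domain SE-style channel attention by the assignment weights. The class name, the value of K, and the module layout are our own assumptions for illustration, not the thesis implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class UniversalAdapter(nn.Module):
    """Hypothetical sketch: soft domain clustering + per-domain channel attention."""

    def __init__(self, channels: int, num_domains: int = 4, reduction: int = 16):
        super().__init__()
        # Learnable cluster centers over globally pooled features; soft
        # assignment to these centers plays the role of the clustering step.
        self.centers = nn.Parameter(torch.randn(num_domains, channels))
        # One lightweight SE-style attention branch per latent domain.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )
            for _ in range(num_domains)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        pooled = x.mean(dim=(2, 3))                       # (N, C) global descriptor
        # Soft domain assignment from negative distance to each center.
        assign = F.softmax(-torch.cdist(pooled, self.centers), dim=1)   # (N, K)
        # Per-domain channel gates, mixed by the assignment weights.
        gates = torch.stack([b(pooled) for b in self.branches], dim=1)  # (N, K, C)
        gate = (assign.unsqueeze(-1) * gates).sum(dim=1)                # (N, C)
        return x * gate.unsqueeze(-1).unsqueeze(-1)       # rescale channels

    Keeping the assignment soft keeps the adapter differentiable end to end; a hard clustering step (e.g., k-means on pooled features) could equally drive the branch selection at inference time.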

    When facing the diverse environments of the real world, the proposed object detection model requires only a single model for deployment and inference: it needs neither retraining for specific scenes nor consideration of objective environmental factors at deployment time, such as weather or the mounting angle and height of the camera. Moreover, during training the model can share knowledge across different environments, which further improves its accuracy.

    For the automatic license plate and vehicle type recognition datasets in this thesis, we collected a large and diverse set of 99,084 images covering different weather conditions, mounting angles, and camera distances for training and testing. The model simultaneously recognizes nine classes: license plates, red/green license plates, cars, motorcycles, buses, small trucks, large trucks, semi-trailers, and full trailers. It achieves a recognition rate of 95.31% and runs at 217 frames per second on an Intel i7-9700 CPU with an RTX 2080 GPU. In system deployment it can process eight video streams simultaneously at 79 FPS, and on an NVIDIA Jetson AGX Xavier it reaches 68 FPS, which makes it suitable for edge computing.


English Abstract:
    Object detection has received increasing attention and achieved remarkable results in recent years. However, few studies pay attention to the domain shift caused by the diverse environments of the real world, so the generalization ability of detection models is still limited.

    In this thesis, we propose a Universal Object Detection architecture that not only focuses on the generalization of the model but also considers inference time and accuracy. We use automatic license plate and vehicle detection as a practical application case and verify that our model offers high generalization ability, high accuracy, and fast inference speed.

    In the proposed approach, a single network handles all environments at all times, without training a detector specialized to each environment or considering objective factors such as weather and camera pose during system deployment. Moreover, this allows the model to share knowledge across environments, which further improves accuracy.

    For training and evaluation, we collected 99,084 images for our automatic license plate and vehicle detection dataset, covering different weather conditions, regions, and camera poses. Our model detects nine classes: "License Plate", "Green/Red License Plate", "Car", "Bus", "Motorcycle", "Small Truck", "Large Truck", "Semi-Trailer", and "Full Trailer". Experiments show that we achieve an average accuracy of 95.31% at a recognition speed of 217 FPS on an Intel i7-9700 CPU with an NVIDIA RTX 2080 GPU. In practical application, our model reaches 79 FPS with eight streaming inputs. Additionally, we tested our model on an NVIDIA Jetson AGX Xavier and achieved 68 FPS, which makes it applicable to edge computing.
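    For context, throughput figures such as these are commonly obtained by timing repeated forward passes after a warm-up phase; the sketch below shows one hypothetical way to measure FPS in PyTorch. The model handle, input size, and iteration counts are placeholders, not the thesis benchmark setup.

import time
import torch

@torch.no_grad()
def measure_fps(model: torch.nn.Module, size=(1, 3, 416, 416),
                warmup: int = 20, iters: int = 200) -> float:
    """Hypothetical FPS measurement: warm up, then time `iters` forward passes."""
    device = next(model.parameters()).device
    x = torch.randn(size, device=device)
    for _ in range(warmup):              # warm-up stabilizes clocks and caches
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()         # make sure warm-up work has finished
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()         # wait for all queued GPU work
    return iters / (time.perf_counter() - start)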

Table of Contents:
    Abstract (Chinese) III
    Abstract (English) IV
    Acknowledgements V
    Table of Contents VI
    List of Figures VIII
    List of Tables X
    1 Introduction 1
    1.1 Research Motivation and Goals 1
    1.2 Description of Research Methods 1
    1.3 Research Contributions 2
    1.4 Thesis Organization 3
    2 Literature Review 4
    2.1 Object Detection 4
    2.2 Domain Generalization 5
    2.3 Clustering 6
    2.4 Attention Module 7
    3 Methodology 8
    3.1 Overview 8
    3.2 Automatic License Plate and Vehicle Detection 8
    3.3 Object Detection 11
    3.4 Universal Adapter 14
    3.5 Universal Object Detection 18
    4 Experimental Design 20
    4.1 Datasets and Evaluation 20
    4.2 Universal Object Detection 21
    4.3 Ablation Study 22
    4.4 Performance 22
    5 Experimental Results and Analysis 23
    5.1 Datasets 23
    5.2 Universal Object Detection 25
    5.3 Ablation Study 27
    5.4 Performance 28
    6 Conclusions and Future Work 29
    6.1 Contributions and Conclusions 29
    6.2 Future Work 29
    References 30


    Full-Text Release Date: 2031/08/30 (campus network)
    Full-Text Release Date: not authorized for public release (off-campus network)
    Full-Text Release Date: not authorized for public release (National Central Library: Taiwan NDLTD system)