
Graduate Student: Wei-Hsuan Liu (劉維軒)
Thesis Title: Real Time Multi-Class Vehicle Counting System based on Deep Learning (基於深度學習之實時多類別車流計數系統)
Advisor: Wen-Kai Tai (戴文凱)
Oral Defense Committee: Shi-Jinn Horng (洪西進), Chung-Yung Jar (賈仲雍)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Academic Year of Graduation: 109
Language: Chinese
Number of Pages: 101
Chinese Keywords: 深度學習、車輛辨識、車輛追蹤、車流計數
English Keywords: deep learning, vehicle classification, vehicle tracking, vehicle counting

Traditionally, traffic flow has been measured by manual counting: an observer judges the class of each passing vehicle and tallies it, so the traffic conditions of a given period can only be analyzed after the manual count is finished. With the ever-growing number of vehicles and transportation demand, counting traffic from such large volumes of data inevitably consumes more manpower, time, and cost, and the current traffic situation cannot be reflected in real time for immediate decision-making. We aim to replace manual counting with an automated system that counts traffic in real time from road-camera footage of the area.

This thesis proposes a real-time multi-class vehicle counting system that classifies and counts the vehicles in a road video through four stages. First, the video is read efficiently and its consecutive images (frames) are output in order. Second, a lightweight vehicle classification model is built with deep learning to detect and classify the vehicles in each frame. Third, vehicles are tracked based on the model's predictions to decide whether the vehicles in consecutive frames are the same vehicle, and three strategies based on the predicted confidence values are proposed to decide the class of each tracked vehicle. Finally, a region-of-interest (ROI) mask is created for counting: when a vehicle passes through the ROI mask, the counting timing of that vehicle determines whether it is counted, and its lane is determined through vehicle coordinate correction.
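
To make the four-stage flow concrete, the following minimal Python sketch shows how such a loop could fit together; it is not the thesis's implementation. The `detector` callable, the SORT-like `tracker` object with an `update()` method, the binary `roi_mask` array, and the once-per-track, bottom-center counting rule are all illustrative assumptions.

```python
import cv2

def run_counting(video_path, detector, tracker, roi_mask, counts):
    """Read frames in order, detect and track vehicles, and update per-class
    counts when a tracked vehicle's reference point first enters the ROI mask."""
    cap = cv2.VideoCapture(video_path)
    counted_ids = set()                              # track IDs that have already been counted
    while True:
        ok, frame = cap.read()
        if not ok:
            break                                    # end of video
        detections = detector(frame)                 # [(x1, y1, x2, y2, class_id, confidence), ...]
        tracks = tracker.update(detections)          # [(track_id, x1, y1, x2, y2, class_id), ...]
        for track_id, x1, y1, x2, y2, class_id in tracks:
            cx, cy = int((x1 + x2) / 2), int(y2)     # bottom-center of the box as the reference point
            if roi_mask[cy, cx] > 0 and track_id not in counted_ids:
                counted_ids.add(track_id)            # count each tracked vehicle at most once
                counts[class_id] = counts.get(class_id, 0) + 1
    cap.release()
    return counts
```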

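The abstract names three confidence-based class-decision strategies without defining them; the snippet below shows one plausible strategy of this kind (accumulating per-class confidence over a track's lifetime and taking the argmax), purely as an assumption rather than one of the thesis's actual strategies.

```python
from collections import defaultdict

class TrackClassVoter:
    """One plausible confidence-based class-decision strategy (an assumption):
    accumulate the predicted confidence of every class observed for a track and
    report the class with the highest accumulated confidence."""

    def __init__(self):
        self.scores = defaultdict(lambda: defaultdict(float))  # track_id -> class_id -> confidence sum

    def observe(self, track_id, class_id, confidence):
        self.scores[track_id][class_id] += confidence

    def decide(self, track_id):
        per_class = self.scores[track_id]
        return max(per_class, key=per_class.get) if per_class else None

# Example: a track detected as "car" twice (0.9, 0.8) and "truck" once (0.95) is decided as "car".
voter = TrackClassVoter()
voter.observe(1, "car", 0.9)
voter.observe(1, "truck", 0.95)
voter.observe(1, "car", 0.8)
print(voter.decide(1))  # -> "car"
```
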
We built a vehicle classification dataset containing 99,076 vehicles to train and test the vehicle classification model, and a vehicle counting dataset of 11 traffic videos of various scenes at 30 FPS to test the proposed system. On the vehicle classification dataset, our model reaches 94.1144% accuracy. On the vehicle counting dataset, the confidence-based class-decision strategy classifies vehicle classes with 96.3782% accuracy, the ROI mask yields a 0% duplicate-count rate in every scene, and the overall absolute accuracy is 94.1289%. On hardware with a Ryzen 9 3950X CPU and an RTX 2080 Ti GPU, the system runs at an average of roughly 47-53 FPS.
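
For concreteness, one common way to compute a per-video absolute count accuracy is sketched below; the exact definitions of "absolute accuracy" and "duplicate-count rate" used in the thesis may differ, so treat this as an assumption.

```python
def absolute_accuracy(counted: int, ground_truth: int) -> float:
    """Assumed definition: 1 - |counted - ground truth| / ground truth."""
    return 1.0 - abs(counted - ground_truth) / ground_truth

# Example: 941 vehicles counted against 1,000 in the ground truth gives 0.941 (94.1%).
print(absolute_accuracy(941, 1000))
```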



Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
1 Introduction
1.1 Background and Motivation
1.2 Research Objectives
1.3 Overview of the Research Method
1.4 Contributions
1.5 Organization of This Thesis
2 Related Work
2.1 Vehicle Classification
2.1.1 Appearance-Based Methods
2.1.2 Motion-Based Methods
2.1.3 Object-Based Methods
2.2 Object Detection Models
2.2.1 Feature Pyramid Networks
2.2.2 YOLOv3
2.2.3 ShuffleNet v2
2.2.4 Squeeze-and-Excitation Block (SE Block)
2.2.5 Mish
2.3 Loss Functions
2.3.1 Focal Loss
2.3.2 Distance-IoU (DIoU) Loss
2.4 Optimizers
2.4.1 Rectified Adam (RAdam)
2.4.2 Lookahead
2.5 Vehicle Tracking and Counting
2.5.1 Simple Online and Realtime Tracking (SORT)
2.5.2 Convolutional Neural Network (CNN)-Based Vehicle Counting Methods
3 Methodology
3.1 System Architecture
3.2 Video Reading Module
3.3 Vehicle Classification Model
3.3.1 Data Annotation
3.3.2 Anchor Box Preprocessing
3.3.3 Model Training
3.4 Multi-Class Vehicle Tracking
3.4.1 SORT-Based Multi-Class Vehicle Tracking
3.4.2 Filtering Vehicles That Have Not Fully Entered the Frame
3.4.3 Confidence-Based Class Decision for Tracked Vehicles
3.5 Region-of-Interest Mask Counting
3.5.1 Creation Stage
3.5.2 Recognition Stage
4 Experimental Design
4.1 Experimental Environment
4.2 Vehicle Classification Experiments
4.2.1 Vehicle Classification Dataset
4.2.2 Vehicle Classification Model
4.2.3 Model Evaluation
4.3 Vehicle Counting Experiments
4.3.1 Vehicle Counting Dataset
4.3.2 Evaluation Method
5 Experimental Results and Analysis
5.1 Vehicle Classification Experiments
5.1.1 Results
5.1.2 Error Analysis
5.1.3 Method Analysis
5.2 Vehicle Counting Experiments
5.2.1 Results
5.2.2 Scene Analysis
5.2.3 Error Analysis
5.2.4 Method Analysis
6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
References
Appendix A: Vehicle Counting Results for Each Video

