| Field | Value |
|---|---|
| Graduate Student | 劉維軒 (Wei-Hsuan Liu) |
| Thesis Title | 基於深度學習之實時多類別車流計數系統 (Real-Time Multi-Class Vehicle Counting System based on Deep Learning) |
| Advisor | 戴文凱 (Wen-Kai Tai) |
| Committee Members | 洪西進 (Shi-Jinn Horng), 賈仲雍 (Chung-Yung Jar) |
| Degree | Master (碩士) |
| Department | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Publication Year | 2021 |
| Academic Year of Graduation | 109 |
| Language | Chinese |
| Pages | 101 |
| Keywords | deep learning (深度學習), vehicle classification (車輛辨識), vehicle tracking (車輛追蹤), vehicle counting (車流計數) |
Traditionally, traffic flow has been measured by manual counting: an observer judges the class of each passing vehicle and tallies it. As a result, traffic conditions for a given period can only be analyzed after the manual count is complete. With vehicle numbers and transportation demand steadily increasing, counting traffic from such large volumes of data inevitably consumes ever more manpower, time, and cost, and cannot reflect current traffic conditions in real time or support immediate decision-making. We aim to replace manual counting with an automated system that counts traffic from road footage of an area in real time.

This thesis proposes a real-time multi-class vehicle counting system that classifies and counts the vehicles in road footage through four stages. First, the video is read efficiently and its consecutive images (frames) are output in sequence. Second, a lightweight vehicle classification model is built using deep learning to detect and classify the vehicles in each frame. Third, vehicles are tracked based on the model's predictions to determine whether vehicles in consecutive frames are the same vehicle, and three confidence-based class-decision strategies are proposed to decide the class of each tracked vehicle. Finally, a region-of-interest mask is created for counting: when a vehicle passes through the mask, the counting moment of that vehicle determines whether it is counted, and vehicle coordinate correction determines which lane the vehicle occupies.

We built a vehicle classification dataset containing 99,076 vehicles to train and test the classification model, and a vehicle counting dataset of 11 traffic videos at 30 FPS covering various scenes to test the proposed system. On the classification dataset, our vehicle classification model achieves 94.1144% accuracy. On the counting dataset, the confidence-based class-decision strategy reaches 96.3782% classification accuracy, the region-of-interest mask yields a 0% duplicate-count rate in every scene, and the overall absolute accuracy is 94.1289%. On hardware with a Ryzen 9 3950X CPU and an RTX 2080 Ti GPU, the system runs at an average of about 47–53 FPS.
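The abstract mentions three confidence-based strategies for deciding a tracked vehicle's class but does not define them. A minimal sketch of three plausible such strategies is shown below; the strategy names, the `decide_class` function, and the `(class, confidence)` prediction layout are all illustrative assumptions, not the thesis's actual design.

```python
from collections import Counter, defaultdict

def decide_class(track_preds, strategy="sum"):
    """Decide a tracked vehicle's final class from its per-frame predictions.

    `track_preds` is a list of (predicted_class, confidence) pairs, one per
    frame in which the vehicle was detected. Three illustrative strategies
    (assumptions, since the abstract does not define them):
      "max"  - class of the single most confident prediction
      "vote" - majority vote over per-frame predicted classes
      "sum"  - class with the largest accumulated confidence
    """
    if strategy == "max":
        return max(track_preds, key=lambda p: p[1])[0]
    if strategy == "vote":
        return Counter(cls for cls, _ in track_preds).most_common(1)[0][0]
    if strategy == "sum":
        totals = defaultdict(float)
        for cls, conf in track_preds:
            totals[cls] += conf
        return max(totals, key=totals.get)
    raise ValueError(f"unknown strategy: {strategy}")
```

Note that the three strategies can disagree: a single very confident misdetection wins under "max" but is outvoted under "vote" or "sum", which is one reason to aggregate over the whole track.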
In the traditional way, traffic flow is measured by manually counting the passing vehicles of each vehicle class. This consumes a great deal of manpower, time, and cost, making it impossible to reflect the current vehicle flow in real time and to make decisions immediately.

In this thesis, we propose a real-time multi-class vehicle counting system, which classifies and counts the passing vehicles in real time through four stages. First, the video is read in an efficient way, and the continuous images (frames) of the video are output in sequence. Then, a lightweight vehicle classification model is constructed based on a deep learning method to detect and classify the vehicles in each frame. Next, we perform vehicle tracking based on the prediction results of the vehicle classification model to determine whether the vehicles in consecutive frames are the same. We also propose three strategies for classification based on the predicted confidence values to classify each tracked vehicle. Finally, a region-of-interest mask is created to count the vehicle flow: when a vehicle passes through the mask, the counting timing of that vehicle determines whether it is counted, and the lane of the vehicle is determined through vehicle coordinate correction.

We have established a vehicle classification dataset containing 99,076 vehicles to train and test the vehicle classification model. Through 11 vehicle flow videos of various scenes with a frame rate of 30 FPS, a vehicle flow counting dataset was established to test the proposed system. On the vehicle classification dataset, our vehicle classification model achieves an accuracy of 94.1144%. On the vehicle flow counting dataset, classification through the confidence-value strategy achieves an accuracy of 96.3782%, the region-of-interest mask achieves a count repetition rate of 0% in each scene, and the system reaches an overall absolute accuracy of 94.1289%. Furthermore, our system runs efficiently at 47–53 FPS on average.
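The fourth stage, counting with a region-of-interest mask, can be sketched as follows. This is a hedged illustration only: the boolean mask, the `update_counts` function, the per-track reference point, and the data layout are assumptions, not the thesis's actual implementation (which also applies a coordinate correction for lane assignment that is omitted here).

```python
import numpy as np

def update_counts(roi_mask, tracks, counted_ids, counts):
    """Count each tracked vehicle once when its reference point enters the
    region-of-interest mask.

    roi_mask    : 2-D boolean image; True marks the counting region.
    tracks      : dict mapping track id -> (x, y, vehicle_class), an assumed
                  layout where (x, y) is the vehicle's reference point.
    counted_ids : set of track ids already counted (prevents duplicates).
    counts      : dict mapping vehicle class -> running count.
    """
    h, w = roi_mask.shape
    for tid, (x, y, cls) in tracks.items():
        if tid in counted_ids:
            continue  # already counted once; this keeps the repeat rate at 0
        if 0 <= y < h and 0 <= x < w and roi_mask[y, x]:
            counted_ids.add(tid)
            counts[cls] = counts.get(cls, 0) + 1
    return counts
```

Calling this once per frame with the current tracks counts each vehicle exactly once, on the first frame in which its reference point falls inside the mask.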
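The abstract reports an overall "absolute accuracy" for counting without defining the metric. One plausible reading, sketched below purely as an assumption, is one minus the total absolute counting error relative to the ground-truth total; the function name and the per-class dict layout are likewise hypothetical.

```python
def absolute_count_accuracy(predicted, ground_truth):
    """One plausible (assumed) definition of absolute counting accuracy:
    1 - sum_c |predicted_c - ground_truth_c| / sum_c ground_truth_c,
    where c ranges over the vehicle classes.
    """
    total_gt = sum(ground_truth.values())
    total_err = sum(abs(predicted.get(c, 0) - gt)
                    for c, gt in ground_truth.items())
    return 1.0 - total_err / total_gt
```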