| Field | Value |
|---|---|
| Student | 賴冠廷 Kuan-Ting Lai |
| Thesis title | 基於輕量化語意分割且可動態卸載之廣域監控 (Lightweight Semantic Segmentation for Wide Area Monitoring with Dynamic Offloading Design) |
| Advisor | 陸敬互 Ching-Hu Lu |
| Committee | 黃正民 Cheng-Ming Huang, 花凱龍 Kai-Lung Hua, 蘇順豐 Shun-Feng Su, 鍾聖倫 Sheng-Luen Chung, 陸敬互 Ching-Hu Lu |
| Degree | 碩士 (Master) |
| Department | 電機工程系 Department of Electrical Engineering, 電資學院 College of Electrical Engineering and Computer Science |
| Year of publication | 2020 |
| Academic year | 108 |
| Language | Chinese |
| Pages | 82 |
| Keywords (Chinese) | 語意分割、輕量化神經網路、混合式架構、計算卸載、物聯網 |
| Keywords (English) | Semantic Segmentation, Lightweight Neural Network, Hybrid Architecture, Computation Offloading, Internet of Things |
| Hits / downloads | 302 / 0 |
Chinese abstract (translated):

With the rise of edge computing for the Internet of Things, cameras with edge-computing capability (referred to in this study as edge cameras) have increasingly been applied to real-time processing of streaming video for wide-area surveillance. In wide-area surveillance, the performance of bounding-box-based multiple object tracking (MOT) is gradually saturating, so pixel-level image segmentation is gaining favor. However, the hardware of existing edge cameras is still very limited and can hardly run complex segmentation networks smoothly. Moreover, a sudden surge in the number of people an edge camera detects can degrade the accuracy of the MOT model and even slow the responsiveness of the service.

To address this performance bottleneck of image segmentation on edge cameras, this study proposes a lightweight semantic segmentation network. It builds on efficient separable convolutions to maximize execution speed, takes a dense atrous spatial convolution module as its core, and strengthens segmentation quality through attention and borderline-sharpening mechanisms. Next, to address the responsiveness problem caused by sudden crowd surges, this study proposes a confidence-oriented dynamic edge offloading technique and constructs a hybrid system that combines edge and cloud computing. Each edge camera monitors the confidence of its own MOT; when a surge in people causes detection confidence to become insufficient, the camera can dynamically offload the MOT task upward to a higher-level server while still performing object detection and object segmentation itself. When the detection load drops again, the system offloads the tracking task back down to the original edge camera, restoring the benefits of edge computing.

In the experimental validation, the lightweight semantic segmentation network improves class-wise accuracy by an additional 4% and category-wise accuracy by 1% while keeping execution speed close to that of prior studies. The confidence-oriented dynamic edge offloading technique was validated on a streaming dataset that simulates a sudden crowd surge. The results show that, unless confidence is insufficient, all tracking tasks are completed entirely on the edge camera. Compared with a conventional cloud-only design, the proposed system therefore saves 6.5 MB of streaming transmission on the test dataset at a cost of only 0.6% in accuracy. Furthermore, on the two test streams that trigger downward offloading, accuracy improves by an additional 2% and 8%, because offloading filters out the low-confidence results caused by the crowd surge. The proposed hybrid system thus not only reduces communication cost but also keeps the quality of edge computing stable in dynamic environments.
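The speed advantage of the separable convolutions mentioned above comes from factoring a standard convolution into a depthwise step and a pointwise step. The following sketch compares parameter counts for the two forms; the 3×3 kernel and 128-channel layer size are illustrative assumptions, not the thesis's actual network configuration.

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depthwise separable version: a k x k depthwise convolution
    (one filter per input channel) followed by a 1 x 1 pointwise
    convolution that mixes channels."""
    return k * k * c_in + c_in * c_out

# Illustrative 3x3 layer mapping 128 -> 128 channels.
standard = conv_params(3, 128, 128)        # 147456 parameters
separable = separable_params(3, 128, 128)  # 17536 parameters
reduction = standard / separable           # roughly 8.4x smaller
```

For a k×k kernel the reduction factor approaches k² as the channel count grows, which is why such factored layers dominate lightweight segmentation backbones in MobileNet-style designs.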
English abstract:

With the rise of edge computing for the Internet of Things (IoT), smart cameras leveraging edge intelligence (hereafter referred to as edge cameras) have gradually been applied to real-time streaming-image analytics for wide-area surveillance. Because the performance of bounding-box-based multiple object tracking (MOT) has gradually saturated in this setting, pixel-level image segmentation has become more favored. However, existing edge cameras often have restricted computing power and cannot efficiently run complex neural networks for image segmentation. In addition, the number of people detected by an edge camera may suddenly increase, degrading both the accuracy of the MOT model and the responsiveness of the associated service. To address the efficiency bottleneck of image segmentation on an edge camera, we design a lightweight semantic segmentation network. The proposed network incorporates highly efficient separable convolutions, dilated convolutions, an attention mechanism, and borderline sharpening to improve efficiency and to reduce computational cost as well as model size. Next, to address the decreased system responsiveness caused by an increasing number of tracking targets, we propose a hybrid system based on confidence-oriented dynamic computation offloading between edge cameras and cloud servers. An edge camera in the hybrid system monitors its confidence during MOT. When confidence drops because of a sudden increase in the number of targets, the edge camera dynamically offloads the MOT task to a server while continuing to perform target detection and target segmentation itself. When confidence recovers as the load decreases, the hybrid system migrates the MOT task back to the original edge camera to better leverage edge computing.
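The offloading policy just described can be sketched as a simple per-frame decision loop. The threshold value, class name, and function below are illustrative assumptions for exposition, not the thesis's actual interface.

```python
from dataclasses import dataclass

CONF_THRESHOLD = 0.5  # illustrative value; the abstract does not specify it


@dataclass
class Frame:
    tracking_confidence: float  # confidence the edge camera reports for MOT


def dispatch(frame, on_server):
    """Decide where the MOT task for this frame should run.

    Detection and segmentation always stay on the edge camera; only
    tracking migrates. Returns the updated offload state and the name
    of the chosen executor.
    """
    if frame.tracking_confidence < CONF_THRESHOLD:
        on_server = True   # confidence too low: offload tracking upward
    elif on_server:
        on_server = False  # confidence recovered: migrate tracking back
    return on_server, "server" if on_server else "edge"
```

In a real deployment the comparison would typically use hysteresis (separate up/down thresholds, or a smoothed confidence signal) so that noisy per-frame scores do not migrate the task back and forth.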
The experimental results, obtained on the Cityscapes dataset, show that the lightweight semantic segmentation network improves class-wise accuracy by 4% and category-wise accuracy by 1% while running almost as fast as prior studies. To verify the confidence-oriented dynamic edge offloading technique, we used a streaming dataset that simulates a sudden increase in targets. The results show that all tracking tasks are executed directly on the edge camera unless its tracking confidence falls below a predefined threshold. Compared with a traditional design that uses only cloud computing, the hybrid system saves 6.5 MB of data transmission on the test dataset at the cost of only a 0.6% loss in accuracy. Moreover, on the two test streams that trigger offloading, the accuracy of the hybrid system increases by 2% and 8%, since it filters out the low-confidence results caused by the sudden increase in targets. The proposed hybrid system therefore not only saves communication cost but also maintains the quality and stability of edge computing in a dynamic environment.