
Author: Kuan-Ting Lai (賴冠廷)
Thesis Title: Lightweight Semantic Segmentation for Wide Area Monitoring with Dynamic Offloading Design (基於輕量化語意分割且可動態卸載之廣域監控)
Advisor: Ching-Hu Lu (陸敬互)
Committee Members: Cheng-Ming Huang (黃正民), Kai-Lung Hua (花凱龍), Shun-Feng Su (蘇順豐), Sheng-Luen Chung (鍾聖倫), Ching-Hu Lu (陸敬互)
Degree: Master
Department: Department of Electrical Engineering (電資學院 - 電機工程系)
Publication Year: 2020
Graduation Academic Year: 108
Language: Chinese
Pages: 82
Keywords: Semantic Segmentation, Lightweight Neural Network, Hybrid Architecture, Computation Offload, Internet of Things
  • With the rise of edge computing for the Internet of Things, cameras equipped with edge computing capability (referred to in this study as edge cameras) have gradually been applied to real-time processing of streaming video for wide-area surveillance. In wide-area surveillance, the performance of bounding-box-based multiple object tracking is gradually saturating, so pixel-level image segmentation is gaining favor. However, the hardware of existing edge cameras remains quite limited and can hardly run complex image segmentation networks smoothly. In addition, a sudden surge in the crowd detected by an edge camera may degrade the accuracy of the multiple-object-tracking model and even delay the responsiveness of the service. To address this efficiency bottleneck of image segmentation on edge cameras, this study proposes a "lightweight semantic segmentation network," which builds on efficient separable convolution operations to maximize speed, takes a dense atrous spatial convolution module as its core, and strengthens segmentation quality through attention and edge-sharpening mechanisms. Next, to address the system-responsiveness problem caused by crowd surges, this study proposes "confidence-oriented dynamic edge offloading," which constructs a hybrid system combining edge and cloud computing. Each edge camera monitors its own multiple-object-tracking confidence; when a crowd surge causes detection confidence to become insufficient, the edge camera can dynamically offload the tracking task upward to a higher-level server while continuing to perform target detection and target segmentation. When the detection load decreases again, the system offloads the tracking task back down to the original edge camera, realizing the intended benefits of edge computing. In the experimental validation, the proposed lightweight semantic segmentation network improves class-wise accuracy by an additional 4% and category-wise accuracy by 1% while maintaining an execution speed close to that of existing studies. For the validation of confidence-oriented dynamic edge offloading, this study tested on streaming datasets that simulate crowd surges. The results show that, unless confidence is insufficient, all tracking tasks can be completed entirely on the edge camera. Therefore, compared with a traditional design that relies solely on cloud computing, on the test dataset the overall stream saves 6.5 MB of communication while losing only 0.6% accuracy. Meanwhile, on the two test streams that trigger downward offloading, accuracy improves by an additional 2% and 8%, because offloading filters out the low-confidence results caused by crowd surges. Hence, the proposed hybrid system not only saves communication costs but also maintains the quality and stability of edge computing in dynamic environments.
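The abstract attributes the network's speed to separable convolution operations. As a minimal sketch of why a depthwise separable convolution is cheap relative to a standard convolution (the MobileNet-style cost arithmetic; the layer shapes below are illustrative assumptions, not figures from the thesis):

```python
# Multiply-accumulate (MAC) cost of a standard k x k convolution versus a
# depthwise separable one (depthwise k x k, then pointwise 1 x 1).
# m = input channels, n = output channels, h x w = output feature map size.

def standard_conv_macs(k, m, n, h, w):
    """Standard conv: k*k*m*n MACs per output pixel."""
    return k * k * m * n * h * w

def separable_conv_macs(k, m, n, h, w):
    """Depthwise (k*k*m) plus pointwise (m*n) MACs per output pixel."""
    return (k * k * m + m * n) * h * w

# Illustrative layer: 3x3 conv, 128 -> 128 channels, 64x64 feature map.
std = standard_conv_macs(3, 128, 128, 64, 64)
sep = separable_conv_macs(3, 128, 128, 64, 64)
print(f"standard: {std:,} MACs, separable: {sep:,} MACs, ratio: {std / sep:.1f}x")
```

For a 3x3 kernel the reduction factor is roughly 1/n + 1/9, i.e. an 8-9x saving at typical channel counts, which is what makes such networks viable on resource-limited edge cameras.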


    With the rise of edge computing for the Internet of Things (IoT), smart cameras leveraging edge intelligence (hereafter referred to as edge cameras) have gradually been applied to real-time streaming-image analytics for wide-area surveillance. In wide-area surveillance, because the performance of multiple object tracking (MOT) based on bounding boxes is gradually saturating, pixel-based image segmentation has become more favored. However, existing edge cameras often have restricted computing power and cannot efficiently run complex neural networks for image segmentation. In addition, the number of people detected by an edge camera may suddenly increase, degrading the accuracy of the MOT model and the responsiveness of the associated service. To address the efficiency bottleneck of image segmentation on an edge camera, we design a lightweight semantic segmentation network. The proposed network incorporates highly efficient convolution operations, dilated convolutions, an attention mechanism, and borderline sharpening to improve efficiency and reduce computational cost as well as model size. Next, to address the decrease in system responsiveness caused by an increasing number of tracking targets, we propose a hybrid system based on confidence-oriented dynamic computing offloading between edge and cloud servers. An edge camera in the hybrid system monitors its confidence during MOT. When the confidence decreases due to a sudden increase in the number of targets, the edge camera can dynamically offload MOT tasks to servers while continuing to undertake target detection and target segmentation. When the confidence recovers as the load decreases, the hybrid system migrates the MOT task back to the original edge camera to better leverage edge computing.
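The confidence-monitoring offload decision described above can be sketched as a small hysteresis controller on the edge camera. This is an illustrative reconstruction, not the thesis's implementation: the threshold values, the moving-average window, and all names below are assumptions.

```python
# Sketch of confidence-oriented dynamic offloading: MOT runs on the edge
# camera until smoothed tracking confidence drops below a low threshold,
# then on the server until confidence recovers past a high threshold.
from collections import deque

class OffloadController:
    def __init__(self, low=0.5, high=0.7, window=5):
        self.low, self.high = low, high       # hysteresis thresholds (assumed)
        self.history = deque(maxlen=window)   # recent per-frame confidences
        self.offloaded = False                # is MOT currently on the server?

    def update(self, frame_confidence):
        """Feed one frame's mean tracking confidence; return where MOT runs."""
        self.history.append(frame_confidence)
        avg = sum(self.history) / len(self.history)
        if not self.offloaded and avg < self.low:
            self.offloaded = True             # offload MOT upward to the server
        elif self.offloaded and avg > self.high:
            self.offloaded = False            # migrate MOT back to the edge
        return "server" if self.offloaded else "edge"

ctrl = OffloadController()
# Simulated stream: a crowd surge drives confidence down, then it recovers.
stream = [0.9, 0.85, 0.2, 0.15, 0.1, 0.9, 0.95, 0.95, 0.95, 0.95]
print([ctrl.update(c) for c in stream])
```

The two thresholds prevent the task from bouncing between edge and server on every noisy frame; detection and segmentation stay local throughout, so only the tracking workload migrates.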
Experimental results on the Cityscapes dataset show that the lightweight semantic segmentation network improves class-wise accuracy by 4% and category-wise accuracy by 1% while running almost as fast as prior studies. To verify the confidence-oriented dynamic edge offloading technique, we used a streaming dataset that simulates a sudden increase in targets. The results show that all tracking tasks can be executed directly on an edge camera unless its tracking confidence falls below a predefined threshold. Compared with a traditional design using only cloud computing, the hybrid system saves 6.5 MB of data transmission on the testing dataset at the cost of only a 0.6% loss in accuracy. Moreover, on the two test streams that trigger offloading, accuracy increases by 2% and 8%, since the system filters out the low-confidence results caused by the sudden increase in targets. Therefore, the proposed hybrid system not only saves communication costs but also maintains the quality and stability of edge computing in a dynamic environment.

    Chinese Abstract I
    Abstract II
    Acknowledgments IV
    Table of Contents V
    List of Figures VII
    List of Tables IX
    Chapter 1 Introduction 1
    1.1 Research Motivation 1
    1.2 Literature Review 3
    1.2.1 Running Deep Semantic Segmentation Models on Edge Devices 3
    1.2.2 System Responsiveness of Edge Devices to Crowd Changes 5
    1.3 Contributions and Thesis Organization 8
    Chapter 2 System Design Concept and Architecture Overview 10
    Chapter 3 Lightweight Image Segmentation Network 12
    3.1 Depthwise Separable Convolution 12
    3.2 Inverted Residual Block 15
    3.3 Dilated Convolution 17
    3.4 Cascaded Dilated Convolutions 19
    3.5 Dense Atrous Spatial Pyramid Pooling 19
    3.6 Channel Attention Mechanism 21
    3.7 Image Sharpening 23
    3.8 Architecture of the Lightweight Segmentation Network 25
    3.9 Training of the Lightweight Segmentation Network 28
    Chapter 4 Confidence-Oriented Dynamic Edge Offloading 30
    4.1 Edge Offloading Module 31
    4.2 Cloud/Fog Offloading Module 33
    Chapter 5 Experimental Results and Discussion 35
    5.1 Experimental Platform 35
    5.2 Lightweight Semantic Segmentation Network 36
    5.2.1 Datasets and Semantic Segmentation Evaluation Metrics 36
    5.2.2 Training of the Semantic Segmentation Network 37
    5.2.3 Detailed Analysis of Network Modules 39
    5.2.4 Comparison with Related Work 41
    5.3 Confidence-Oriented Dynamic Edge Offloading 42
    5.3.1 Datasets and Multiple-Object-Tracking Evaluation Metrics 43
    5.3.2 Computation Offloading Demonstration: Scenario 1 45
    5.3.3 Computation Offloading Demonstration: Scenario 2 46
    5.3.4 Computation Offloading Demonstration: Scenario 3 51
    5.3.5 Discussion 58
    Chapter 6 Conclusions and Future Work 61
    References 63
    Committee Comments and Responses 67


    Full-text release date: 2025/01/16 (campus network)
    Full-text release date: 2025/01/16 (off-campus network)
    Full-text release date: 2025/01/16 (National Central Library: Taiwan NDLTD system)