
Author: 吳柏逸 (Bo-Yi Wu)
Thesis title: 輕量化頻域深度網路 (Lightweight Frequency-Domain Deep Network)
Advisor: 陸敬互 (Ching-Hu Lu)
Committee members: 陸敬互 (Ching-Hu Lu), 郭重顯 (Chung-Hsien Kuo), 鍾聖倫 (Sheng-Luen Chung), 蘇順豐 (Shun-Feng Su), 廖峻鋒 (Chun-Feng Liao)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2021
Academic year of graduation: 109 (2020-2021)
Language: Chinese
Number of pages: 104
Keywords: Lightweight Deep Network, Frequency Domain, Edge Computing, IoT, Computation Domain Transform, Image Recognition

Abstract (Chinese): Time-consuming image processing often severely undermines the real-time performance of computer-vision services. With the rapid development of the Artificial Intelligence of Things (AIoT) in recent years, cameras that combine edge computing with AIoT (hereinafter referred to as edge cameras) have succeeded in applications such as object detection and object recognition. Although recent studies have performed real-time image recognition with deep neural networks, their models are computationally too complex to complete such tasks in real time on resource-constrained edge cameras. Deep network models that combine the frequency domain with the time domain have been proposed to mitigate this problem, but no purely frequency-domain lightweight deep network exists yet. This study therefore proposes a "lightweight pure frequency-domain deep network model" that can perform real-time image recognition on edge cameras while saving energy. The study first optimizes the existing "frequency-domain convolutional layer" to remove the bias introduced during feature extraction, and further optimizes the "frequency-domain activation function" to counter the accuracy drop that time-domain activation functions cause on spectral values, the "frequency-domain pooling layer" to reduce the feature loss caused by pooling, "frequency-domain dropout" to mitigate the numerical fluctuations across different images, and the "computation domain transform" to keep the model's computation count from growing as the input image size increases. More importantly, this study is the first to propose a "frequency-domain fully connected layer", which better expresses the feature distribution of spectral data. Experimental results show peak accuracies of 99.74% and 99.82% on the MNIST and CIFAR-10 datasets respectively, slightly surpassing existing frequency-domain results. Compared with existing research, this study improves the frame rate on an edge camera (an NVIDIA Jetson TX2) by 52.01% and 52.00% respectively, giving it a clear advantage in both real-time image processing and recognition accuracy. In terms of memory, inference on the two datasets saves up to 43.64% and 40.00% of memory usage respectively; in terms of energy, it saves up to 26.09% and 25.56% of per-second power consumption respectively. Overall, the proposed lightweight pure frequency-domain deep network achieves better recognition and efficiency than previous work.
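As a rough illustration of the computation domain transform mentioned above, the following minimal Python sketch (ours, not taken from the thesis) moves an image into the DCT frequency domain once, so that later layers can operate on coefficients instead of pixels; the scipy.fftpack.dct usage and the 28x28 input size are illustrative assumptions.

    # Minimal sketch of a computation-domain transform: move the input into
    # the DCT frequency domain once so later layers can work on spectra.
    # The SciPy call and the 28x28 input size are illustrative assumptions.
    import numpy as np
    from scipy.fftpack import dct

    def dct2(x):
        """Orthonormal 2-D type-II DCT, applied along rows and then columns."""
        return dct(dct(x, type=2, norm='ortho', axis=0), type=2, norm='ortho', axis=1)

    rng = np.random.default_rng(0)
    image = rng.random((28, 28)).astype(np.float32)  # e.g., an MNIST-sized input
    spectrum = dct2(image)                           # one-time transform to the DCT domain
    print(spectrum.shape)                            # (28, 28): same size, now frequency content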


Abstract (English): Smart cameras leveraging edge computing (hereinafter referred to as edge cameras) have been successful in object detection and object recognition. Although there has been research on real-time image recognition with deep neural networks in recent years, the resulting models are too complex to run in real time on edge cameras with limited resources. Deep network models that combine the frequency domain with the time domain (hybrid-domain models) have been proposed to address this problem, but no purely frequency-domain lightweight deep network exists yet. Our study therefore proposes a "Lightweight Pure Frequency-Domain Deep Network" that performs real-time image recognition on edge cameras while saving energy. Compared with existing time-domain or hybrid-domain research, this study optimizes the frequency-domain convolutional layer to avoid the bias generated when extracting features, the frequency-domain activation function to improve accuracy, the frequency-domain pooling layer to reduce feature loss, frequency-domain dropout to mitigate numerical fluctuations across different images, and the computation domain transform to keep the calculation count from increasing with input image size. More importantly, our study is the first to propose a frequency-domain fully connected layer, which better represents the spectral feature distribution. The experimental results show that peak accuracies of 99.74% and 99.82% can be achieved on the MNIST and CIFAR-10 datasets, slightly outperforming those in the time or hybrid domain. Compared with existing research, this study improves the frames per second by 52.01% and 52.00% respectively on an edge camera (NVIDIA Jetson TX2), and saves up to 43.64% and 40.00% of memory usage and 26.09% and 25.56% of power consumption on the two datasets respectively.
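To make the pure frequency-domain pipeline concrete, here is a hedged sketch of a forward pass under two assumptions that are ours, not the thesis's: the frequency-domain convolution is approximated as an elementwise (Hadamard) product of the spectrum with a learned mask, and the frequency-domain fully connected layer is stood in for by an ordinary dense product over the flattened coefficients; the thesis's exact layer definitions are in the full text.

    # Hedged sketch of a pure frequency-domain forward pass. Assumptions,
    # not the thesis's exact layers: the "convolution" is a Hadamard product
    # of the DCT spectrum with a learned mask, and the activation and dense
    # readout are plain NumPy stand-ins for the frequency-domain versions.
    import numpy as np

    rng = np.random.default_rng(1)
    spectrum = rng.standard_normal((28, 28)).astype(np.float32)  # DCT coefficients (see the sketch above)

    mask = rng.standard_normal((28, 28)).astype(np.float32)      # learned per-frequency weights
    feature = spectrum * mask                                    # frequency-domain "convolution" (Hadamard product)
    feature = np.maximum(feature, 0.0)                           # placeholder for the frequency-domain activation

    w = rng.standard_normal((28 * 28, 10)).astype(np.float32)    # stand-in for the frequency-domain fully connected layer
    logits = feature.reshape(-1) @ w                             # class scores without ever leaving the DCT domain
    print(logits.shape)                                          # (10,)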

Table of Contents:
Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Literature Review
    1.2.1 The issue of "lacking a complete pure frequency-domain deep network model"
      - Time-domain deep network models
      - Hybrid-domain deep network models
      - Frequency-domain deep network models
  1.3 Contributions and Thesis Organization
Chapter 2 System Design Rationale and Architecture Overview
Chapter 3 Frequency-Domain Deep Learning Model
  3.1 Lightweight Discrete Cosine Transform
  3.2 Frequency-Domain Convolution
  3.3 Frequency-Domain Activation Function
  3.4 Frequency-Domain Depthwise Convolutional Layer
  3.5 Frequency-Domain Pooling Layer
  3.6 Frequency-Domain SE-Block
  3.7 Frequency-Domain Fully Connected Layer
  3.8 Frequency-Domain Dropout
  3.9 Frequency-Domain Deep Network Architecture
Chapter 4 Experimental Results and Discussion
  4.1 Experimental Platform
  4.2 Datasets and Objective Evaluation Metrics
  4.3 Network Training Parameters and Procedure
  4.4 Frequency-Domain Deep Network
    4.4.1 Lightweight Frequency-Domain Deep Network
    4.4.2 Module Substitution Experiments
    4.4.3 Dropout Rate Setting Experiments
    4.4.4 Fully Connected Layer Neuron Count Experiments
    4.4.5 Self-Training with Noisy Dataset Experiments
    4.4.6 Lightweight DCT Model Depth vs. Parameter Count Comparison
    4.4.7 Discussion
  4.5 Comparison with Related Research
    4.5.1 Recognition Accuracy Comparison with Frequency-Domain Deep Networks
    4.5.2 Inference Speed Comparison with Frequency-Domain Deep Networks
    4.5.3 Share of Inference Time per Computational Layer
    4.5.4 Comparison of Bias Removal in the Frequency-Domain Convolutional Layer
Chapter 5 Conclusions and Future Research Directions
References
List of Publications and Works
Committee Members' Comments and Responses

Full text available: 2024/08/04 (campus network)
Full text available: 2024/08/04 (off-campus network)
Full text available: 2024/08/04 (National Central Library: Taiwan NDLTD system)