Student: 吳柏逸 (Bo-Yi Wu)
Thesis title: 輕量化頻域深度網路 (Lightweight Frequency-Domain Deep Network)
Advisor: 陸敬互 (Ching-Hu Lu)
Committee members: 陸敬互 (Ching-Hu Lu), 郭重顯 (Chung-Hsien Kuo), 鍾聖倫 (Sheng-Luen Chung), 蘇順豐 (Shun-Feng Su), 廖峻鋒 (Chun-Feng Liao)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2021
Academic year: 109
Language: Chinese
Pages: 104
Keywords: Lightweight Deep Network, Frequency Domain, Edge Computing, IoT, Computation Domain Transform, Image Recognition
Abstract (Chinese, translated): Time-consuming image processing often severely degrades the real-time performance of computer-vision services. With the rapid growth of the Artificial Intelligence of Things (AIoT) in recent years, cameras that combine edge computing with AIoT (hereinafter, edge cameras) have succeeded in applications such as object detection and object recognition. Although recent studies have performed real-time image recognition with deep neural networks, their models are computationally too complex to complete such tasks in real time on resource-constrained edge cameras. Deep network models that combine the frequency domain with the time domain have been proposed to mitigate this problem, but no purely frequency-domain lightweight deep network exists yet. This study therefore proposes a lightweight, purely frequency-domain deep network model that performs real-time image recognition on edge cameras while saving energy. The study first refines the frequency-domain convolutional layer of prior work to reduce the bias introduced during feature extraction, and likewise refines the frequency-domain activation function (to avoid the accuracy loss that time-domain activations cause on spectral values), the frequency-domain pooling layer (to reduce the feature loss incurred during pooling), frequency-domain dropout (to handle the value fluctuations produced by different images), and the computation-domain transform method (to curb the growth in the model's computation as the input image size increases). Most importantly, this study is the first to propose a frequency-domain fully connected layer, which better expresses the feature distribution of spectral data. Experiments show top inference accuracies of 99.74% on MNIST and 99.82% on CIFAR-10, slightly surpassing existing frequency-domain results. Compared with prior work, the proposed model raises frames per second on an edge camera (an NVIDIA Jetson TX2) by 52.01% and 52.00% on the two datasets, respectively, showing a clear advantage in both real-time image processing and recognition accuracy. In terms of data-space usage, it saves up to 43.64% and 40.00% of memory during inference on the two datasets, and in terms of energy it saves up to 26.09% and 25.56% of per-second power consumption. Taken together, the proposed lightweight purely frequency-domain deep network achieves better recognition accuracy and performance than prior studies.
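The computation-domain transform described above maps images from the spatial (time) domain into the frequency domain before any network layer runs. A minimal sketch of such a transform, using the orthonormal 2-D DCT (the transform family named in the thesis keywords and underlying JPEG); the helper names `to_frequency_domain`/`to_spatial_domain` are illustrative, not from the thesis:

```python
import numpy as np
from scipy.fft import dctn, idctn

def to_frequency_domain(image: np.ndarray) -> np.ndarray:
    """Map a 2-D image from the spatial domain to the frequency
    domain with an orthonormal type-II 2-D DCT."""
    return dctn(image, norm="ortho")

def to_spatial_domain(spectrum: np.ndarray) -> np.ndarray:
    """Invert the transform (type-III DCT) to recover the image."""
    return idctn(spectrum, norm="ortho")

# Example: a random 8x8 block round-trips through the transform.
rng = np.random.default_rng(0)
block = rng.random((8, 8))
spectrum = to_frequency_domain(block)
recovered = to_spatial_domain(spectrum)
print(np.allclose(block, recovered))  # prints True: the DCT is orthonormal
```

Because the orthonormal DCT is invertible and energy-preserving, a network operating on `spectrum` loses no information relative to one operating on `block`; the thesis's layers then work on such spectra directly rather than switching back to the spatial domain.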
Abstract (English): Smart cameras leveraging edge computing (hereinafter referred to as edge cameras) have been successful in object detection and object recognition. Although there has been research on real-time image recognition through deep neural networks in recent years, the resulting models are too complex to execute in real time on edge cameras with limited resources. Deep network models that combine the frequency domain with the time domain (hybrid-domain models) have been proposed to mitigate this problem, but no purely frequency-domain lightweight deep network exists yet. This study therefore proposes a "Lightweight Pure Frequency-Domain Deep Network" that can perform real-time image recognition on edge cameras while saving energy. Compared with existing time-domain and hybrid-domain research, this study optimizes the frequency-domain convolutional layer to avoid the bias generated when extracting features, the frequency-domain activation function to improve accuracy, the frequency-domain pooling layer to reduce feature loss, frequency-domain dropout to mitigate numerical fluctuations across different images, and the computation-domain transform to curb the increase in computation caused by larger input images. More importantly, this study is the first to propose a frequency-domain fully connected layer, which better represents the spectral feature distribution. Experimental results show top accuracies of 99.74% and 99.82% on the MNIST and CIFAR-10 datasets, respectively, slightly outperforming time-domain and hybrid-domain results. Compared with existing research, this study improves frames per second by 52.01% and 52.00%, respectively, on the edge camera (an NVIDIA Jetson TX2), and saves up to 43.64% and 40.00% of memory usage and up to 26.09% and 25.56% of power consumption on the two datasets.
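The frequency-domain convolutional layer rests on the convolution theorem: convolution in the spatial domain becomes an elementwise (Hadamard) product of spectra, replacing a sliding-window operation with one multiplication per coefficient. A minimal sketch of that equivalence is below; it uses the FFT, whose convolution theorem is the simplest to state, whereas the thesis itself works with DCT spectra, so this is an illustration of the principle rather than the thesis's exact layer:

```python
import numpy as np

def circular_conv2d(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Circular 2-D convolution computed directly, as a reference."""
    h, w = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            for u in range(h):
                for v in range(w):
                    out[i, j] += x[u, v] * k[(i - u) % h, (j - v) % w]
    return out

def freq_conv2d(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """The same convolution via the convolution theorem:
    an elementwise product of the two FFT spectra."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))

rng = np.random.default_rng(1)
x = rng.random((8, 8))
k = np.zeros((8, 8))
k[:3, :3] = rng.random((3, 3))  # a 3x3 kernel zero-padded to the image size
print(np.allclose(circular_conv2d(x, k), freq_conv2d(x, k)))  # prints True
```

A purely frequency-domain network exploits this by keeping every layer's inputs and outputs as spectra, so the forward-and-inverse transforms that hybrid-domain models pay for at each layer boundary are eliminated.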