
Author: Yu-Yu Chang (張祐瑀)
Thesis Title: Entropy-Aware Weight Compression Techniques for DNN Applications (適用於深度神經網路應用的熵曉知權重壓縮技術)
Advisor: Shyue-Kung Lu (呂學坤)
Committee Members: Shyue-Kung Lu (呂學坤), Nai-Jian Wang (王乃堅), Jin-Fu Li (李進福), Shu-Lin Hwang (黃樹林), Jin-Hua Hong (洪進華)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Academic Year of Graduation: 110 (ROC calendar)
Language: Chinese
Number of Pages: 80
Chinese Keywords: Weight Compression, Deep Neural Network, Image Compression, Weight Sharing, K-means Clustering
English Keywords: Weight Compression, Deep Neural Network, JPEG, Weight Sharing, K-means

In recent years, deep neural networks (DNNs) have been widely applied in many fields, such as autonomous driving, face recognition, and smart appliances. A DNN model is trained on a large amount of training data until its accuracy reaches a required level, and the trained model contains a large amount of weight data, which is stored in non-volatile memory for later use. When compressing the weight data of a deep neural network, the limits on memory bandwidth, power consumption, and storage space must be considered. In recent years, researchers have proposed weight compression techniques to address this problem: without significant accuracy loss, weight compression can achieve high compression rates at low hardware cost.
This thesis proposes entropy-aware weight compression techniques for compressing weight data. Based on the JPEG still-image compression standard, the techniques combine weight quantization and K-means clustering, and use weight matrix remapping (WMR) to gather similar weight values into the same 8 × 8 sub-weight matrix (SWM), thereby lowering the entropy of the 8 × 8 SWMs and raising the compression rate. A complete compression flow for deep neural networks is also presented. Because the weight redundancy of fully connected layers is far higher than that of convolutional layers, this work focuses on compressing the weights of the fully connected layers.
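To make the entropy criterion behind the remapping step concrete, the following sketch (a rough illustration only, not the thesis's actual WMR algorithm; the function names and the sort-based grouping are assumptions for demonstration) computes the average Shannon entropy of the 8 × 8 SWMs tiled from a quantized weight matrix. Gathering similar weight indices into the same SWM lowers this average, which is what WMR aims to achieve before entropy coding.

```python
import numpy as np

def swm_entropy(swm):
    # Shannon entropy (bits per weight) of one 8 x 8 sub-weight matrix (SWM)
    # holding quantized weight indices.
    _, counts = np.unique(swm, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mean_swm_entropy(weight_matrix):
    # Average entropy over all 8 x 8 SWMs tiled from a 2-D weight matrix
    # whose dimensions are assumed to be multiples of 8.
    rows, cols = weight_matrix.shape
    blocks = [weight_matrix[r:r + 8, c:c + 8]
              for r in range(0, rows, 8)
              for c in range(0, cols, 8)]
    return float(np.mean([swm_entropy(b) for b in blocks]))

# Toy comparison: the same quantized indices scattered randomly vs. grouped
# (sorting is only a stand-in for a remapping that gathers similar values).
rng = np.random.default_rng(0)
w = rng.integers(0, 16, size=(64, 64))
grouped = np.sort(w, axis=None).reshape(64, 64)
print(mean_swm_entropy(w), mean_swm_entropy(grouped))
```

In the thesis itself the regrouping is driven by LFSR-generated control words (Section 3.2.4); the sort above is only a stand-in to show how grouping similar values lowers per-SWM entropy.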
A simulator was developed on a deep learning framework to simulate the fully connected layers of different DNN models and apply the proposed entropy-aware weight compression techniques. The experimental results show that, with an accuracy loss within 1%, the techniques achieve 19.05× compression for MLP2 on the MNIST handwritten-digit dataset, 7.86× for LeNet-5 on MNIST, and 21.29× for VGG16 on the CIFAR-10 dataset. In summary, the proposed techniques achieve high compression rates while maintaining the required accuracy.


Deep neural networks (DNNs) have been widely used in autonomous driving, face recognition, and smart appliances in recent years. After a DNN model is trained on a large amount of training data, it contains a large number of weights, which must be stored in non-volatile memory for later use. Because memory bandwidth, power budget, and storage space are limited, efficient and effective techniques for compressing this large amount of weight data are indispensable, and several weight compression techniques have been proposed in recent years. The goals of weight compression are high compression rates, low hardware overhead, and negligible inference accuracy degradation.
This thesis proposes entropy-aware weight compression techniques based on the JPEG (Joint Photographic Experts Group) still-image compression standard. The proposed techniques are combined with weight quantization and K-means clustering for better compression. The weight matrix is treated as an image and divided into 8 × 8 sub-weight matrices (SWMs), and a novel weight matrix remapping (WMR) technique adjusts the constituent weights of each 8 × 8 SWM so that similar weights end up in the same SWM, reducing its entropy. We focus on compressing the fully connected layers, since their weight redundancy is higher than that of the convolutional layers. The deep learning framework PyTorch is used to evaluate the inference accuracy and compression rates of different DNN models.
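As a minimal sketch of the weight-sharing step (assuming scikit-learn is available and using a random matrix as a stand-in for a fully connected layer; the helper name kmeans_weight_sharing is illustrative, not from the thesis), each weight is replaced by its K-means cluster centroid, so only the small cluster indices plus a codebook of centroids need to be stored:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_weight_sharing(weights, n_clusters=16, seed=0):
    # Cluster all weights of a layer into n_clusters shared values.
    # Returns the shared-value matrix and the per-weight cluster indices;
    # the indices plus the centroid codebook are what would be stored.
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(flat)
    indices = km.labels_.reshape(weights.shape)
    shared = km.cluster_centers_[km.labels_].reshape(weights.shape)
    return shared, indices

# Random stand-in for a fully connected layer's weight matrix.
rng = np.random.default_rng(0)
fc_weights = rng.normal(0.0, 0.1, size=(256, 128)).astype(np.float32)
shared, idx = kmeans_weight_sharing(fc_weights, n_clusters=16)
print(float(np.abs(fc_weights - shared).mean()), int(idx.max()))
```

With 16 clusters, each index needs only 4 bits before any further coding, and the index matrix is what the JPEG-style processing then operates on.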
With a 1% accuracy loss, the proposed techniques achieve 19.05× compression for a multilayer perceptron (MLP) on the MNIST dataset, 7.86× for LeNet-5 on MNIST, and 21.29× for VGG16 on the CIFAR-10 dataset. That is, the proposed techniques achieve good compression results while maintaining the required accuracy levels.

Acknowledgements
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
  1.1 Background and Motivation
  1.2 Organization
Chapter 2  Data Compression Techniques for Deep Learning Models
  2.1 Fundamentals of Deep Learning
    2.1.1 Neurons and Deep Neural Network Architectures
    2.1.2 Fully Connected Neural Networks
    2.1.3 Convolutional Neural Networks
  2.2 Concepts of Deep Learning Model Compression
  2.3 Classification of Deep Neural Network Compression Techniques
    2.3.1 Network Pruning
    2.3.2 Weight Quantization
    2.3.3 Weight Sharing
    2.3.4 Knowledge Distillation
    2.3.5 DCT-Based Weight Compression Techniques
Chapter 3  Entropy-Aware Neural Network Weight Compression Techniques
  3.1 JPEG Still-Image Compression
    3.1.1 System Architecture of JPEG Still-Image Compression
    3.1.2 JPEG Still-Image Compression Flow
  3.2 Design Flow of the Entropy-Aware Weight Compression Techniques
    3.2.1 Neural Network Weight Quantization
    3.2.2 K-Means-Based Weight Compression
    3.2.3 Entropy-Aware Weight Matrix Remapping
    3.2.4 LFSR-Based Control Word Generation for Entropy-Aware Weight Matrix Remapping
  3.3 Control Word Generation Techniques
    3.3.1 Full-Word-Length Control Word Generation
    3.3.2 Partitioned-Word-Length Control Word Generation
    3.3.3 Progressive-Word-Length Control Word Generation
Chapter 4  Experimental Results
  4.1 Deep Learning Framework and Model Settings
  4.2 Entropy Analysis of the Weight Matrix Remapping Technique
  4.3 Analysis of Experimental Results
    4.3.1 MLP2 Model on the MNIST Dataset
    4.3.2 LeNet5 Model on the MNIST Dataset
    4.3.3 VGG16 Model on the CIFAR-10 Dataset
Chapter 5  Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
References


Full text available from 2025/08/15 (campus network)
Full text available from 2025/08/15 (off-campus network)
Full text available from 2025/08/15 (National Central Library: Taiwan Digital Theses System)