
Student: 顏百葵 (Bai-Kui Yan)
Title: Area-Efficient Compressor/Decompressor for Floating-Point Feature Maps in Convolutional Neural Network Accelerators (應用於卷積神經網路加速器之高面積效率浮點數特徵圖壓縮器)
Advisor: 阮聖彰 (Shanq-Jang Ruan)
Committee members: 沈中安 (Chung-An Shen), 李佩君 (Pei-Jun Lee), 林銘波 (Ming-Bo Lin)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2022
Graduation academic year: 110
Language: English
Number of pages: 73
Chinese keywords: data compression, convolutional neural networks, CNN accelerators, floating point, area efficiency
Foreign keywords: Compression, Convolutional neural networks (CNNs), CNN accelerators, floating-point (FP), Area-efficient

Convolutional neural networks (CNNs) have been used in many artificial intelligence applications, such as object detection, image classification, and natural language processing. Because CNNs require a large amount of computing resources, many computing architectures have been proposed to improve computational throughput and energy efficiency. However, these architectures require a large amount of data movement between on-chip and off-chip memory, which leads to high energy consumption in the off-chip memory; feature map compression has therefore been proposed to reduce this data movement, and the design of feature map compressors has become one of the main research topics for improving the off-chip memory energy efficiency of CNN accelerators. In this work, we propose a floating-point feature map compressor for CNN accelerators. In addition to compressing zeros, we also compress the nonzero values in the feature map based on the floating-point format. Compared with the state of the art on the ILSVRC 2012 dataset, the proposed compression algorithm achieves a lower area and a similar compression ratio.


Convolutional neural networks (CNNs) have been deployed in many artificial intelligence applications, such as object detection, image classification, and natural language processing. Since CNNs need massive computing resources, many computing architectures have been proposed to improve computational throughput and energy efficiency. However, these architectures require heavy data movement between the chip and off-chip memory, which causes high energy consumption in the off-chip memory; thus, feature map (fmap) compression has been studied as a way to reduce this data movement, and the design of fmap compression has become one of the main research directions for off-chip memory energy efficiency in CNN accelerators. In this work, we propose a floating-point (FP) fmap compressor for hardware accelerators. In addition to zero compression, we also compress nonzero values in the fmap based on the FP format. The compression algorithm achieves low area overhead and a similar compression ratio compared with the state of the art on the ILSVRC 2012 dataset.
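
The full text is embargoed, so the thesis's exact encoder is not reproduced in this record. The sketch below only illustrates the general idea stated in the abstract, assuming a toy scheme that run-length-encodes zero runs and splits each nonzero IEEE-754 single-precision value into its sign, exponent, and mantissa fields so they could be packed with different bit widths. All names (split_fp32, compress_fmap, decompress_fmap) and the max_run parameter are illustrative, not taken from the thesis.

    import struct

    def split_fp32(value: float):
        """Split an IEEE-754 single-precision value into (sign, exponent, mantissa)."""
        bits = struct.unpack(">I", struct.pack(">f", value))[0]
        sign = bits >> 31
        exponent = (bits >> 23) & 0xFF
        mantissa = bits & 0x7FFFFF
        return sign, exponent, mantissa

    def compress_fmap(fmap, max_run=15):
        """Toy compressor: zero run-length encoding plus FP field splitting.

        Zeros are emitted as ("Z", run_length); nonzero values are emitted as
        ("NZ", sign, exponent, mantissa). Illustration only, not the thesis encoder.
        """
        stream, run = [], 0
        for v in fmap:
            if v == 0.0:
                run += 1
                if run == max_run:          # cap the run-length field
                    stream.append(("Z", run))
                    run = 0
            else:
                if run:
                    stream.append(("Z", run))
                    run = 0
                stream.append(("NZ", *split_fp32(v)))
        if run:
            stream.append(("Z", run))
        return stream

    def decompress_fmap(stream):
        """Invert compress_fmap back to a flat list of floats."""
        out = []
        for token in stream:
            if token[0] == "Z":
                out.extend([0.0] * token[1])
            else:
                _, sign, exponent, mantissa = token
                bits = (sign << 31) | (exponent << 23) | mantissa
                out.append(struct.unpack(">f", struct.pack(">I", bits))[0])
        return out

    # Example: a sparse fmap tile with zero runs round-trips losslessly.
    tile = [0.0, 0.0, 0.0, 1.5, -0.25, 0.0, 0.0, 3.75]
    assert decompress_fmap(compress_fmap(tile)) == tile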

RECOMMENDATION FORM
COMMITTEE FORM
摘要 (Chinese Abstract)
ABSTRACT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 Introduction
  1.1 Background of CNN Accelerator
  1.2 Challenges of Previous Works
  1.3 Contribution of This Thesis
  1.4 Organization
CHAPTER 2 Background
  2.1 The CNN Algorithm
  2.2 Compression Algorithms
    2.2.1 The Zero-RLE
    2.2.2 The Delta Encoding
CHAPTER 3 Related Work
  3.1 Quantization
  3.2 Fmap Compression
CHAPTER 4 Proposed Method
  4.1 Compression Algorithm
  4.2 Hardware Architecture
    4.2.1 Compressor
    4.2.2 Decompressor
CHAPTER 5 Experimental Results
  5.1 Environment/Dataset Setup
  5.2 Selection of Parameters
    5.2.1 The Consecutive Zeros Storage Length of Zero-RLE
    5.2.2 The Difference Storage Length of Delta Encoder
  5.3 Comparison with Previous Works
CHAPTER 6 Conclusions
REFERENCES
APPENDIX 1
APPENDIX 2
APPENDIX 3
APPENDIX 4
APPENDIX 5
APPENDIX 6
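
Section 2.2.2 of the table of contents lists delta encoding as a background technique. As a generic illustration only (the record does not describe the thesis's operand choice or difference storage length), a delta encoder keeps the first value and then stores only the differences between consecutive values, which stay small when neighbouring values, such as the exponents of nearby fmap entries, are close:

    def delta_encode(values):
        """Store the first value, then the differences between consecutive values."""
        if not values:
            return []
        return [values[0]] + [b - a for a, b in zip(values, values[1:])]

    def delta_decode(deltas):
        """Rebuild the original sequence by accumulating the differences."""
        out, acc = [], 0
        for i, d in enumerate(deltas):
            acc = d if i == 0 else acc + d
            out.append(acc)
        return out

    # Hypothetical 8-bit exponent values of neighbouring nonzero fmap entries.
    exponents = [124, 125, 125, 126, 124]
    assert delta_decode(delta_encode(exponents)) == exponents
    print(delta_encode(exponents))   # [124, 1, 0, 1, -2]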

Full text available from 2027/08/01 (campus network)
Full text available from 2027/08/01 (off-campus network)
Full text available from 2027/08/01 (National Central Library: Taiwan NDLTD system)