
Graduate Student: Chun-Zhang Zheng (鄭春章)
Thesis Title: A Hybrid Computational Storage Architecture to Accelerate CNN Training
Advisor: Chin-Hsien Wu (吳晋賢)
Committee Members: Chin-Hsien Wu (吳晋賢), Wei-Mei Chen (陳維美), Yuan-Hsiang Lin (林淵翔), Chang-Hong Lin (林昌鴻)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2020
Graduation Academic Year: 108 (2019-2020)
Language: Chinese
Number of Pages: 67
Keywords (Chinese): Computational Storage, Flash Memory, Convolutional Neural Networks
Keywords (English): Computational Storage, NAND Flash Memory, Convolutional Neural Networks
Hits: 241 / Downloads: 17
    In the field of storage devices, the Computational Storage Drive (CSD) has emerged as a research topic in recent years. CSDs offer low power consumption, high-speed computation, and highly parallel computing capability, and the technology has drawn a growing number of vendors and researchers into the field. However, existing applications separate storage and computation onto different devices. To move computation down to the storage level, a high-speed, highly parallel storage-device solution is required, and the CSD is currently one of the most promising options. It is regarded as a new form of future commercial storage device, offering stronger computing performance than a traditional SSD and supporting a wider variety of applications developed directly on the device.
    Because a CSD provides high-speed and highly parallel computation, it can fully exploit the highly parallel access strategies of NAND-flash-based SSDs, such as multi-channel parallelism [1] and multiple-die parallelism. By leveraging these internal parallelization mechanisms, a CSD can achieve better data access speed than in general application scenarios. This thesis investigates the problems encountered when CSDs are applied to CNN training, proposes a method suited to the CSD architecture to accelerate CNN training, and analyzes the performance of simultaneously using a mix of CSDs with different performance levels.
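    The paragraph above attributes much of a CSD's data-access speed to internal SSD parallelism such as multi-channel [1] and multiple-die access. The following minimal sketch only illustrates that general idea: consecutive logical pages are striped across channels and dies so that independent flash units transfer data concurrently. The channel/die geometry, page size, and the read_page helper are assumptions for this sketch, not the controller interface used in the thesis.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative geometry; a real controller exposes this in firmware, not Python.
NUM_CHANNELS = 8
DIES_PER_CHANNEL = 2
PAGE_SIZE = 16 * 1024  # bytes

def read_page(channel, die, page_no):
    """Hypothetical stand-in for reading one flash page from a given unit."""
    # A real implementation would issue a NAND read command here.
    return bytes(PAGE_SIZE)

def parallel_read(start_page, num_pages):
    """Stripe consecutive logical pages across channels and dies so that
    independent flash units can transfer data at the same time."""
    def locate(i):
        page = start_page + i
        channel = page % NUM_CHANNELS
        die = (page // NUM_CHANNELS) % DIES_PER_CHANNEL
        return channel, die, page

    with ThreadPoolExecutor(max_workers=NUM_CHANNELS * DIES_PER_CHANNEL) as pool:
        futures = [pool.submit(read_page, *locate(i)) for i in range(num_pages)]
        return b"".join(f.result() for f in futures)

if __name__ == "__main__":
    data = parallel_read(start_page=0, num_pages=32)
    print(len(data), "bytes read with channel/die striping")
```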


    In storage systems, the Computational Storage Drive (CSD) has become popular in recent years. A CSD offers low-power, high-speed, and highly parallel computing. However, existing applications execute storage and computation on separate devices. To move computation into the storage level, we need a high-speed, highly parallel computing solution, and the CSD is one of the most promising options. It is a new type of storage device for future commercial storage systems: it delivers more computing performance than a traditional SSD, and more diverse applications can be developed on it.
    In addition, because the CSD provides high-speed and highly parallel computing, it can fully exploit the highly parallel access strategies available on NAND-flash-based SSDs, such as multi-channel parallelism [1] and multiple-die parallelism. Using these internal parallelization mechanisms allows a CSD to achieve better data access speed than in general application scenarios. This thesis discusses the problems encountered when CSDs are applied to CNN training, proposes a method suited to the CSD architecture to accelerate CNN training, and analyzes the performance of simultaneously using hybrid CSDs with different performance levels.
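    The abstract proposes accelerating CNN training on a mix of CSDs with different performance levels, and the outline below names an "Adaptive Task Assignment" step. One plausible form of such a policy, sketched here purely as an illustration (the proportional-split rule, device names, and throughput figures are assumptions, not the thesis's actual assignment policy), is to divide training batches in proportion to each device's measured throughput:

```python
def assign_batches(total_batches, measured_throughput):
    """Split a number of CNN training batches across devices in proportion
    to their measured throughput (batches/second), giving any leftover
    batches to the fastest devices first."""
    total = sum(measured_throughput.values())
    shares = {dev: int(total_batches * tp / total)
              for dev, tp in measured_throughput.items()}
    leftover = total_batches - sum(shares.values())
    for dev in sorted(measured_throughput, key=measured_throughput.get, reverse=True):
        if leftover == 0:
            break
        shares[dev] += 1
        leftover -= 1
    return shares

if __name__ == "__main__":
    # Hypothetical hybrid setup: one fast CSD, one slower CSD, and the host CPU.
    throughput = {"csd_fast": 90.0, "csd_slow": 45.0, "host_cpu": 15.0}
    print(assign_batches(total_batches=100, measured_throughput=throughput))
    # -> {'csd_fast': 60, 'csd_slow': 30, 'host_cpu': 10}
```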

    Table of Contents
    Chapter 1  Introduction
      1.1  Preface
      1.2  Thesis Organization
    Chapter 2  Background and Research Motivation
      2.1  Background Operating Principles and CSD-Related Background
      2.2  CNN Background
      2.3  Research Motivation
    Chapter 3  Methodology
      3.1  System Overview
      3.2  Asynchronous Convolution Operation
      3.3  Adaptive Task Assignment
      3.4  Multiple Model Parallelism (illustrated in the sketch after this outline)
    Chapter 4  Experiments and Performance Analysis
      4.1  Experimental Environment
      4.2  Comparison Issues
      4.3  Analysis of Experimental Performance Results
        4.3.1  Whether Adding Parallel CNN Model Training Speeds Up Training
        4.3.2  Whether Optimization Is Possible Across Multiple CNN Environments
      4.4  Analysis of How Much Each CNN Can Be Optimized
        4.4.1  Time-Proportion Analysis of Different CNN Architectures on the Same Hardware
        4.4.2  Time-Proportion Analysis of the Same CNN Architecture on Different Hardware
      4.5  Effect of the Number of Training Iterations on CNN Training
    Chapter 5  Conclusion
    References
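    Section 3.4 of the outline names "Multiple Model Parallelism", and Section 4.3.1 asks whether training CNN models in parallel speeds things up. The outline only names the technique, so the following minimal sketch shows one plausible reading of it: several independent model-training jobs are launched concurrently, one worker per device. The model list, device names, and the placeholder train_model body are assumptions, not code from the thesis.

```python
from multiprocessing import Pool

def train_model(job):
    """Placeholder for one complete CNN training run pinned to one device."""
    model_name, device = job
    # A real implementation would build the model and run its training loop
    # on the assigned CSD/accelerator; here we only report the assignment.
    return f"{model_name} trained on {device}"

if __name__ == "__main__":
    # Hypothetical assignment: one CNN model per computational storage device.
    jobs = [("lenet", "csd0"), ("alexnet", "csd1"), ("vgg16", "csd2")]
    with Pool(processes=len(jobs)) as pool:
        for result in pool.map(train_model, jobs):
            print(result)
```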

    [1] O. Yang, N. Xiao, M. Lai, "A scalable multi-channel parallel NAND flash memory controller architecture." 2011 Sixth Annual ChinaGrid Conference. IEEE, 2011.

    [2] T. Li, Z. Lei, "A novel multiple dies parallel NAND flash memory controller for high-speed data storage." 2017 13th IEEE International Conference on Electronic Measurement & Instruments (ICEMI). IEEE, 2017.

    [3] C. Zambelli, R. Bertaggia, L. Zuolo, R. Micheloni, "Enabling Computational Storage Through FPGA Neural Network Accelerator for Enterprise SSD." IEEE Transactions on Circuits and Systems II: Express Briefs 66.10 (2019): 1738-1742.

    [4] M. Torabzadehkashi, S. Rezaei, V. Alves, N. Bagherzadeh, "Compstor: an in-storage computation platform for scalable distributed processing." 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2018.

    [5] W. H. Wu, "A Retention-Error Mitigation Method based on TLC NAND Flash Memory," National Taiwan University of Science and Technology, 2019.

    [6] ADVANTECH, company, Tech. Rep., 2016.

    [7] M. Torabzadehkashi, A. Heydarigorji, S. Rezaei, H. Bobarshad, V. Alves, N. Bagherzadeh, "Accelerating HPC applications using computational storage devices." 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). IEEE, 2019.

    [8] C. Yakopcic, M. Z. Alom, T. M. Taha, "Extremely parallel memristor crossbar architecture for convolutional neural network implementation." 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017.

    [9] K. Fukushima, S. Miyake, T. Ito, "A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position." Biol. Cybern. 36 (1980): 193-202.

    [10] Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

    [11] "Convolutional Neural Networks (LeNet) - Deep Learning 0.1" in DeepLearning 0.1, LISA Lab.

    [12] A. Krizhevsky, I. Sutskever, G. E. Hinton, "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.

    [13] K. Simonyan, A. Zisserman, "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).

    [14] K. Sun, S. Li, Y. Luo, R. Renteria, K. Choi, "Highly-efficient parallel convolution acceleration by using multiple GPUs." 2017 International SoC Design Conference (ISOCC). IEEE, 2017.

    [15] D. Li, Y. Yang, W. Li, Q. Yang, "CISC: Coordinating Intelligent SSD and CPU to Speedup Graph Processing." 2018 17th International Symposium on Parallel and Distributed Computing (ISPDC). IEEE, 2018.

    [16] G. Begna, D. B. Rawat, M. Garuba, L. Njilla, "SecureCASH: Securing Context-Aware Distributed Storage and Query Processing in Hybrid Cloud Framework." 2018 IEEE Conference on Communications and Network Security (CNS). IEEE, 2018.

    [17] M. Torabzadehkashi, S. Rezaei, A. Heydarigorji, H. Bobarshad, V. Alves, N. Bagherzadeh, "Catalina: In-storage processing acceleration for scalable big data analytics." 2019 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). IEEE, 2019.

    [18] X. Song, T. Xie, W. Pan, "RISP: a reconfigurable in-storage processing framework with energy-awareness." 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). IEEE, 2018.

    [19] J. C. Vega, Q. C. Shen, P. Chow, "SHIP: Storage for Hybrid Interconnected Processors." 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 2020.

    [20] A. A. Devarajan, T. SudalaiMuthu, "Cloud Storage Monitoring System analyzing through File Access Pattern." 2019 International Conference on Computational Intelligence in Data Science (ICCIDS). IEEE, 2019.

    [21] Z. He, J. Kuang, Y. Tan, W. Liu, B. Sheng, "Design and Implementation of GPU Accelerated Active Storage in FastDFS." 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). IEEE, 2019.

    [22] K. Sun, S. Li, Y. Luo, R. Renteria, K. Choi, "Highly-efficient parallel convolution acceleration by using multiple GPUs." 2017 International SoC Design Conference (ISOCC). IEEE, 2017.

    [23] N. Dryden, N. Maruyama, T. Benson, T. Moon, M. Snir, B. V. Essen, "Improving strong-scaling of CNN training by exploiting finer-grained parallelism." 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2019.

    [24] S. Guedria, N. D. Palma, F. Renard, N. Vuillerme, "Auto-CNNp: a component-based framework for automating CNN parallelism." 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019.
