
Graduate student: 呂易穎
Yi-Ying Lu
Thesis title: K-Grouping:基於機器學習的分類器來降低固態硬碟中的寫入放大
K-Grouping: A Machine-Learning-based Data Classifier to Reduce the Write Amplification in SSDs
Advisor: 吳晋賢
Chin-Hsien Wu
Oral defense committee: 謝仁偉
Jen-Wei Hsieh
陳雅淑
Ya-Shu Chen
林淵翔
Yuan-Hsiang Lin
Degree: Master's
Department: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar, i.e. 2018-2019)
Language: English
Number of pages: 53
Chinese keywords: 快閃記憶體, 冷熱資料, 資料分群, 垃圾收集, 寫入放大, 決策樹
English keywords: Flash memory, Hot-Cold data, Data clustering, Garbage collection, Write amplification, Decision tree

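Solid-state drives (SSDs) built from flash memory offer non-volatility, fast access, shock resistance, low power consumption, and small size, and in recent years they have been widely adopted as the data storage of many kinds of devices. Because of its hardware design, flash memory does not support in-place updates; data must be written in units of pages and erased in units of blocks. As a result of these two characteristics, before a block is erased, its remaining valid pages must first be moved to other free pages, so reducing the amount of valid data moved during cleaning is an important issue for SSDs. Classifying written data effectively concentrates the distribution of invalid pages in the SSD, and selecting blocks with more invalid pages for erasure effectively reduces the data movement cost of garbage collection. This thesis proposes a method based on machine learning algorithms that adaptively designs a dedicated data classifier for each workload, so that write requests with the same characteristics are written to the same group of data blocks. This write design concentrates the invalid pages produced by data updates, which improves block-cleaning efficiency, reduces write amplification, and further improves the lifetime and performance of the SSD.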
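
The cleaning cost described in this abstract can be illustrated with a minimal Python sketch. It assumes a greedy victim policy (erase the block with the most invalid pages) and a toy block of four pages. The names `Block`, `pick_victim`, and `write_amplification`, as well as all numbers, are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A flash block; each entry of `valid` is True for a valid page, False for an invalid one."""
    valid: list

    def invalid_count(self):
        return sum(1 for v in self.valid if not v)

    def valid_count(self):
        return sum(1 for v in self.valid if v)


def pick_victim(blocks):
    """Greedy policy (an assumption): choose the block with the most invalid pages."""
    return max(blocks, key=lambda b: b.invalid_count())


def write_amplification(host_writes, gc_copies):
    """Total flash page writes divided by host page writes."""
    return (host_writes + gc_copies) / host_writes


# Mixed layout: frequently and rarely updated pages share blocks,
# so every block still holds valid pages when it is cleaned.
mixed = [Block([True, False, True, False]), Block([False, True, False, True])]

# Grouped layout: the same four invalid pages are concentrated in one block.
grouped = [Block([False, False, False, False]), Block([True, True, True, True])]

mixed_copies = pick_victim(mixed).valid_count()      # 2 valid pages must be moved
grouped_copies = pick_victim(grouped).valid_count()  # 0 valid pages must be moved

# Assume the host wrote 8 pages in both cases (made-up figure).
print(write_amplification(8, mixed_copies))    # 1.25
print(write_amplification(8, grouped_copies))  # 1.0
```

In this toy case, both layouts contain the same number of invalid pages, but only the grouped layout lets a block be erased without copying any valid page.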


Solid-state drives (SSDs) composed of flash memory have the advantages of non-volatility, fast access, shock resistance, low power consumption, and small size. In recent years, SSDs have been widely used as data storage for various devices. Two critical characteristics of flash memory are that it does not support in-place updates, and that data must be written in units of a page and erased in units of a block. Because of these two characteristics, when a block is selected as a victim block to erase, the remaining valid pages must first be moved from the victim block to another free block. Reducing the amount of valid page movement is therefore a crucial issue for SSDs. Data classification can effectively concentrate the distribution of invalid pages in the flash memory and reduce the data movement cost. This thesis proposes a method, based on machine learning algorithms, to design an adaptive data classifier for different workloads. The classifier writes requests with the same characteristics to the same group of data blocks. Such a design improves the performance of SSDs by reducing live page copying and thereby decreasing write amplification.
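
The idea of writing requests with the same characteristics to the same group of data blocks can be sketched as follows. This is only an illustration under stated assumptions: the feature `update_count`, the threshold rule in `classify`, and the `GroupedWriter` class are hypothetical stand-ins, whereas the thesis derives its classifier from workload data with machine learning.

```python
from collections import defaultdict

PAGES_PER_BLOCK = 4  # toy value, chosen only for illustration


def classify(update_count):
    """Hypothetical stand-in for a trained classifier: map a feature to a group id."""
    return 0 if update_count >= 10 else 1  # 0 = frequently updated, 1 = rarely updated


class GroupedWriter:
    """Keeps one open block per group so that similar data is written together."""

    def __init__(self):
        self.open_block = defaultdict(list)  # group id -> pages in the open block
        self.closed_blocks = []              # (group id, pages) for each full block

    def write(self, lpn, update_count):
        group = classify(update_count)
        block = self.open_block[group]
        block.append(lpn)
        if len(block) == PAGES_PER_BLOCK:
            # The block is full: close it and start a new one for this group.
            self.closed_blocks.append((group, block))
            self.open_block[group] = []
        return group


writer = GroupedWriter()
# (logical page number, observed update count) pairs; the values are made up.
requests = [(1, 50), (2, 0), (3, 42), (4, 1), (5, 77), (6, 2), (7, 13), (8, 0)]
for lpn, count in requests:
    writer.write(lpn, count)

print(writer.closed_blocks)  # [(0, [1, 3, 5, 7]), (1, [2, 4, 6, 8])]
```

Although the requests arrive interleaved, each closed block ends up holding pages from a single group, so updates to the frequently written pages invalidate pages in the same block.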

1. Introduction
2. Background
  2.1. Flash Translation Layer
    2.1.1. Address Translation
    2.1.2. Garbage-Collection
    2.1.3. Wear-Leveling
  2.2. Write Amplification
  2.3. Hot-Cold Classification
    2.3.1. Two-Level-LRU
    2.3.2. WDAC
    2.3.3. Multiple Bloom Filters
    2.3.4. DAC
3. Motivation
4. K-Grouping
  4.1. Framework
  4.2. A ML-based Data Classifier
    4.2.1. Feature Retrieving
    4.2.2. Data Preprocessing
    4.2.3. Data Clustering
    4.2.4. Classifier Training
  4.3. Online Classifying
    4.3.1. Write Operation
    4.3.2. GC Operation
5. Experiment
  5.1. Experiment Setting
    5.1.1. Workloads Setting
    5.1.2. Simulator Setting
    5.1.3. Methods for Comparison
    5.1.4. Memory Overhead
  5.2. Experiment Overview
  5.3. Experiment Result
    5.3.1. Features of Each Group
    5.3.2. The WA of Different K
    5.3.3. The WA of Comparison Method
    5.3.4. The WA of Different Size of Training Data
    5.3.5. Fixed Classifier for Future Several Days
  5.4. Combine MLDC with DAC
6. Conclusion
References


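Full-text release date: 2024/08/22 (campus network)
Full-text release date: full text not authorized for public release (off-campus network)
Full-text release date: full text not authorized for public release (National Central Library: Taiwan thesis and dissertation system)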