
Student: Ting-Jui Lin (林廷叡)
Thesis title: Performance Enhancement for Data Deduplication System
Advisor: Jen-Wei Hsieh (謝仁偉)
Committee members: Yuan-Shin Hwang (黃元欣), Wei-Chung Teng (鄧惟中), Chih-Yuan Yao (姚智原)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2014
Graduation academic year: 102 (2013-2014)
Language: Chinese
Pages: 45
Keywords (Chinese): 重複資料刪除系統, 指紋, 固態硬碟
Keywords (English): data deduplication system, fingerprint, SSD

In recent years, as network technology has matured, data has become far more mobile, and users access remote data storage systems over the network ever more frequently. To mitigate the gradual performance degradation caused by the enormous volume of data held inside a storage system, data deduplication systems have emerged. A data deduplication system is extremely sensitive to data content: in its most basic form, a hash function is applied to the content of the data to produce a unique fingerprint, giving the system the ability to identify that data. Its main goal is to eliminate redundant data that would otherwise be written repeatedly; once the system determines that the data in a write request already exists, it does not need to physically write the data and only has to map the request to the existing copy in the system. To reliably decide whether two pieces of data are completely identical, the fingerprint search mechanism becomes increasingly important. This thesis proposes using a flash-based solid-state drive (SSD) to strengthen the search and management of the fingerprint store. Unlike the traditional approach of keeping fingerprints in DRAM, an SSD can hold far more fingerprint samples, and the algorithm proposed in this thesis brings its search speed as close to that of DRAM as possible. In addition, this thesis effectively manages the large volume of random disk reads generated when a deduplication system runs on hard disks. Random access is the chief culprit behind long request completion times on traditional hard disks; it not only lengthens completion time but also wastes power through unnecessary mechanical operations such as head seeks and platter rotation. The proposed mechanism shortens the system's running time and effectively reduces the number of mechanical accesses. In our experiments, the deduplication system achieved a 36.86% saving in written space and a 34% reduction in system time, effectively lowering storage cost and letting users retrieve their data more quickly.
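The two-level fingerprint search described above can be sketched as a small DRAM-resident LRU cache in front of a larger SSD-resident index. The sketch below is purely illustrative: the class name, the cache size, and the dict that stands in for the SSD tier are all assumptions, not the thesis's actual data layout or replacement algorithm.

```python
from collections import OrderedDict

class TwoLevelFingerprintIndex:
    """Illustrative two-level fingerprint index: hot fingerprints are
    kept in a small LRU cache (standing in for DRAM), while the full
    index lives in a larger store (a dict standing in for the SSD)."""

    def __init__(self, cache_size: int):
        self.cache_size = cache_size
        self.dram_cache = OrderedDict()  # fingerprint -> chunk id (hot subset)
        self.ssd_store = {}              # fingerprint -> chunk id (full index)
        self.ssd_lookups = 0             # how often the slow tier was touched

    def insert(self, fp: str, chunk_id: int) -> None:
        self.ssd_store[fp] = chunk_id
        self._cache(fp, chunk_id)

    def lookup(self, fp: str):
        if fp in self.dram_cache:        # DRAM hit: no SSD access needed
            self.dram_cache.move_to_end(fp)
            return self.dram_cache[fp]
        self.ssd_lookups += 1            # miss: fall back to the SSD tier
        chunk_id = self.ssd_store.get(fp)
        if chunk_id is not None:
            self._cache(fp, chunk_id)    # promote the entry into the cache
        return chunk_id

    def _cache(self, fp: str, chunk_id: int) -> None:
        self.dram_cache[fp] = chunk_id
        self.dram_cache.move_to_end(fp)
        if len(self.dram_cache) > self.cache_size:
            self.dram_cache.popitem(last=False)  # evict least recently used

idx = TwoLevelFingerprintIndex(cache_size=2)
idx.insert("fp-a", 0)
idx.insert("fp-b", 1)
idx.insert("fp-c", 2)   # "fp-a" is evicted from the DRAM cache
idx.lookup("fp-c")      # DRAM hit, slow tier untouched
idx.lookup("fp-a")      # DRAM miss, served from the SSD tier
assert idx.ssd_lookups == 1
```

The counter makes the design goal visible: the more lookups the cache absorbs, the closer overall search time stays to DRAM speed even though the full index resides on the slower medium.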


In recent years, network technology has developed dramatically. With the improvement of network bandwidth, accesses to remote data storage systems have become more and more frequent. However, the huge amount of data stored in a storage system can degrade its performance. A data deduplication system addresses this degradation efficiently: deduplication is a technique that eliminates redundant copies of duplicated data. We calculate the fingerprint of each data segment through a hash function. Since each fingerprint is unique within the deduplication system, we can determine whether two data segments are identical by comparing their fingerprints. Once a matching fingerprint is found in the system, the write request can be ignored, since its content already exists in the system. In this thesis, we propose to enhance the performance of a data deduplication system by keeping the fingerprint store on an SSD. In a traditional deduplication system, DRAM is the main storage medium for the fingerprint store; compared with DRAM, an SSD costs less per unit of capacity, so using an SSD as the main medium lets us extend the fingerprint store easily. We also address the read-fragmentation problem in deduplication systems to improve response time. Experimental results show that storage capacity can be saved by 36.86%, while system response time can be improved by up to 34%.
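The core fingerprint-based deduplication idea in the abstract can be shown in a few lines. This is a minimal toy sketch, not the thesis's implementation: the class name and chunk interface are assumptions, SHA-1 stands in for "some hash function", and chunking policy and the SSD-backed fingerprint store are not modeled.

```python
import hashlib

class DedupStore:
    """Toy inline deduplication store: each chunk is identified by the
    SHA-1 fingerprint of its content, so identical chunks are physically
    written only once and later writes are mapped to the existing copy."""

    def __init__(self):
        self.fingerprints = {}   # fingerprint -> chunk id
        self.chunks = []         # physical chunk storage
        self.physical_writes = 0

    def write(self, data: bytes) -> int:
        fp = hashlib.sha1(data).hexdigest()
        if fp in self.fingerprints:      # duplicate: map, do not store again
            return self.fingerprints[fp]
        self.chunks.append(data)         # unique content: store a new chunk
        self.physical_writes += 1
        chunk_id = len(self.chunks) - 1
        self.fingerprints[fp] = chunk_id
        return chunk_id

    def read(self, chunk_id: int) -> bytes:
        return self.chunks[chunk_id]

store = DedupStore()
a = store.write(b"hello" * 1000)
b = store.write(b"world" * 1000)
c = store.write(b"hello" * 1000)   # duplicate of the first write
assert a == c and store.physical_writes == 2
```

Three logical writes result in only two physical ones; the space saving reported in the thesis comes from exactly this kind of fingerprint match, applied at much larger scale.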

1 Introduction
1.1 Introduction to Data Centers
1.2 Hard Disk Drives and Solid-State Drives
1.3 Data Deduplication Systems
2 Background and Related Work
2.1 Data Deduplication Techniques
2.1.1 Inline and Offline Systems
2.1.2 Fingerprint Search
2.1.3 Disk Read Fragmentation
2.2 Hash Functions
2.3 B-Tree
2.4 LRU
3 Motivation
3.1 Resource Utilization
3.2 Introducing Solid-State Drives
4 System Design
4.1 Core Goals of the System
4.1.1 Reducing the Probability of Duplicate Writes
4.1.2 Reducing Read Response Time
4.2 Architecture Overview
4.2.1 Hardware Configuration
4.2.2 System Workflow
4.3 Fingerprints and the Fingerprint Store
4.3.1 Hash Functions and Fingerprints
4.3.2 Two-Level Fingerprint Store
4.3.3 Fingerprint Management
4.4 Reference Counts and Data Migration
4.4.1 Recording Reference Counts
4.4.2 Data Migration
5 Performance Evaluation
5.1 Analysis of System Data
5.2 Data Chunk Addressing
5.3 Hardware Setup
5.4 Analysis of Experimental Results
5.4.1 Storage Space
5.4.2 System Time
6 Conclusion

