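Graduate student: 吳華軒 (Hau-Shan Wu)
Thesis title: A DATA DE-DUPLICATION ACCESS FRAMEWORK FOR SOLID STATE DRIVES (針對固態硬碟所設計的消除重複資料存取架構)
Advisor: 吳晉賢 (Chin-Hsien Wu)
Oral defense committee: 阮聖彰 (Shanq-Jang Ruan), 林昌鴻 (Chang Hong Lin), 陳維美 (Wei-Mei Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar)
Language: English
Number of pages: 45
Keywords (translated from Chinese): flash memory, solid state drive, duplicate data, access framework
Keywords (English): Flash memory, Solid state drive, Duplicate, Framework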
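  • As the capacity of solid state drives (SSDs) grows rapidly, SSDs have replaced traditional hard drives in many applications. An SSD is built from many NAND flash memory chips, and because NAND flash memory is sensitive to write requests, SSDs inherit the same problem. Owing to the out-of-place update property of flash memory, a large number of write requests triggers garbage collection to reclaim invalid pages, and frequent garbage collection shortens device lifetime and degrades overall performance. When SSDs are used in systems that access large amounts of data, effectively reducing the amount of data written becomes an important issue. In this study, we propose a data de-duplication access framework designed for SSDs, whose goal is to eliminate as much duplicate data as possible and thereby reduce the amount of data written. We integrate file-level de-duplication with near-duplicate (similarity-based) de-duplication to achieve a complete de-duplication mechanism, and we exploit the application-based locality and file name locality of duplicate data to improve the accuracy of duplicate detection. According to the experimental results, the proposed framework effectively identifies duplicate data and substantially reduces the amount of data written, without incurring much system performance overhead.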


    With the rapid development of solid state drives (SSDs), SSDs have
    replaced traditional hard drives in many applications. Because SSDs
    are built from NAND flash memory, their main challenge is that NAND
    flash memory is highly sensitive to write requests. Owing to the
    "out-of-place update" characteristic of flash memory, a large number
    of write requests triggers garbage collection to reclaim free space,
    and frequent garbage collection reduces both the lifetime of the
    flash memory and overall performance. When SSDs are used for data
    storage, significantly decreasing the amount of data written
    therefore becomes an important issue. In this thesis, we propose a
    data de-duplication access framework for SSDs. The objective is to
    eliminate as much duplicate data as possible and to reduce space
    consumption. We combine file-based de-duplication with static
    chunking de-duplication to achieve complete data de-duplication,
    and we exploit application-based locality and file name locality to
    find duplicate data. According to the experimental results, the
    proposed framework identifies duplicate data efficiently and
    substantially decreases the amount of data written, while keeping
    the overhead reasonable.
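
The combination of file-based and static chunking de-duplication described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the `DedupStore` class, the 4096-byte `CHUNK_SIZE`, the in-memory dictionaries, and the choice of SHA-1 as the fingerprint function are all assumptions made for illustration, and the locality heuristics are left out.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size in bytes; not taken from the thesis


def fingerprint(data: bytes) -> str:
    """Return a SHA-1 digest used as a content fingerprint (assumed hash)."""
    return hashlib.sha1(data).hexdigest()


class DedupStore:
    """Toy in-memory store that counts the bytes it actually has to write."""

    def __init__(self) -> None:
        self.files = {}    # file fingerprint -> list of chunk fingerprints
        self.chunks = {}   # chunk fingerprint -> chunk bytes
        self.names = {}    # file name -> file fingerprint
        self.bytes_written = 0

    def write(self, name: str, data: bytes) -> None:
        file_fp = fingerprint(data)
        if file_fp not in self.files:
            # Not a whole-file duplicate: fall back to fixed-size chunks.
            recipe = []
            for offset in range(0, len(data), CHUNK_SIZE):
                chunk = data[offset:offset + CHUNK_SIZE]
                chunk_fp = fingerprint(chunk)
                if chunk_fp not in self.chunks:
                    # Only chunks never seen before cost a real write.
                    self.chunks[chunk_fp] = chunk
                    self.bytes_written += len(chunk)
                recipe.append(chunk_fp)
            self.files[file_fp] = recipe
        self.names[name] = file_fp

    def read(self, name: str) -> bytes:
        recipe = self.files[self.names[name]]
        return b"".join(self.chunks[fp] for fp in recipe)


store = DedupStore()
original = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
edited = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE  # differs in the second chunk only

store.write("report_v1.txt", original)    # writes two new chunks
store.write("report_copy.txt", original)  # whole-file duplicate, writes nothing
store.write("report_v2.txt", edited)      # near-duplicate, writes one new chunk
print(store.bytes_written)
```

In this toy run, three files totalling six chunks cost only three chunk writes. The whole-file check avoids chunking an exact copy at all, while the fixed-size chunk check catches the near-duplicate whose first chunk is unchanged.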

    Abstract
    1 Introduction
    2 Duplicate data Characteristics
      2.1 Application-based locality
      2.2 File name locality
    3 Duplicate data Characteristics
      3.1 Duplicate Data
      3.2 Data De-Duplication Algorithm
      3.3 Fingerprints Hash Function
      3.4 Chunk Policy
        3.4.1 Static Chunking
        3.4.2 Variable Chunking
    4 Data De-duplication Access Framework
      4.1 Meta Table
      4.2 Eliminate Duplicate Data
      4.3 Eliminate Near-Duplicate Data
        4.3.1 Chunk Fingerprint Table
        4.3.2 Log File
        4.3.3 Identification of Near-Duplicate Data
        4.3.4 Merge Operation
      4.4 Reference Count
    5 Evaluation
      5.1 Rewrite Performance
      5.2 Write-Response-Time
      5.3 Data Byte Write
      5.4 File Write Performance
    6 Conclusion
    Bibliography

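    Full text release date: 2015/07/28 (campus network)
    Full text release date: not authorized for public release (off-campus network)
    Full text release date: not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)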