簡易檢索 / 詳目顯示

研究生: 陳國龍
Kuo-Long Chen
論文名稱: 基於重複資料偵測的行動裝置之可調式儲存系統
A Deduplication-aware Scalable Storage System for Mobile Devices
指導教授: 吳晉賢
Chin-Hsien Wu 
口試委員: 沈中安
Chung-An Shen
林淵翔
Yuan-Hsiang Lin
林昌鴻
Chang Hong Lin
陳維美
Wei-Mei Chen
學位類別: 碩士
Master
系所名稱: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
論文出版年: 2014
畢業學年度: 103
語文別: 中文
論文頁數: 45
中文關鍵詞: 重複資料偵測冷熱資料過濾器android儲存系統
外文關鍵詞: de-duplicaiton, hot-cold file filter, android, storage system
相關次數: 點閱:193下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 在資訊爆炸的時代, 資料的增長速度是非常快速的,人們會利用不同的裝置
    來獲取我們所需要的資訊,而行動裝置更是人們獲取資訊的一個重要媒介,但每
    個行動裝置都有著硬體上的限制例如:CPU、儲存空間、續電力。尤其是當儲存空
    間不夠用時,會透過Google Drive, iCloud, Dropbox... 等所提供的雲端空間來存放
    資料,但行動裝置上還是儲存了許多不常使用的資料或者重複的資料,所以在此
    我們想透過一些方法,讓行動裝置的資源能更有效的利用,所以在此提出了一個
    架構稱為: A Deduplication-aware Scalable Storage System for Mobile Device,並透過
    這樣的架構來完成資料有效分配、空間可擴張性,相較以往的雲端空間,我們提
    供了冷熱資料分析器(hotness filter) 來讓資料分類為較少使用的資料(冷資料) 與經
    常使用的資料(熱資料),並搭配重複資料偵測技術來減少上傳的冷資料時的傳輸
    量。最後根據實驗的結果,我們提供了精準的冷熱資料判定和提供行動裝置額外
    的使用空間並有效的減少行動裝置上傳所需的傳輸量。


    Nowadays, the capacity of data growth is very rapid because of convenient and fast
    network transmission and large-capacity storage systems. In particular, mobile devices
    are very popular devices to delivery and store information for people. However, mobile
    devices have hardware limitations such as slow CPU speed, small storage space, and low
    battery capacity. Especially, when the storage space is not enough, cloud storage systems
    such as Google Drive, iCloud, and Dropbox can provide extra storage space to store data.
    However, the mobile device could still store infrequently used data or duplication data.
    In this thesis, we propose a deduplication-aware scalable storage system for mobile devices.
    We use a hotness filter to identify the infrequently used data and transfer the data
    to the cloud storage. We also design a deduplication mechanism for mobile devices to
    decrease the transfer data. According to the experimental results, the proposed method
    can efficiently and correctly identify infrequently used data and transfer the data to the
    cloud storage to generate extra free space for mobile devices.

    中文摘要. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . III Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IV 目錄. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . V 圖目錄. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VII 表目錄. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IX 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Relative Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1 冷熱資料分析器. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1.1 Bloom Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1.2 Multi-Bloom Filter . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1.3 HotDataTrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.2 重複資料偵測. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2.1 Fingerprint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2.2 Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 4 A Deduplication-aware Scalable Storage System for Mobile Devices . . . . . . 14 4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 4.2 Hotness filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 4.2.1 Access Frequency . . . . . . . . . . . . . . . . . . . . . . . . . 16 4.2.2 Hot-List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 4.2.3 Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 4.2.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 20 4.3 Deduplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 4.3.1 File Type Detection . . . . . . . . . . . . . . . . . . . . . . . . . 24 4.3.2 Hybrid Chunking . . . . . . . . . . . . . . . . . . . . . . . . . . 25 4.3.3 Dynamic Fingerprint Lookup . . . . . . . . . . . . . . . . . . . 27 5 實驗結果與分析. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 5.1 建置實驗環境. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 5.2 實驗結果. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 5.2.1 冷熱門檔案分析器效能分析. . . . . . . . . . . . . . . . . . . 29 5.2.2 熱資料正確性分析. . . . . . . . . . . . . . . . . . . . . . . . 33 5.2.3 冷資料上傳效能分析. . . . . . . . . . . . . . . . . . . . . . . 38 6 結論與後續工作. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 參考文獻. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 授權書. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

    [1] Android developer, http://developer.android.com/index.html
    [2] Zen-Wei Hong Chin-Hsien Wu, Wen-Yen Chang. A reliable non-volatile memory
    system: Exploiting file-system characteristics. In 15th IEEE Pacific Rim International
    Symposium on Dependable Computing, Pacific Rim International Symposium
    on Dependable Computing ’09, pages 202–207, 2009.
    [3] Sang-Won Lee, Dong-Joo Park, Tae-Sun Chung, Dong-Ho Lee, Sangwon Park, and
    Ha-Joo Song. A log buffer-based flash translation layer using fully-associative sector
    translation. ACM Transactions on Embedded Computing Systems, 6(3), July 2007.
    [4] Jeong uk Kang, Heeseung Jo, Jin soo Kim, and Joonwon Lee. A superblock-based
    flash translation layer for nand flash memory. In In EMSOFT ’06: Proceedings of
    the 6th ACM & IEEE International conference on Embedded software, pages 161–
    170. ACM, 2006.
    [5] S.H. ; Sang Lyul Min ; Yookun Cho Jesung Kim, Jong Min Kim ; Noh. A spaceefficient
    flash translation layer for compactflash systems. In Consumer Electronics,
    IEEE Transactions on, pages 366–375. IEEE, 2002.
    [6] Dawoon Jung, Yoon-Hee Chae, Heeseung Jo, Jin-Soo Kim, and Joonwon Lee. A
    group-based wear-leveling algorithm for large-capacity flash memory storage systems.
    In Proceedings of the 2007 International Conference on Compilers, Architecture,
    and Synthesis for Embedded Systems, Compilers, architecture, and synthesis
    for embedded systems ’07, pages 160–164, New York, NY, USA, 2007. ACM.
    [7] Li-Pin Chang and Tei-Wei Kuo. Efficient management for large-scale flash-memory
    storage systems with resource conservation. ACM Transactions on Storage, 1(4):
    381–418, November 2005.
    [8] Hyojun Kim, Nitin Agrawal, and Cristian Ungureanu. Revisiting storage for smartphones.
    ACM Transactions on Storage, 8(4):14:1–14:25, December 2012.
    [9] Shimin Chen, Phillip B. Gibbons, and Suman Nath. Rethinking database algorithms
    for phase change memory. In CIDR’11: 5th Biennial Conference on Innovative Data
    Systems Research, 2011.
    [10] Li-Pin Chang. On efficient wear leveling for large-scale flash-memory storage systems.
    In Proceedings of the 2007 ACM Symposium on Applied Computing, pages
    1126–1130, New York, NY, USA, 2007. ACM.
    [11] Yuan-Hao Chang, Jen-Wei Hsieh, and Tei-Wei Kuo. Endurance enhancement of
    flash-memory storage systems: An efficient static wear leveling design. In Proceedings
    of the 44th Annual Design Automation Conference, pages 212–217, New York,
    NY, USA, 2007. ACM.
    [12] Taeho Kgil, David Roberts, and Trevor Mudge. Improving nand flash based disk
    caches. In Proceedings of the 35th Annual International Symposium on Computer
    Architecture, pages 327–338, Washington, DC, USA, 2008. IEEE Computer Society.
    [13] Jeffrey C. Mogul, Eduardo Argollo, Mehul Shah, and Paolo Faraboschi. Operating
    system support for nvm+dram hybrid main memory. In Proceedings of the 12th
    Conference on Hot Topics in Operating Systems, pages 14–14, Berkeley, CA, USA,
    2009. USENIX Association.
    [14] Li-Pin Chang. Hybrid solid-state disks: Combining heterogeneous nand flash in
    large ssds. In Proceedings of the 2008 Asia and South Pacific Design Automation
    Conference, pages 428–433, Los Alamitos, CA, USA, 2008. IEEE Computer Society
    Press.
    [15] Yongsoo Joo ; Yibo Chen ; Dimin Niu ; Yuan Xie ; Yiran Chen ; Hai Li Guangyu Sun.
    A hybrid solid-state storage architecture for the performance, energy consumption,
    and lifetime improvement. In High Performance Computer Architecture (HPCA),
    2010 IEEE 16th International Symposium on, 2010.
    [16] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications
    of the ACM, 13(7):422–426, July 1970.
    [17] Jen-Wei Hsieh, Tei-Wei Kuo, and Li-Pin Chang. Efficient identification of hot
    data for flash memory storage systems. ACM Transactions on Storage, 2(1):22–
    40, February 2006.
    [18] Dongchul Park and David H.C. Du. Hot data identification for flash-based storage
    systems using multiple bloom filters. In Mass Storage Systems and Technologies ,
    2011 IEEE 27th Symposium on, pages 1–11, May 2011.
    [19] Dongchul Park, Biplob Debnath, Youngjin Nam, David H. C. Du, Youngkyun Kim,
    and Youngchul Kim. Hotdatatrap: A sampling-based hot data identification scheme
    for flash memory. In Proceedings of the 27th Annual ACM Symposium on Applied
    Computing, pages 1610–1617, New York, NY, USA, 2012. ACM.
    [20] Michael O. Rabin. Fingerprinting by Random Polynomials. Center for Research in
    Computing Technology: Center for Research in Computing Technology. Center for
    Research in Computing Techn., Aiken Computation Laboratory, Univ., 1981.
    [21] Dirk Meister and Andre Brinkmann. Multi-level comparison of data deduplication
    in a backup scenario. In Proceedings of SYSTOR 2009: The Israeli Experimental
    Systems Conference, pages 8:1–8:12, New York, NY, USA, 2009. ACM.
    [22] Athicha Muthitacharoen, Benjie Chen, and David Mazieres. A low-bandwidth network
    file system. In Proceedings of the Eighteenth ACM Symposium on Operating
    Systems Principles, pages 174–187, New York, NY, USA, 2001. ACM.
    [23] Deepak R. Bobbarjung, Suresh Jagannathan, and Cezary Dubnicki. Improving duplicate
    elimination in storage systems. ACM Transactions on Storage, 2(4):424–448,
    November 2006.
    [24] K. ESHGHI. A framework for analyzing and improving content-based chunking
    algorithms. tech. rep. hpl-2005-30(r.1), hewlett packard laboratories, palo alto, 2005.
    [25] Athicha Muthitacharoen, Benjie Chen, and David Mazieres. A low-bandwidth network
    file system. SOSP ’01 Proceedings of the eighteenth ACM symposium on Operating
    systems principles, 35(5):174–187, October 2001.
    [26] Dirk Meister and Andre Brinkmann. Multi-level comparison of data deduplication
    in a backup scenario. In Proceedings of SYSTOR 2009: The Israeli Experimental
    Systems Conference, pages 8:1–8:12, New York, NY, USA, 2009. ACM.
    [27] Benjamin Zhu, Kai Li, and Hugo Patterson. Avoiding the disk bottleneck in the data
    domain deduplication file system. In In Proceedings of the 6th USENIX Conference
    on File And Storage Technologies, 2008.
    [28] Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat, Vinay Deolalikar, Greg Trezise,
    and Peter Camble. Sparse indexing: Large scale, inline deduplication using sampling
    and locality. In Proccedings of the 7th Conference on File and Storage Technologies,
    pages 111–123, Berkeley, CA, USA, 2009. USENIX Association.
    [29] Deepavali Bhagwat, Kave Eshghi, Darrell D. E. Long, and Mark Lillibridge. Extreme
    binning: Scalable, parallel deduplication for chunk-based file backup. In In
    Proceedings of the 17th IEEE International Symposium on Modeling, Analysis, and
    Simulation of Computer and Telecommunication Systems (MASCOTS, 2009.
    [30] Biplob Debnath, Sudipta Sengupta, and Jin Li. Chunkstash: Speeding up inline storage
    deduplication using flash memory. In Proceedings of the 2010 USENIX Conference
    on USENIX Annual Technical Conference, pages 16–16, Berkeley, CA, USA,
    2010. USENIX Association.

    QR CODE