
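Author: 黃正杰 (Cheng-Chieh Huang)
Thesis title: 針對Cassandra資料庫之混合型儲存式系統 (A Hybrid Storage System for Cassandra Databases)
Advisor: 吳晋賢 (Chin-Hsien Wu)
Oral defense committee: 阮聖彰 (Shanq-Jang Ruan), 陳維美 (Wei-Mei Chen), 吳晋賢 (Chin-Hsien Wu), 陳雅淑 (Ya-Shu Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: Chinese
Pages: 45
Keywords (Chinese): database (資料庫), hybrid storage system (混合型儲存式系統), solid-state drive (固態硬碟)
Keywords (English): Cassandra, Hybrid Storage, database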
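    Compared with traditional hard disk drives, solid-state drives offer better I/O performance, especially when applications demand heavy I/O. Moving relatively frequently used data to the solid-state drive improves overall system performance. Because solid-state drives cost more than hard disk drives, however, data migration must deliver the greatest performance gain at the lowest cost. We therefore design a storage system for Cassandra in which hot data is moved to the solid-state drive at an appropriate time, and data that has not been used for a long time is moved back to the hard disk drive.
    In this thesis, we propose modifying the architecture of the Cassandra non-relational database system so that frequently used data is placed on the solid-state drive and infrequently used data on the hard disk drive. This exploits the high access speed of the solid-state drive to improve performance and the low price of the hard disk drive to reduce hardware cost. The experiments in Chapter 5 show that the proposed method achieves this goal.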


    Compared with traditional hard disk drives (HDDs), solid-state drives (SSDs) offer better I/O performance, which matters most when applications are I/O-intensive. Moving frequently used data to the SSD improves I/O efficiency. However, SSDs are more expensive than HDDs, so we want to obtain reasonable I/O performance from limited SSD space. To achieve this goal, we design a hybrid storage system for Cassandra databases. In this thesis, we implement the hybrid storage system by modifying the Cassandra NoSQL database. We place high-priority data on the SSD for fast access and low-priority data on the HDD for low cost. The experimental results show that the proposed method achieves this goal.
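
    The placement idea in the abstract (promote frequently used data to the SSD, demote data that has gone unused for a long time to the HDD) can be illustrated with a toy model. The sketch below is not the thesis's algorithm: the class name `TieredStore`, the access-count promotion rule, the idle-time demotion rule, and all thresholds are assumptions chosen only for illustration, and no real Cassandra API is involved.

```python
from dataclasses import dataclass, field


@dataclass
class TieredStore:
    """Toy two-tier placement model (hypothetical, not the thesis's design)."""
    ssd_capacity: int        # max number of objects on the SSD tier (assumed unit)
    idle_limit: int = 3      # accesses an SSD object may sit idle before demotion (assumed)
    promote_after: int = 2   # accesses an object needs before promotion (assumed)
    ssd: set = field(default_factory=set)
    hdd: set = field(default_factory=set)
    hits: dict = field(default_factory=dict)       # object -> access count
    last_used: dict = field(default_factory=dict)  # object -> clock value of last access
    clock: int = 0           # logical clock: one tick per access

    def access(self, key: str) -> str:
        """Record one access and return the tier holding the object afterwards."""
        self.clock += 1
        self.hits[key] = self.hits.get(key, 0) + 1
        self.last_used[key] = self.clock
        if key not in self.ssd:
            self.hdd.add(key)  # objects start on the cheap tier
            if self.hits[key] >= self.promote_after:
                self._promote(key)
        self._demote_idle()
        return "ssd" if key in self.ssd else "hdd"

    def _promote(self, key: str) -> None:
        if len(self.ssd) >= self.ssd_capacity:
            # SSD is full: push out the least recently used SSD object.
            victim = min(self.ssd, key=lambda k: self.last_used[k])
            self._demote(victim)
        self.hdd.discard(key)
        self.ssd.add(key)

    def _demote(self, key: str) -> None:
        self.ssd.discard(key)
        self.hdd.add(key)
        self.hits[key] = 0  # the object must earn promotion again

    def _demote_idle(self) -> None:
        # Move SSD objects that have been idle for too long back to the HDD.
        idle = [k for k in self.ssd if self.clock - self.last_used[k] > self.idle_limit]
        for key in idle:
            self._demote(key)


store = TieredStore(ssd_capacity=1)
first = store.access("a")    # first access: object stays on the HDD tier
second = store.access("a")   # second access: object is promoted to the SSD tier
```

    A small `ssd_capacity` models the cost constraint the abstract emphasizes: only the hottest objects fit on the fast tier, so every promotion to a full SSD forces a demotion.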

    Chapter 1 Introduction
      1.1 Preface
      1.2 Thesis Organization
    Chapter 2 Background
      2.1 Data Placement Method
      2.2 Hot Data Identification
      2.4 Hybrid Storage System
      2.5 Non-Relational Databases
      2.6 Apache Cassandra
    Chapter 3 Motivation and Related Work
    Chapter 4 Priority-Based Data Migration Design
      4.1 System Overview
      4.2 Placement of Objects
        4.2.1 Object
        4.2.2 Buffer Manager
        4.2.3 Migration from SSD to HDD
        4.2.4 Migration from HDD to SSD
        4.2.5 Determining the Weight Values
    Chapter 5 Experiments and Performance Analysis
      5.1 Overview
      5.2 Cassandra-Stress
      5.3 Performance Analysis
        5.3.1 Experiment Overview
        5.3.2 System Performance of the Modified Cassandra
        5.3.3 System Performance with the Buffer Manager
        5.3.4 System Performance with MBF Used for Both Reads and Writes
        5.3.5 System Performance After Expanding SSD Capacity
    Chapter 6 Conclusion

