
Author: 陳千祥
Chien-hsiang Chen
Thesis title: 雲端異質資料庫移轉之可適性研究
Study on Scalable Approach in Database Conversion to Cloud Computing
Advisor: 陳俊良
Jiann-Liang Chen
Committee members: 林華君
Hwa-Chun Lin
呂學坤
Shyue-Kung Lu
蔡榮宗
none
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: Chinese
Number of pages: 56
Chinese keywords: HadoopDB, 雲端資料庫, Amazon EC2, 雲端運算
English keywords: HadoopDB, Cloud Database, Amazon EC2, Cloud Computing
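  • With the rapid development of information and network technologies, the number of Internet users grows every year, and Internet access is no longer limited to personal computers. As a result, Web application systems are often unable to handle heavy traffic and fail to provide normal service. To keep Web application systems available, enterprises and government agencies often have to expand hardware and network bandwidth to meet peak-hour resource demands. As time passes, the volume of stored data also grows substantially. Applications then spend a long time querying and processing massive data, so the efficiency of the whole system declines.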

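    In recent years cloud computing has drawn increasing attention, because virtualization lets computing and storage resources be used elastically and lets massive data be processed and stored efficiently. Cloud databases currently fall into roughly two categories. The first is the NoSQL database, which stores data as key-value pairs and offers faster reads and writes. The second is the cloud relational database, which, unlike a traditional relational database, can process and store data in a distributed manner and can be scaled out quickly and elastically to cope with bursts of database writes.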

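    When a Web application system is ported to the cloud, the biggest change is in the database. If a NoSQL database is chosen, the data structures of the original database system must be redesigned before the data can be migrated to it. Choosing a cloud relational database system such as HadoopDB requires no change to the data structures.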

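    This study deploys HadoopDB in the Amazon EC2 cloud environment and designs the Trans2Cloud data migration system to carry out the migration. Trans2Cloud uses Hadoop to distribute tasks such as data migration and data verification, and it supports every JDBC-compatible database. The study analyzes migration and query performance when HadoopDB is paired with different relational databases. The analysis shows that inserting 1,000,000 rows takes only 118 seconds when HadoopDB is paired with MySQL but 345 seconds with PostgreSQL, so MySQL is better suited to systems with frequent inserts. In data verification over 1,000,000 rows, however, HadoopDB with PostgreSQL took 378 seconds whereas MySQL took 421 seconds, so PostgreSQL is better suited to systems with frequent queries. In addition, switching to HadoopDB improves query performance noticeably only once the data volume exceeds 500,000 rows.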
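The timings reported in the abstract can be restated as throughput, which makes the comparison easier to read: MySQL inserts roughly 2.9 times faster than PostgreSQL, while PostgreSQL verifies about 1.1 times faster than MySQL. The short Python sketch below only redoes that arithmetic from the four reported figures. The variable and function names are illustrative and do not come from the thesis.

```python
# Timings reported in the abstract for 1,000,000 rows, in seconds.
ROWS = 1_000_000
timings = {
    ("MySQL", "insert"): 118,
    ("PostgreSQL", "insert"): 345,
    ("MySQL", "verify"): 421,
    ("PostgreSQL", "verify"): 378,
}


def rows_per_second(rows: int, seconds: float) -> float:
    """Average throughput for a task that handled `rows` rows in `seconds`."""
    return rows / seconds


# Throughput for every (backend, task) pair.
throughput = {key: rows_per_second(ROWS, secs) for key, secs in timings.items()}

# Ratio of the slower time to the faster time for each task.
insert_speedup = timings[("PostgreSQL", "insert")] / timings[("MySQL", "insert")]
verify_speedup = timings[("MySQL", "verify")] / timings[("PostgreSQL", "verify")]

for (backend, task), rate in sorted(throughput.items()):
    print(f"{backend:<10} {task:<6} {rate:8.0f} rows/s")
print(f"MySQL insert advantage:      {insert_speedup:.2f}x")
print(f"PostgreSQL verify advantage: {verify_speedup:.2f}x")
```

The gap on inserts is much wider than the gap on verification, which is consistent with the abstract's recommendation of MySQL for insert-heavy systems and PostgreSQL for query-heavy ones.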


    One obvious effect of the rapid development of information and communication technologies is the growth of large-scale physical server deployments. Web applications increasingly rely on distributed service implementations, whereas they were traditionally hosted on physical servers. To maintain satisfactory service, a web service provider must expand server storage capacity. However, the data stored in the database keeps growing, which degrades the performance of the web application system.

    Cloud computing has attracted a great deal of attention because its virtualization technology handles massive data and manages computing tasks and storage distribution efficiently and flexibly. There are two types of cloud database. The first is the NoSQL database, which uses key-value tables and offers faster reads and writes. The second is the cloud relational database, which performs distributed processing and storage.

    The most significant aspect of migrating a Web application system to the cloud is database modification. If a NoSQL database is chosen, the original database structure needs to be redesigned for the NoSQL format. HadoopDB is proposed for cloud relational database migration because it does not require changing the data structure.

    This study builds HadoopDB on Amazon EC2 and designs the Trans2Cloud system to execute the data migration task. Trans2Cloud distributes data migration and verification work across Hadoop. Trans2Cloud supports any JDBC-compliant database server. This study analyzes data migration and query performance for different relational databases paired with HadoopDB. The analysis shows that HadoopDB with MySQL suits insert-heavy information systems, and HadoopDB with PostgreSQL suits query-heavy information systems. HadoopDB query performance is better than that of the original database when the data exceeds 500,000 rows.
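The abstract does not describe how Trans2Cloud splits its work, so the following Python sketch is only one plausible reading of "distributed migration and verification", not the thesis's design. It assumes a table keyed by an integer `id`, splits the key range into contiguous chunks (the unit a Hadoop map task might handle), copies each chunk, and verifies it by comparing row counts and a SHA-256 digest on both sides. In-memory SQLite stands in for the JDBC source and the HadoopDB target, the chunks run sequentially rather than on a cluster, and the table name `records` and every function name are hypothetical.

```python
import hashlib
import sqlite3

SCHEMA = "CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)"
SELECT_CHUNK = (
    "SELECT id, payload FROM records WHERE id BETWEEN ? AND ? ORDER BY id"
)


def make_source(n_rows: int) -> sqlite3.Connection:
    """Create a hypothetical source database holding n_rows rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO records VALUES (?, ?)",
        ((i, f"row-{i}") for i in range(1, n_rows + 1)),
    )
    conn.commit()
    return conn


def make_target() -> sqlite3.Connection:
    """Create an empty target database with the same schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def key_ranges(lo: int, hi: int, parts: int):
    """Split the inclusive id range [lo, hi] into at most `parts` chunks."""
    size = -(-(hi - lo + 1) // parts)  # ceiling division
    return [(s, min(s + size - 1, hi)) for s in range(lo, hi + 1, size)]


def copy_chunk(src, dst, lo: int, hi: int) -> int:
    """Copy one chunk of rows from src to dst and return the row count."""
    rows = src.execute(SELECT_CHUNK, (lo, hi)).fetchall()
    dst.executemany("INSERT INTO records VALUES (?, ?)", rows)
    dst.commit()
    return len(rows)


def chunk_digest(conn, lo: int, hi: int):
    """Return (row count, SHA-256 hex digest) for one chunk, in id order."""
    digest = hashlib.sha256()
    count = 0
    for row in conn.execute(SELECT_CHUNK, (lo, hi)):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def migrate_and_verify(src, dst, parts: int):
    """Migrate chunk by chunk; return a list of (lo, hi, verified) tuples."""
    lo, hi = src.execute("SELECT MIN(id), MAX(id) FROM records").fetchone()
    report = []
    for a, b in key_ranges(lo, hi, parts):
        copy_chunk(src, dst, a, b)
        report.append((a, b, chunk_digest(src, a, b) == chunk_digest(dst, a, b)))
    return report


if __name__ == "__main__":
    source, target = make_source(1000), make_target()
    for lo, hi, ok in migrate_and_verify(source, target, parts=4):
        print(f"ids {lo}-{hi}: {'verified' if ok else 'MISMATCH'}")
```

Verifying per chunk rather than over the whole table means a mismatch points at a specific key range, so only that range would need to be migrated again.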

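    Abstract (Chinese)
    Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    1.1 Research Motivation
    1.2 Research Contributions
    1.3 Thesis Organization
    Chapter 2 Related Technologies and Background
    2.1 Introduction to Cloud Computing
    2.2 Virtualization Technologies for Cloud Computing
    2.2.1 Oracle VirtualBox
    2.2.2 Xen
    2.2.3 Amazon EC2 Virtual Machines
    2.3 Cloud Databases
    2.3.1 BigTable
    2.3.2 HadoopDB
    2.4 Cloud Service Providers
    2.4.1 Amazon Web Service (AWS)
    2.4.2 Google App Engine (GAE)
    2.5 Apache Sqoop
    Chapter 3 Design, Analysis, and Implementation of Trans2Cloud
    3.1 How Trans2Cloud Works
    3.2 System Analysis and Design
    3.3 Trans2Cloud Functional Testing
    Chapter 4 Heterogeneous Database Migration and Query Performance Analysis
    4.1 System and Environment Setup
    4.2 Data Migration and Query Performance Analysis
    4.2.1 Migration Preparation and Case Description
    4.2.2 Heterogeneous Database Migration Performance Analysis
    4.2.3 HadoopDB Query Performance Analysis
    Chapter 5 Conclusions and Future Work
    5.1 Conclusions
    5.2 Future Work
    References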

