
Author: Yun-sun Yee (俞詠善)
Thesis title: Performance Study of Map-Reduce over Hadoop of Large-scale Data Processing
Advisor: Jenq-Shiou Leu (呂政修)
Committee members: Ray-Guang Cheng (鄭瑞光), Wei-Kuan Shih (石維寬), Hsing-Lung Chen (陳省隆)
Degree: Master
Department: Department of Electronic and Computer Engineering (電資學院 - 電子工程系)
Year of publication: 2010
Academic year of graduation: 98
Language: English
Pages: 45
Keywords (Chinese): 雲端計算, Map-Reduce, SQL, Hadoop
Keywords (English): Hadoop, SQL, Map-Reduce, Cloud-Computing

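In recent years, the field of "cloud computing" has become increasingly widespread. Many well-known companies, such as Yahoo! Kimo and Google (world-renowned search engines), keep launching related services for businesses and are even extending them to general users. Building on SQL techniques, Map-Reduce, a programming model for large-scale data processing, has become a popular topic through wide discussion and study. Many real-world workflows, such as data processing in search engines, can be executed in parallel through a simple interface consisting of two functions, "Map" and "Reduce". In this thesis, we run a variety of experiments to evaluate the performance of Hadoop, which executes Map-Reduce, against a general relational database, SQL Server. We find that Hadoop takes less time to complete a query than SQL Server takes to complete the same query. In addition, several related parameter settings are tested to assess whether they affect Hadoop's performance. We also find that adding more machines to the data processing improves Hadoop's performance, especially when processing large volumes of data.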


The term "Cloud-Computing" has grown in popularity in recent years. Many major companies, such as Yahoo and Google, have tried to provide related services to the business community and even to general users. Alongside SQL techniques, Map-Reduce, a programming model for large-scale data processing, has become a hot topic that is widely discussed in many studies. Many real-world tasks, such as data processing for search engines, can be parallelized through a simple interface with two functions called Map and Reduce. In this paper, we focus on comparing the performance of the Hadoop implementation of Map-Reduce with that of SQL Server through simulations. In our studies, Hadoop completed the same query faster than SQL Server. In addition, several relevant parameter settings are tested to see whether they affect Hadoop's performance. We also find that including more machines in the data processing lets Hadoop achieve better performance, especially for large-scale data sets.
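The two-function interface mentioned in the abstract can be illustrated with a minimal single-process sketch in Python. This is not the thesis's code and it does not use Hadoop; the names `map_fn`, `reduce_fn`, and `run_map_reduce`, as well as the sample rows, are hypothetical and chosen only for illustration. The example computes what a SQL query of the form `SELECT department, SUM(amount) ... GROUP BY department` would return.

```python
from collections import defaultdict


def map_fn(record):
    """Map: emit one (key, value) pair per input record."""
    department, amount = record
    yield department, amount


def reduce_fn(key, values):
    """Reduce: collapse all values that share a key into one result."""
    return key, sum(values)


def run_map_reduce(records, mapper, reducer):
    """Hypothetical in-memory driver; a real framework distributes these steps."""
    # Shuffle step: group every emitted value under its key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce step: one reducer call per distinct key, in sorted key order.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


# Hypothetical rows standing in for a table with (department, amount) columns.
rows = [("sales", 10), ("ops", 4), ("sales", 5), ("ops", 1)]
totals = run_map_reduce(rows, map_fn, reduce_fn)
print(totals)  # {'ops': 5, 'sales': 15}
```

Because each `map_fn` call depends on a single record and each `reduce_fn` call depends on a single key, both phases can in principle be split across machines, which is the property the abstract refers to when it says such tasks can be run in parallel.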

Abstract (Chinese)
Abstract
Acknowledgements
Figure Index
Table Index
Chapter 1: Introduction
Chapter 2: Related Works
Chapter 3: Operation Scheme
3.1 Relational Database Model
3.2 Map-Reduce Model
3.2.1 Block size
3.2.2 Fair Scheduler
Chapter 4: Proposed Mechanism
Chapter 5: Experimental Evaluation
Chapter 6: Conclusion and Future Work
References

