Graduate student: Ming-hsien Yang (楊明憲)
Thesis title: High-Performance Heterogeneous Hadoop Architecture (高效能異質性Hadoop架構)
Advisor: Sendren Sheng-Dong Xu (徐勝均)
Oral examination committee: Jia-kun Chen (陳佳堃), Jin-shyan Lee
Degree: Master's
Department: College of Engineering - Graduate Institute of Automation and Control
Year of publication: 2013
Academic year of graduation: 101
Language: Chinese
Number of pages: 93
Keywords (Chinese): Hadoop, HDFS, MapReduce
Keywords (English): Hadoop, HDFS, MapReduce
Native Hadoop uses a two-layer architecture consisting of one master and many slaves. Each slave combines a DataNode and a TaskTracker, while the master manages the slave nodes. Because users can add slave nodes to increase the efficiency of parallel computation over massive data, Hadoop is regarded as a key technology for massive data processing. Although adding slaves improves efficiency, a Hadoop cluster composed of a large number of slaves occupies more physical space and consumes more energy. This study therefore combines ARM and x86 nodes to form a new “Heterogeneously Three-Layered” Hadoop architecture, building on the characteristics of ARM: energy saving, high performance in massive data processing, and a small footprint. In addition, the concept of a “Dynamically Managing Block Algorithm” is introduced into the task scheduler. This design not only addresses the shortcomings of native Hadoop but also reduces Map/Reduce operation time by more than 22%.

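Acknowledgments
Chinese Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Organization of the Thesis
Chapter 2 Technology and Theory
2.1 Cloud Computing
2.1.1 Infrastructure as a Service (IaaS)
2.1.2 Software as a Service (SaaS)
2.1.3 Platform as a Service (PaaS)
2.2 Hadoop
2.2.1 HDFS
2.2.2 MapReduce
2.3 Analysis of HDFS File Reading
2.3.1 Creating the FSDataInputStream Object
2.3.2 Reading Files with FSDataInputStream
2.4 Analysis of HDFS File Writing
2.4.1 Creating the FSDataOutputStream Object
2.4.2 Writing Files with FSDataOutputStream
Chapter 3 Literature Review and Research Method
3.1 Literature Review
3.2 Heterogeneous Hadoop Architecture
3.2.1 Replacing One x86 Processor with Four ARM Processors
3.2.2 Three-Layer Architecture
3.3 HDFS Source Code Modification
3.4 Dynamic Task Allocation in MapReduce
3.5 Dynamic File Block Allocation Algorithm
3.6 MapReduce Source Code Modification
Chapter 4 Experiments
4.1 Hardware Environment and Hadoop Parameter Settings
4.2 MapReduce Program Example
4.3 Input Data Format for WordCount.java
4.4 Experimental Results
4.4.1 Native Architecture vs. Three-Layer Architecture
4.4.2 Centralized vs. Distributed HDFS Storage
4.4.3 New Scheduling Algorithm
Chapter 5 Conclusions and Future Work
References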

