Author: 高玉璁
Yu-Chon Kao
Thesis title: 區域性感知的MapReduce即時排程架構
Data-Locality-Aware MapReduce Real-Time Scheduling Framework
Advisor: 陳雅淑
Ya-Shu Chen
Committee members: 吳晉賢
Chin-Hsien Wu
修丕承
Pi-Cheng Hsiu
謝仁偉
Jen-Wei Hsieh
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar)
Language: English
Number of pages: 27
Keywords (Chinese): 即時排程, 區域性感知, 雲端運算, MapReduce, 能源節省
Keywords (English): Real-time scheduling, Data-locality, Cloud computing systems, MapReduce, Energy saving
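    Demand for interactive cloud applications is growing, which has drawn increasing attention to the real-time requirements of the MapReduce framework used in such applications. Unlike traditional applications, MapReduce processes big data, exhibits data locality, and executes nonpreemptively. The MapReduce real-time scheduling problem must therefore account for both the waiting time caused by nonpreemptive execution and the data transfer time incurred when data locality is not satisfied. This thesis proposes a data-locality-aware MapReduce real-time scheduling framework to guarantee quality of service for interactive cloud applications. The proposed framework comprises a job scheduler, a dispatcher, and dynamic power management; it shortens the waiting time of high-priority applications, improves data-locality-aware assignment, and effectively reduces the energy the system consumes. The proposed method is evaluated with different types of workloads and compared with other existing scheduling algorithms; the results show that it effectively improves application schedulability and data locality.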


    MapReduce is widely used in cloud applications for large-scale data processing. The increasing number of interactive cloud applications has led to an increasing need for MapReduce real-time scheduling. Most MapReduce applications are data-oriented and nonpreemptively executed. Therefore, the problem of MapReduce real-time scheduling is complicated by the trade-off between run-time blocking caused by nonpreemptive execution and data locality. This paper proposes a data-locality-aware MapReduce real-time scheduling framework for guaranteeing quality of service for interactive MapReduce applications. A scheduler for scheduling two-phase MapReduce jobs and a dispatcher for assigning jobs to computing resources are presented; the dispatcher takes both blocking and data locality into account. Furthermore, dynamic power management for run-time energy saving is discussed. Finally, the proposed methodology is evaluated with synthetic workloads, and a comparative study of different scheduling algorithms is conducted.

    Chapter 1 Introduction
    Chapter 2 Related work
    Chapter 3 System model
    Chapter 4 Data-locality-aware MapReduce real-time scheduling
        4.1 MapReduce Real-Time Scheduling Framework
        4.2 Local Deadline Assignment
        4.3 Data-Locality-Aware Task Partition
            4.3.1 Job Dispatch
            4.3.2 Map Task Partition
            4.3.3 Reduce Task Partition
        4.4 Schedulability Test
        4.5 Example
        4.6 Power Controller
    Chapter 5 Performance evaluation
        5.1 Data Sets and Performance Metrics
        5.2 Experimental Results
            5.2.1 Schedulability
            5.2.2 Energy Efficiency
        5.3 Case study: Facebook with Hadoop on CloudSimRT
    Chapter 6 Conclusion

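    Full-text release date: 2020/08/27 (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
    Full-text release date: full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)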