
Author: Zheng-Yang Lu (盧振揚)
Thesis title: A Data-Locality-Aware MapReduce Scheduling Framework on In-Storage Processing Architecture
Advisor: Ya-Shu Chen (陳雅淑)
Committee members: Jen-Wei Hsieh (謝仁偉), Hsueh-Wen Tseng (曾學文), Chin-Hsien Wu (吳晉賢)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2020
Academic year of graduation: 108
Language: English
Number of pages: 42
Keywords: In-storage processing, Hadoop MapReduce, Task Scheduling, Data-locality, SSD
  • MapReduce is widely used for large-scale data processing. However, as application data sizes grow, the overhead of moving data among nodes and between the host CPU and storage becomes increasingly severe. To improve the performance of MapReduce applications by minimizing data movement, we first propose executing MapReduce applications on solid-state drives (SSDs) with in-storage processing and exchanging data through NVMe-over-TCP/IP. We then present a scheduling framework, consisting of a Waiting Time Estimation Dispatcher and a Workload-Aware Scheduler, that manages multiple MapReduce applications while considering task interference, task dependence, and data locality. Experimental results show that the proposed method achieves a remarkable performance improvement over the default system.
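
The idea of dispatching by estimated waiting time while respecting data locality can be illustrated with a minimal sketch. This is not the thesis's Waiting Time Estimation Dispatcher, whose details are not given in this record; it is a simple greedy stand-in under stated assumptions. The names `Node`, `Task`, `estimated_finish`, and `dispatch`, and the cost model (queued work plus a fixed transfer penalty when the input block is not local), are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A compute-capable storage node (hypothetical model)."""
    name: str
    queued_work: float = 0.0                      # seconds of work already queued
    local_blocks: set = field(default_factory=set)  # input blocks stored on this node


@dataclass
class Task:
    """A map task that reads one input block (hypothetical model)."""
    block_id: str
    run_time: float       # estimated execution time in seconds
    transfer_time: float  # estimated cost of fetching the block remotely


def estimated_finish(task: Task, node: Node) -> float:
    """Estimated finish time: waiting time + remote-fetch penalty + run time."""
    penalty = 0.0 if task.block_id in node.local_blocks else task.transfer_time
    return node.queued_work + penalty + task.run_time


def dispatch(task: Task, nodes: list) -> Node:
    """Greedily assign the task to the node with the earliest estimated finish."""
    best = min(nodes, key=lambda n: estimated_finish(task, n))
    best.queued_work = estimated_finish(task, best)  # the node is busy until then
    return best
```

Under this assumed cost model, a busy node that holds the data is preferred only when the remote-fetch penalty exceeds its extra waiting time; task interference and task dependence, which the abstract also mentions, are not modeled here.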



    1. Introduction 2. Related Work 3. Background and System Model 4. Proposed Method 5. Experiment 6. Conclusion


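    Full-text release date: 2025/08/28 (campus network)
    Full-text release date: not authorized for public release (off-campus network)
    Full-text release date: not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)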