
Graduate Student: 盧振揚
Zheng-Yang Lu
Thesis Title: A Data-Locality-Aware MapReduce Scheduling Framework on In-Storage Processing Architecture
Advisor: 陳雅淑
Ya-Shu Chen
Oral Defense Committee: 謝仁偉
Jen-Wei Hsieh
曾學文
Hsueh-Wen Tseng
吳晉賢
Chin-Hsien Wu
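Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2020
Academic Year of Graduation: 108 (ROC calendar)
Language: English
Number of Pages: 42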
Keywords: In-storage processing, Hadoop MapReduce, Task Scheduling, Data-locality, SSD

MapReduce is widely used for large-scale data processing. However, the overhead of moving data among nodes and between the host CPU and storage becomes more severe as application data sizes grow. To improve the performance of MapReduce applications by minimizing data movement, we first propose executing MapReduce applications on solid-state drives (SSDs) with in-storage processing and exchanging data through NVMe-over-TCP/IP. We then present a scheduling framework, consisting of a Waiting Time Estimation Dispatcher and a Workload-Aware Scheduler, that manages multiple MapReduce applications while considering task interference, task dependence, and data locality. Experimental results show that the proposed method achieves a remarkable performance improvement over the default system.



1. Introduction
2. Related Work
3. Background and System Model
4. Proposed Method
5. Experiment
6. Conclusion


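Full-text release date: 2025/08/28 (campus network)
Full-text release date: full text not authorized for public release (off-campus network)
Full-text release date: full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)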