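Adaptive Cache Contention-aware Scheduling and Governor for Asymmetric Multicore System | National Taiwan University of Science and Technology Theses and Dissertations System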

Author:	Jian-He Liao (廖健賀)
Thesis title:	Adaptive Cache Contention-aware Scheduling and Governor for Asymmetric Multicore System (非對稱多核心系統之動態緩存爭奪感知排程與管理)
Advisor:	Ya-Shu Chen (陳雅淑)
Oral defense committee:	Hsueh-Wen Tseng (曾學文), Jen-Wei Hsieh (謝仁偉), Chin-Hsien Wu (吳晉賢)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication:	2019
Academic year of graduation:	107
Language:	English
Number of pages:	33
Keywords:	Shared Resource Contention, Asymmetric Multicore, Scheduling, Embedded System

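To balance system performance against system power, asymmetric multicore architectures are widely used in embedded systems. As the number of applications running concurrently in a system grows, power consumption and last-level cache latency grow with it. To maximize system performance under a power constraint, we propose an adaptive cache contention-aware scheduler and governor for asymmetric multicore systems. To handle dynamic workloads and dynamic cache contention, we use a cycles-per-instruction (CPI) model to learn the relation among application performance, core operating frequency, and the asymmetric cores. Based on the CPI model's predictions, the scheduler decides which core each application runs on, and the governor controls the frequencies of the big and little cores to maximize system throughput under the power constraint. Because the resource-access behavior of applications changes over time, the CPI model is learned adaptively at run time. The proposed algorithm is implemented on an Odroid XU4 development board, and experimental results show that it improves system throughput while meeting the power constraint.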

Asymmetric multicore architectures are widely used in embedded systems to better trade off performance against energy consumption. As the number of applications executing concurrently in the system increases, the power consumption and the associated last-level cache latency increase as well. To maximize system performance under a power constraint, we propose an adaptive cache contention-aware scheduler and governor for asymmetric multicore systems. To deal with dynamic workloads and cache contention effects, CPI model learning is presented to adjust the modeled relation between system performance, execution frequency, and execution cluster. Based on the CPI model's predictions, a run-time governor and a dispatcher are then presented to determine the execution frequency and cores that maximize system throughput under the power constraint. The proposed algorithm was implemented on a commercial Odroid XU4 board, and its performance was evaluated using real-world benchmarks. Results show that it increases throughput by up to 71% compared to offline methods while meeting the power constraint.
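The abstract does not give the form of the CPI model or the governor's search procedure, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes an exponential moving average as the online CPI estimator, an exhaustive search over big/little frequency pairs, and a caller-supplied power function. All names (`CpiModel`, `choose_frequencies`, `power_of`) and all numeric values are hypothetical.

```python
from itertools import product


class CpiModel:
    """Running CPI estimate per (app, cluster, freq_mhz), updated online.

    Assumption: an exponential moving average stands in for the thesis's
    (unspecified) learning rule.
    """

    def __init__(self, alpha=0.5, default_cpi=1.0):
        self.alpha = alpha              # weight of the newest sample (assumed)
        self.default_cpi = default_cpi  # used when no sample has been seen yet
        self.table = {}

    def update(self, app, cluster, freq_mhz, cycles, instructions):
        # One measurement window: CPI = cycles / instructions.
        sample = cycles / instructions
        key = (app, cluster, freq_mhz)
        old = self.table.get(key)
        if old is None:
            self.table[key] = sample
        else:
            self.table[key] = (1 - self.alpha) * old + self.alpha * sample

    def predict(self, app, cluster, freq_mhz):
        return self.table.get((app, cluster, freq_mhz), self.default_cpi)


def predicted_throughput(model, placement, freqs):
    """Sum over apps of f [MHz] / CPI, i.e. millions of instructions per second."""
    return sum(
        freqs[cluster] / model.predict(app, cluster, freqs[cluster])
        for app, cluster in placement.items()
    )


def choose_frequencies(model, placement, freq_options, power_of, budget_w):
    """Pick the (big, little) frequency pair with the highest predicted
    throughput whose estimated power stays within budget_w.

    power_of is a hypothetical callable mapping a frequency dict to watts.
    Returns None if no pair fits the budget.
    """
    best = None
    for big, little in product(freq_options["big"], freq_options["little"]):
        freqs = {"big": big, "little": little}
        if power_of(freqs) > budget_w:
            continue
        score = predicted_throughput(model, placement, freqs)
        if best is None or score > best[0]:
            best = (score, freqs)
    return None if best is None else best[1]
```

On a real board, the cycle and instruction counts fed to `update` would come from hardware performance counters; here they are plain arguments so that the sketch stays self-contained. Because the learned CPI can be higher at a higher frequency when contention is heavy, the search may legitimately prefer a lower frequency even when the power budget allows a higher one.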

Introduction
Related Work
System Model
Approaches
1 Overview
2 CPI Model Learning
3 Adaptation Governor
4 Contention-aware Dispatcher
Experimental Evaluation
1 Experimental Setup
2 Experimental Result
Conclusion
References
                                


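Full-text release date: 2024/08/28 (campus network)
Full-text release date: full text not authorized for public release (off-campus network)
Full-text release date: full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)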
