
Graduate Student: 陳建瑋 (Chien-Wei Chen)
Thesis Title: 主記憶體運算架構下的一個成本考量之物件管理方法
A Cost-Aware Object Management Method for In-Memory Computing Frameworks
Advisor: 吳晉賢 (Chin-Hsien Wu)
Committee Members: 謝仁偉 (Jen-Wei Hsieh), 陳雅淑 (Ya-Shu Chen), 修丕承 (Pi-Cheng Hsiu)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Publication Year: 2016
Graduation Academic Year: 104
Language: English
Pages: 46
Chinese Keywords: in-memory computing frameworks, cost-aware
Foreign Keywords: In-memory computing frameworks, cost-aware
In the existing in-memory computing framework Apache Spark, the objects produced while an application runs can be kept resident in main memory, so that they can be obtained quickly when they are needed again, thereby speeding up the overall execution of the program. For a worker node in the framework, when its main memory space is insufficient to accommodate newly produced objects or re-acquired objects, Apache Spark by default uses the LRU eviction policy to release enough main memory space. When an evicted object is needed again, it can be re-computed by consulting its lineage graph or read back directly from an external storage device, and the programmer must decide, at the time the object is first persisted, which of these two ways will be used to re-acquire it after eviction. However, when an object evicted by the LRU policy is itself expensive to compute, or an unsuitable way is used to re-acquire it, the cost of re-acquiring it can be high. In view of this, this thesis proposes a cost-aware object management method for in-memory computing frameworks. When the main memory space of a worker node is insufficient to accommodate newly produced or re-acquired objects, we first pick suitable candidates from the objects already resident in main memory, then evict from the candidates the group of objects that has the lowest total computation cost while releasing the largest total amount of main memory space, and finally handle the evicted objects in an appropriate manner. According to the experimental results, when the access scenarios of the objects follow the 80/20 rule and the 50/50 rule, the proposed object management method reduces the overall re-acquisition cost incurred when evicted objects are needed again.


For in-memory computing frameworks such as Apache Spark, objects (i.e., the intermediate data) can be kept in main memory to speed up execution. For a worker node in such a framework, when its main memory space is not enough to accommodate a newly computed or a retrieved object, Apache Spark uses the Least Recently Used (LRU) eviction policy to release enough main memory space. When an evicted object is required again, it can be retrieved either by re-computing it or by reading it from an external storage device. However, the retrieval cost of an evicted object can be large, because the LRU policy ignores computation cost and because a fixed, straightforward policy for handling evicted objects may be inappropriate for some of them. In this thesis, we propose a cost-aware object management method for in-memory computing frameworks. When the main memory space of a worker node is not enough to accommodate a newly computed or a retrieved object, we first pick appropriate objects that are already resident in main memory as candidates for eviction, then evict the candidate objects whose total creation cost is minimal while the total main memory space they release is maximal, and finally handle the evicted objects in a suitable way (e.g., storing them on HDDs or SSDs, or discarding them). According to the experimental results, the proposed method reduces the overall retrieval cost of evicted objects under different access scenarios (i.e., the 80/20 and 50/50 principles).
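
To make the eviction step concrete, below is a minimal Python sketch, not the thesis's actual algorithm: it assumes each cached object is described only by its occupied size and an estimated creation (re-acquisition) cost, and it greedily evicts the objects that are cheapest to re-create per byte freed until the required amount of main memory is released. All names (CachedObject, pick_victims, the example RDD partition identifiers) are illustrative assumptions; judging from the thesis's references on knapsack problems and dynamic programming, its own selection is likely a more exact optimization than this greedy ratio heuristic.

    # Minimal sketch of cost-aware victim selection (illustrative only).
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class CachedObject:
        name: str              # e.g., an RDD partition identifier (hypothetical)
        size: int              # occupied main-memory space in bytes
        creation_cost: float   # estimated cost to re-compute or re-read the object

    def pick_victims(candidates: List[CachedObject], needed: int) -> List[CachedObject]:
        """Greedily evict objects with the lowest creation cost per byte freed
        until at least `needed` bytes of main memory are released."""
        victims, freed = [], 0
        for obj in sorted(candidates, key=lambda o: o.creation_cost / o.size):
            if freed >= needed:
                break
            victims.append(obj)
            freed += obj.size
        return victims

    # Example: release 300 MB from three cached objects.
    cache = [CachedObject("rdd_1_part_0", 200 << 20, 5.0),
             CachedObject("rdd_2_part_3", 150 << 20, 1.2),
             CachedObject("rdd_3_part_1", 400 << 20, 9.5)]
    for victim in pick_victims(cache, 300 << 20):
        print("evict", victim.name)
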

Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
1 Introduction
2 Background Knowledge
  2.1 In-memory computing
  2.2 Apache Spark
    2.2.1 Resilient distributed datasets
    2.2.2 Immutable feature of RDD
    2.2.3 Manipulations executing on RDD
3 Related Work
  3.1 Selection and replacement algorithms for memory performance improvement in Spark
  3.2 Caching optimizing method of internal storage calculation
4 Motivation
5 A Cost-Aware Object Management Method for In-Memory Computing Frameworks
  5.1 System Overview
  5.2 Pick Objects as Candidates
  5.3 Evict Objects from Candidates
  5.4 Discard Evicted Objects or Store Them
    5.4.1 Move Objects to HDDs
    5.4.2 Move Objects to SSDs
    5.4.3 Discard Objects Directly
    5.4.4 Discard Objects from SSDs
6 Experimental Evaluation
  6.1 Experimental Setup
    6.1.1 System and hardware configurations of the worker node
    6.1.2 Average transfer time of the HDD/SSD
    6.1.3 Attributes of an object
    6.1.4 Objects used in the experiments
    6.1.5 Access scenarios used in the experiments
    6.1.6 Object management methods used in the experiments
  6.2 Experimental Results
    6.2.1 Experiment 1 (80/20 principle)
    6.2.2 Experiment 2 (50/50 principle)
7 Conclusion
References

[1] Apache Spark homepage, https://spark.apache.org.
[2] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, “Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing,” in Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI’12), Berkeley, CA, USA, pp. 2–2, USENIX Association, 2012.
[3] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster computing with working sets,” in Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud’10), Berkeley, CA, USA, pp. 10–10, USENIX Association, 2010.
[4] R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger, “Evaluation techniques for storage hierarchies,” IBM Systems Journal, vol. 9, pp. 78–117, June 1970.
[5] J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” Communications of the ACM, vol. 51, pp. 107–113, Jan. 2008.
[6] J. Dean and S. Ghemawat, “MapReduce: A flexible data processing tool,” Communications of the ACM, vol. 53, pp. 72–77, Jan. 2010.
[7] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop distributed file system,” in Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST ’10), Washington, DC, USA, pp. 1–10, IEEE Computer Society, 2010.
[8] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” SIGOPS Operating Systems Review, vol. 37, pp. 29–43, Oct. 2003.
[9] Apache Hadoop homepage, https://hadoop.apache.org.
[10] T. White, Hadoop: The Definitive Guide. O’Reilly Media, Inc., 1st ed., 2009.
[11] M. Duan, K. Li, Z. Tang, G. Xiao, and K. Li, “Selection and replacement algorithms for memory performance improvement in Spark,” Concurrency and Computation: Practice and Experience, 2015. cpe.3584.
[12] “Caching optimizing method of internal storage calculation,” CN Patent App. CN 201,310,531,246, Mar. 12, 2014.
[13] M. Baker and T. Sterling, “Cluster computing white paper,” 2000.
[14] P. Cao and S. Irani, “Cost-aware WWW proxy caching algorithms,” in Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS’97), Berkeley, CA, USA, pp. 18–18, USENIX Association, 1997.
[15] R. Ozcan, I. S. Altingovde, and O. Ulusoy, “Cost-aware strategies for query result caching in web search engines,” ACM Transactions on the Web, vol. 5, pp. 9:1–9:25, May 2011.
[16] L. Shi, Z. Wang, Y. Yao, and L. Wei, “Streaming media caching model based on knapsack problem,” Journal of Networks, vol. 6, no. 9, 2011.
[17] A. Xiang, L. Xu, and B. Niu, “Knapsack-model-based schemes and WLB algorithm for the capacity and efficiency management of web cache,” International Journal of Computers Communications & Control, vol. 8, no. 4, pp. 635–646, 2013.
[18] A. Gharaibeh, A. Khreishah, B. Ji, and M. Ayyash, “A provably efficient online collaborative caching algorithm for multicell-coordinated systems,” CoRR, vol. abs/1509.02911, 2015.
[19] S. Martello and P. Toth, Knapsack Problems: Algorithms and Computer Implementations. Wiley-Interscience Series in Discrete Mathematics and Optimization, Chichester, New York: Wiley, 1990.
[20] V. V. Vazirani, Approximation Algorithms. New York, NY, USA: Springer-Verlag New York, Inc., 2001.
[21] A. Silberschatz, P. B. Galvin, and G. Gagne, Operating System Concepts. Wiley Publishing, 8th ed., 2008.
[22] E. J. O’Neil, P. E. O’Neil, and G. Weikum, “The LRU-K page replacement algorithm for database disk buffering,” SIGMOD Record, vol. 22, pp. 297–306, June 1993.
[23] B. Mao, H. Jiang, S. Wu, L. Tian, D. Feng, J. Chen, and L. Zeng, “HPDA: A hybrid parity-based disk array for enhanced performance and reliability,” ACM Transactions on Storage, vol. 8, pp. 4:1–4:20, Feb. 2012.
[24] G. Sun, Y. Joo, Y. Chen, D. Niu, Y. Xie, Y. Chen, and H. Li, “A hybrid solid-state storage architecture for the performance, energy consumption, and lifetime improvement,” in HPCA, pp. 1–12, IEEE Computer Society, 2010.
[25] R. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton University Press, 1st ed., 1957.

Full-text release date: 2021/02/15 (campus network)
Full text not authorized for public release (off-campus network)
Full text not authorized for public release (National Central Library: Taiwan NDLTD system)