
Graduate Student: 陳建瑋 (Chien-Wei Chen)
Thesis Title: 主記憶體運算架構下的一個成本考量之物件管理方法
A Cost-Aware Object Management Method for In-Memory Computing Frameworks
Advisor: 吳晉賢 (Chin-Hsien Wu)
Committee Members: 謝仁偉 (Jen-Wei Hsieh), 陳雅淑 (Ya-Shu Chen), 修丕承 (Pi-Cheng Hsiu)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Publication Year: 2016
Graduation Academic Year: 104
Language: English
Pages: 46
Chinese Keywords: in-memory computing frameworks, cost-aware
Foreign Keywords: In-memory computing frameworks, cost-aware
In the existing in-memory computing framework Apache Spark, the objects produced while an application runs can be kept resident in main memory, so that they can be obtained quickly when they are needed again, thereby speeding up the overall execution of the program. For a worker node in the framework, when its main memory space is insufficient to accommodate newly produced objects or re-acquired objects, Apache Spark by default uses the LRU eviction policy to release enough main memory space. When an evicted object is needed again, it can be re-computed by consulting its lineage graph or read back directly from an external storage device, and the programmer must decide, at the time the object is first persisted, which of these two ways will be used to re-acquire it after eviction. However, when an object evicted by the LRU policy is itself expensive to compute, or an unsuitable way is used to re-acquire it, the cost of re-acquiring it can be high. In view of this, this thesis proposes a cost-aware object management method for in-memory computing frameworks. When the main memory space of a worker node is insufficient to accommodate newly produced or re-acquired objects, we first pick suitable candidates from the objects already resident in main memory, then evict from the candidates the group of objects that has the lowest total computation cost while releasing the largest total amount of main memory space, and finally handle the evicted objects in an appropriate manner. According to the experimental results, when the access scenarios of the objects follow the 80/20 rule and the 50/50 rule, the proposed object management method reduces the overall re-acquisition cost incurred when evicted objects are needed again.


For in-memory computing frameworks such as Apache Spark, objects (i.e., the intermediate data) can be kept in main memory to speed up execution. For a worker node in such a framework, when its main memory space is not enough to accommodate a newly computed or a retrieved object, Apache Spark uses the Least Recently Used (LRU) eviction policy to release enough main memory space. When an evicted object is required again, it can be retrieved either by re-computing it or by reading it from an external storage device. However, the retrieval cost of an evicted object can be large, because the LRU policy ignores computation cost and because a fixed, straightforward policy for handling evicted objects may be inappropriate for some of them. In this thesis, we propose a cost-aware object management method for in-memory computing frameworks. When the main memory space of a worker node is not enough to accommodate a newly computed or a retrieved object, we first pick appropriate objects that are already resident in main memory as candidates for eviction, then evict the candidate objects whose total creation cost is minimal while the total main memory space they release is maximal, and finally handle the evicted objects in a suitable way (e.g., storing them on HDDs or SSDs, or discarding them). According to the experimental results, the proposed method reduces the overall retrieval cost of evicted objects under different access scenarios (i.e., the 80/20 and 50/50 principles).
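
To make the eviction step concrete, below is a minimal Python sketch, not the thesis's actual algorithm: it assumes each cached object is described only by its occupied size and an estimated creation (re-acquisition) cost, and it greedily evicts the objects that are cheapest to re-create per byte freed until the required amount of main memory is released. All names (CachedObject, pick_victims, the example RDD partition identifiers) are illustrative assumptions; judging from the thesis's references on knapsack problems and dynamic programming, its own selection is likely a more exact optimization than this greedy ratio heuristic.

    # Minimal sketch of cost-aware victim selection (illustrative only).
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class CachedObject:
        name: str              # e.g., an RDD partition identifier (hypothetical)
        size: int              # occupied main-memory space in bytes
        creation_cost: float   # estimated cost to re-compute or re-read the object

    def pick_victims(candidates: List[CachedObject], needed: int) -> List[CachedObject]:
        """Greedily evict objects with the lowest creation cost per byte freed
        until at least `needed` bytes of main memory are released."""
        victims, freed = [], 0
        for obj in sorted(candidates, key=lambda o: o.creation_cost / o.size):
            if freed >= needed:
                break
            victims.append(obj)
            freed += obj.size
        return victims

    # Example: release 300 MB from three cached objects.
    cache = [CachedObject("rdd_1_part_0", 200 << 20, 5.0),
             CachedObject("rdd_2_part_3", 150 << 20, 1.2),
             CachedObject("rdd_3_part_1", 400 << 20, 9.5)]
    for victim in pick_victims(cache, 300 << 20):
        print("evict", victim.name)
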

Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
1 Introduction
2 Background Knowledge
  2.1 In-memory computing
  2.2 Apache Spark
    2.2.1 Resilient distributed datasets
    2.2.2 Immutable feature of RDD
    2.2.3 Manipulations executing on RDD
3 Related Work
  3.1 Selection and replacement algorithms for memory performance improvement in Spark
  3.2 Caching optimizing method of internal storage calculation
4 Motivation
5 A Cost-Aware Object Management Method for In-Memory Computing Frameworks
  5.1 System Overview
  5.2 Pick Objects as Candidates
  5.3 Evict Objects from Candidates
  5.4 Discard Evicted Objects or Store Them
    5.4.1 Move Objects to HDDs
    5.4.2 Move Objects to SSDs
    5.4.3 Discard Objects Directly
    5.4.4 Discard Objects from SSDs
6 Experimental Evaluation
  6.1 Experimental Setup
    6.1.1 System and hardware configurations of the worker node
    6.1.2 Average transfer time of the HDD/SSD
    6.1.3 Attributes of an object
    6.1.4 Objects used in the experiments
    6.1.5 Access scenarios used in the experiments
    6.1.6 Object management methods used in the experiments
  6.2 Experimental Results
    6.2.1 Experiment 1 (80/20 principle)
    6.2.2 Experiment 2 (50/50 principle)
7 Conclusion
References

[1] Apache Spark homepage, https://spark.apache.org.
[2] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, “Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing,” in Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI’12), Berkeley, CA, USA, pp. 2–2, USENIX Association, 2012.
[3] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster computing with working sets,” in Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud’10), Berkeley, CA, USA, pp. 10–10, USENIX Association, 2010.
[4] R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger, “Evaluation techniques for storage hierarchies,” IBM Systems Journal, vol. 9, pp. 78–117, June 1970.
[5] J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” Communications of the ACM, vol. 51, pp. 107–113, Jan. 2008.
[6] J. Dean and S. Ghemawat, “MapReduce: A flexible data processing tool,” Communications of the ACM, vol. 53, pp. 72–77, Jan. 2010.
[7] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop distributed file system,” in Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST ’10), Washington, DC, USA, pp. 1–10, IEEE Computer Society, 2010.
[8] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” SIGOPS Operating Systems Review, vol. 37, pp. 29–43, Oct. 2003.
[9] Apache Hadoop homepage, https://hadoop.apache.org.
[10] T. White, Hadoop: The Definitive Guide. O’Reilly Media, Inc., 1st ed., 2009.
[11] M. Duan, K. Li, Z. Tang, G. Xiao, and K. Li, “Selection and replacement algorithms for memory performance improvement in Spark,” Concurrency and Computation: Practice and Experience, 2015. cpe.3584.
[12] “Caching optimizing method of internal storage calculation,” CN Patent App. CN 201,310,531,246, Mar. 12, 2014.
[13] M. Baker and T. Sterling, “Cluster computing white paper,” 2000.
[14] P. Cao and S. Irani, “Cost-aware WWW proxy caching algorithms,” in Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS’97), Berkeley, CA, USA, pp. 18–18, USENIX Association, 1997.
[15] R. Ozcan, I. S. Altingovde, and O. Ulusoy, “Cost-aware strategies for query result caching in web search engines,” ACM Transactions on the Web, vol. 5, pp. 9:1–9:25, May 2011.
[16] L. Shi, Z. Wang, Y. Yao, and L. Wei, “Streaming media caching model based on knapsack problem,” Journal of Networks, vol. 6, no. 9, 2011.
[17] A. Xiang, L. Xu, and B. Niu, “Knapsack-model-based schemes and WLB algorithm for the capacity and efficiency management of web cache,” International Journal of Computers Communications & Control, vol. 8, no. 4, pp. 635–646, 2013.
[18] A. Gharaibeh, A. Khreishah, B. Ji, and M. Ayyash, “A provably efficient online collaborative caching algorithm for multicell-coordinated systems,” CoRR, vol. abs/1509.02911, 2015.
[19] S. Martello and P. Toth, Knapsack Problems: Algorithms and Computer Implementations. Wiley-Interscience Series in Discrete Mathematics and Optimization, Chichester, New York: Wiley, 1990.
[20] V. V. Vazirani, Approximation Algorithms. New York, NY, USA: Springer-Verlag New York, Inc., 2001.
[21] A. Silberschatz, P. B. Galvin, and G. Gagne, Operating System Concepts. Wiley Publishing, 8th ed., 2008.
[22] E. J. O’Neil, P. E. O’Neil, and G. Weikum, “The LRU-K page replacement algorithm for database disk buffering,” SIGMOD Record, vol. 22, pp. 297–306, June 1993.
[23] B. Mao, H. Jiang, S. Wu, L. Tian, D. Feng, J. Chen, and L. Zeng, “HPDA: A hybrid parity-based disk array for enhanced performance and reliability,” ACM Transactions on Storage, vol. 8, pp. 4:1–4:20, Feb. 2012.
[24] G. Sun, Y. Joo, Y. Chen, D. Niu, Y. Xie, Y. Chen, and H. Li, “A hybrid solid-state storage architecture for the performance, energy consumption, and lifetime improvement,” in HPCA, pp. 1–12, IEEE Computer Society, 2010.
[25] R. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton University Press, 1st ed., 1957.

Full-text release date: 2021/02/15 (campus network)
Full text not authorized for public release (off-campus network)
Full text not authorized for public release (National Central Library: Taiwan NDLTD system)