
Graduate Student: 廖偉丞 (Wei-Cheng Liao)
Thesis Title: 改善相依性程式之排程法 (Dependency Aware GPGPU Kernel Scheduling)
Advisor: 黃元欣 (Yuan-Shin Hwang)
Committee Members: 謝仁偉 (Jen-Wei Hsieh), 賴祐吉 (Yu-Chi Lai)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2017
Graduation Academic Year: 106 (ROC calendar)
Language: Chinese
Number of Pages: 41
Chinese Keywords: 圖形處理器通用計算、相依性核心排程法、執行緒塊排程法、快取寫回規則
English Keywords: GPGPU, Dependent Kernel Scheduling, CTA Scheduling, Cache Write Policy

  GPUs are now widely used in many fields, such as image processing, deep learning, and artificial intelligence, and related research continues to be published, showing that this area is becoming increasingly important. Programs in areas such as deep learning and artificial intelligence typically require a large amount of computation. This work is not handed to a single program kernel; instead, the task is divided among different sub-kernels. We classify these sub-kernels, which have data dependencies among them, as dependent kernels.

  With current GPU programming practice, these dependent kernels must execute sequentially because of their data dependencies, which lowers GPU parallelism. In addition, the GPU usually writes processed data back to memory, but for dependent kernels this data will be used again, so the write-backs cause unnecessary memory accesses.

  The method proposed in this thesis modifies the scheduler in the simulator to break the rule that dependent kernels must execute sequentially, and pairs it with a suitable memory write policy that keeps data in the cache so it can be reused, thereby improving performance.


  Modern GPUs are widely used in a variety of areas, such as image processing, deep learning, and artificial intelligence, and related research is constantly being published. Programs for deep learning and artificial intelligence do not execute as a single kernel; instead, they assign tasks to different sub-kernels that are data dependent. We categorize these as dependent kernels.

  With current GPU programming practice, these dependent kernels are executed sequentially because of their data dependencies, which reduces GPU parallelism. In addition, the GPU usually writes the processed data back to memory, but for dependent kernels this data will be used again, resulting in unnecessary memory accesses.
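
  To make the problem concrete, the following CUDA sketch (not taken from the thesis; the kernel names producer/consumer and the buffer tmp are invented for illustration) launches two dependent kernels into the default stream: the second kernel cannot begin until the first has finished completely, and the intermediate buffer travels through global memory even though it is consumed immediately afterwards.

#include <cuda_runtime.h>

// Producer: writes an intermediate result to global memory.
__global__ void producer(const float *in, float *tmp, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = in[i] * 2.0f;
}

// Consumer: reads the producer's output, so it cannot start
// until every CTA of the producer kernel has finished.
__global__ void consumer(const float *tmp, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tmp[i] + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *in, *tmp, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&tmp, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    // (input initialization omitted for brevity)

    dim3 block(256), grid((n + 255) / 256);
    // Both launches go to the default stream, so the consumer kernel is
    // serialized behind the producer, and tmp[] travels through DRAM
    // even though it is consumed immediately afterwards.
    producer<<<grid, block>>>(in, tmp, n);
    consumer<<<grid, block>>>(tmp, out, n);
    cudaDeviceSynchronize();

    cudaFree(in); cudaFree(tmp); cudaFree(out);
    return 0;
}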

  We propose a method that breaks the rule that dependent kernels must execute in sequence by modifying the kernel scheduler in the simulator, and that retains the data in the cache with an appropriate cache write policy so it can be reused, thereby improving performance.
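
  The abstract does not spell out the mechanism, but the idea can be sketched as a CTA-granularity dispatch loop: a consumer CTA is allowed to start as soon as the producer CTAs it reads from have finished, rather than after the whole producer kernel drains, and under a write-back policy the intermediate data it reads can then still be resident in the cache. The host-side C++ sketch below is an illustration under those assumptions; the types and functions (CTA, inputs_ready, dispatch) and the one-to-one tile dependency are invented for illustration and are not the thesis code or GPGPU-Sim internals.

#include <queue>
#include <vector>

// Hypothetical illustration only: CTA-granularity, dependency-aware dispatch.
struct CTA {
    int kernel_id;    // which kernel this CTA belongs to
    int cta_id;       // index of the CTA within its kernel
    bool done = false;
};

// Returns true when the producer CTAs this consumer CTA reads from have
// finished. A one-to-one mapping between producer and consumer CTAs over
// the same data tile is assumed here.
bool inputs_ready(const CTA &c, const std::vector<CTA> &producer_ctas) {
    if (c.kernel_id == 0) return true;     // first kernel has no producer
    return producer_ctas[c.cta_id].done;   // one-to-one tile dependency (assumption)
}

// Dependency-aware dispatch: instead of draining kernel k completely before
// kernel k+1 starts, any consumer CTA whose inputs are ready may be issued to
// a free SM, so producer and consumer CTAs overlap and intermediate tiles can
// be reused from the cache before they are evicted to DRAM.
void dispatch(std::vector<CTA> &producer_ctas, std::vector<CTA> &consumer_ctas,
              std::queue<int> &free_sms) {
    for (CTA &c : consumer_ctas) {
        if (!c.done && !free_sms.empty() && inputs_ready(c, producer_ctas)) {
            int sm = free_sms.front();
            free_sms.pop();
            // issue_cta(sm, c);  // hand the CTA to the SM (simulator hook, omitted)
            (void)sm;
        }
    }
}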

Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Objectives
  1.4 Methodology
  1.5 Thesis Organization
Chapter 2 Literature Review
  2.1 GPGPU-Sim
  2.2 Kernel Scheduling
  2.3 CTA Scheduling
  2.4 CUDA Memory
  2.5 Related Work
  2.6 Chapter Summary
Chapter 3 Method
  3.1 Concept
  3.2 Kernel Scheduling
    3.2.1 Modifying the Stream
    3.2.2 Modifying the SM
  3.3 CTA Scheduling
  3.4 Cache Policy
  3.5 Chapter Summary
Chapter 4 Experimental Results
  4.1 Experimental Environment
  4.2 Benchmark Description
  4.3 Experimental Data
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References

