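A Highly Efficient Co-Processing Scheme for Embedded Hybrid Computing Platform | National Taiwan University of Science and Technology Theses and Dissertations System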


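Graduate student:	郭致廷 Chih-Ting Kuo
Thesis title:	適用於嵌入式混和運算平台之高效能協同運算架構 A Highly Efficient Co-Processing Scheme for Embedded Hybrid Computing Platform
Advisor:	沈中安 Chung-An Shen
Oral examination committee:	吳晉賢 Chin-Hsien Wu 陳永耀 Yung-Yao Chen 黃琴雅 Chin-Ya Huang
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of publication:	2023
Academic year of graduation:	111 (ROC calendar)
Language:	English
Number of pages:	35
Keywords (translated from Chinese):	embedded systems, performance improvement, hybrid computing, algorithm redesign
Keywords (English):	embedded systems, performance improvement, co-processing, algorithm redesign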

Embedded systems offer small size and low power consumption, but these advantages come at the cost of scarce hardware resources. As technology advances, more and more algorithms are proposed, and these algorithms demand more hardware resources for computation. Heterogeneous embedded systems and platforms that integrate multiple hardware computing resources have therefore become a major focus, and using the hardware resources on such a system effectively is a key factor. This thesis proposes an efficient algorithm architecture for heterogeneous embedded systems that significantly reduces hardware idle time and ensures that hardware resources are fully utilized during algorithm execution. Based on the proposed architecture, we redesign two different algorithms, which complete the same computational tasks in only 42.8% and 64.1% of the original computation time, respectively.
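
The two timing figures in the abstract can be restated as speedup factors and time reductions. The short derivation below is only an arithmetic restatement of the reported percentages; the symbols T_orig, T_new, and S are introduced here for illustration and do not appear in the record.

```latex
% Assumed notation (not from the record): T_orig = original computation time,
% T_new = computation time after redesign, S = speedup factor.
\begin{align*}
S &= \frac{T_{\mathrm{orig}}}{T_{\mathrm{new}}} \\
S_1 &= \frac{1}{0.428} \approx 2.34, &
1 - 0.428 &= 0.572 \quad (\text{57.2\% less time}) \\
S_2 &= \frac{1}{0.641} \approx 1.56, &
1 - 0.641 &= 0.359 \quad (\text{35.9\% less time})
\end{align*}
```

Read this way, the first redesigned algorithm runs roughly 2.3 times as fast as its original version and the second roughly 1.6 times as fast.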

Abstract (in Chinese)
Abstract
Table of Contents
List of Figures
Chapter 1 Introduction
1.1 Motivation and Objectives
1.2 Structure of this paper
Chapter 2 Related Work and Background
2.1 Related Work
2.2 Background
2.2.1 AGX Orin
2.2.2 Compute Unified Device Architecture Programming Model
2.2.3 CUDA Stream
2.2.4 GPU Occupancy
2.2.5 Nsight Systems
2.2.6 Nsight Compute
2.2.7 Unified Memory Architecture
Chapter 3 Proposed Architecture
3.1 Original algorithm
3.2 Analysis of algorithm performance
3.3 Proposed Architecture
3.3.1 Reducing idle time of hardware
3.3.2 Keeping hardware fully loaded
3.3.3 Improvement of GPU occupancy and utilization
Chapter 4 Experimental Results
Chapter 5 Conclusion
References


                                

[1] Yüksel, M. E. (2019). The design and implementation of a batteryless wireless embedded system for IoT applications. Electrica, 19(1), 1-11.
[2] Kristensen, F., Hedberg, H., Jiang, H., Nilsson, P., & Öwall, V. (2008). An embedded real-time surveillance system: Implementation and evaluation. Journal of Signal Processing Systems, 52, 75-94.
[3] Ravi, S., Raghunathan, A., Kocher, P., & Hattangady, S. (2004). Security in embedded systems: Design challenges. ACM Transactions on Embedded Computing Systems (TECS), 3(3), 461-491.
[4] Janakiraman, S., Thenmozhi, K., Rayappan, J. B. B., & Amirtharajan, R. (2018). Lightweight chaotic image encryption algorithm for real-time embedded system: Implementation and analysis on 32-bit microcontroller. Microprocessors and Microsystems, 56, 1-12.
[5] He, J., Lu, M., & He, B. (2013). Revisiting co-processing for hash joins on the coupled cpu-gpu architecture. arXiv preprint arXiv:1307.1955.
[6] Tomé, D. G., Gubner, T., Raasveldt, M., Rozenberg, E., & Boncz, P. A. (2018, January). Optimizing Group-By and Aggregation using GPU-CPU Co-Processing. In ADMS@ VLDB (pp. 1-10).
[7] LIU, G. F., LIU, Q., LI, B., TONG, X. L., & LIU, H. (2009). GPU/CPU co-processing parallel computation for seismic data processing in oil and gas exploration. Progress in Geophysics, 24(5), 1671-1678.
[8] Olagoke, A. S., Ibrahim, H., & Teoh, S. S. (2020). Literature survey on multi-camera system and its application. IEEE Access, 8, 172892-172922.
[9] Chang, T. H., & Gong, S. (2001, July). Tracking multiple people with a multi-camera system. In Proceedings 2001 ieee workshop on multi-object tracking (pp. 19-26). IEEE.
[10] Nieto, R. M., & Sánchez, J. M. M. (2013, August). An automatic system for sports analytics in multi-camera tennis videos. In 2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance (pp. 438-442). IEEE.
[11] Godaliyadda, D., Garud, H., Poddar, D., Nagori, S., & Pande, T. (2021). Efficient Visual Localization on TDA4VM.
[12] Prakash, A., Amrouch, H., Shafique, M., Mitra, T., & Henkel, J. (2016, June). Improving mobile gaming performance through cooperative CPU-GPU thermal management. In Proceedings of the 53rd annual design automation conference (pp. 1-6).
[13] Lee, C., Ro, W. W., & Gaudiot, J. L. (2012, February). Cooperative heterogeneous computing for parallel processing on CPU/GPU hybrids. In 2012 16th Workshop on Interaction between Compilers and Computer Architectures (INTERACT) (pp. 33-40). IEEE.
[14] Hariharan, J., Varior, R. R., & Karunakaran, S. (2023). Real-time Driver Monitoring Systems on Edge AI Device. arXiv preprint arXiv:2304.01555.
[15] Yao, P., An, H., Xu, M., Liu, G., Li, X., Wang, Y., & Han, W. (2010, June). CuHMMer: A load-balanced CPU-GPU cooperative bioinformatics application. In 2010 International Conference on High Performance Computing & Simulation (pp. 24-30). IEEE.
[16] NVIDIA Jetson Orin. (n.d.). https://developer.nvidia.com/embedded/jetson-orin
[17] Lee, C., Ro, W. W., & Gaudiot, J. L. (2012, February). Cooperative heterogeneous computing for parallel processing on CPU/GPU hybrids. In 2012 16th Workshop on Interaction between Compilers and Computer Architectures (INTERACT) (pp. 33-40). IEEE.
[18] Guide, D. (2013). Cuda c programming guide. NVIDIA, July, 29, 31.
[19] NVIDIA Nsight Systems. (2023, June 27). NVIDIA Developer. https://developer.nvidia.com/nsight-systems
[20] NVIDIA Nsight Compute. (2023, June 27). NVIDIA Developer. https://developer.nvidia.com/nsight-compute
[21] Landaverde, R., Zhang, T., Coskun, A. K., & Herbordt, M. (2014, September). An investigation of unified memory access performance in CUDA. In 2014 IEEE High Performance Extreme Computing Conference (HPEC) (pp. 1-6). IEEE.
[22] Unified Memory in CUDA 6 | NVIDIA Technical Blog. (2022, August 21). NVIDIA Technical Blog. https://developer.nvidia.com/blog/unified-memory-in-cuda-6/

Full-text release date: 2028/07/23 (campus network)
Full-text release date: full text not authorized for public release (off-campus network)
