
Author: Chih-Ting Kuo (郭致廷)
Thesis title: A Highly Efficient Co-Processing Scheme for Embedded Hybrid Computing Platform (適用於嵌入式混和運算平台之高效能協同運算架構)
Advisor: Chung-An Shen (沈中安)
Oral defense committee: Chin-Hsien Wu (吳晉賢), Yung-Yao Chen (陳永耀), Chin-Ya Huang (黃琴雅)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: English
Number of pages: 35
Keywords: embedded systems, performance improvement, co-processing, algorithm redesign
  • Embedded systems offer small size and low power consumption, but these advantages come at the cost of scarce hardware resources. As technology advances, a growing number of algorithms are being proposed, and they demand more hardware resources for computation. Heterogeneous embedded systems and platforms that integrate multiple hardware computing resources have therefore become a major focus, and using those resources effectively is a key factor for such systems. This thesis proposes an efficient algorithm architecture for heterogeneous embedded systems that significantly reduces hardware idle time and ensures that the hardware resources are fully utilized during algorithm execution. Based on the proposed architecture, two different algorithms are redesigned; they complete the same computational tasks in only 42.8% and 64.1% of the original computation time, respectively.

    Abstract (Chinese)
    Abstract
    Table of Contents
    List of Figures
    Chapter 1 Introduction
      1.1 Motivation and Objectives
      1.2 Structure of this paper
    Chapter 2 Related Work and Background
      2.1 Related Work
      2.2 Background
        2.2.1 AGX Orin
        2.2.2 Compute Unified Device Architecture Programming Model
        2.2.3 CUDA Stream
        2.2.4 GPU Occupancy
        2.2.5 Nsight Systems
        2.2.6 Nsight Compute
        2.2.7 Unified Memory Architecture
    Chapter 3 Proposed Architecture
      3.1 Original algorithm
      3.2 Analysis of algorithm performance
      3.3 Proposed Architecture
        3.3.1 Reducing idle time of hardware
        3.3.2 Keep hardware full loaded
        3.3.3 Improvement of GPU occupancy and utilization
    Chapter 4 Experimental Results
    Chapter 5 Conclusion
    References


    Full-text release date: 2028/07/23 (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
