
Graduate Student: Chia-Yi Chen (陳家儀)
Thesis Title: CORE-MAP: Feature Map Distributed Processing for Accelerating CNN Inference on Multi-Core Edge Devices
Advisor: Ya-Shu Chen (陳雅淑)
Committee Members: Jen-Wei Hsieh (謝仁偉), Chin-Hsien Wu (吳晉賢), Hsueh-Wen Tseng (曾學文)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of Publication: 2023
Academic Year of Graduation: 111
Language: English
Pages: 40
Keywords (Chinese): 邊緣運算, 分散式推理, 神經網路
Keywords (English): Edge computing, Distributed inference, Neural networks
Views: 134; Downloads: 0
Abstract

    Distributed edge computing is becoming popular because it offers lower transmission overhead and better privacy than cloud computing. However, each edge device's limited computing power and the high computational demand of neural networks make distributed edge inference difficult. In this study, we explore layer partitioning, feature map partitioning, and computing resource partitioning in distributed edge inference. We then propose CORE-MAP, which distributes a given neural network across a set of edge devices while accounting for data dependencies and resource utilization. We evaluate CORE-MAP, and the experimental results show that it achieves a performance improvement of up to 283% over the non-distributed approach.
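The feature-map partitioning idea the abstract describes can be illustrated with a small sketch. This is a hypothetical, simplified implementation (the function name `partition_rows` and its parameters are assumptions, not the thesis's actual method): it splits the rows of a feature map across edge devices in proportion to each device's compute capacity, and extends each slice by overlap ("halo") rows at the internal boundaries so that a convolution can be computed locally without fetching neighboring rows from another device — one way to respect the data dependency the abstract mentions.

```python
def partition_rows(n_rows, capacities, halo):
    """Split n_rows of a feature map across devices.

    Each device receives a contiguous row range whose size is roughly
    proportional to its entry in `capacities`. Ranges at internal
    boundaries are extended by `halo` rows to cover a convolution's
    receptive-field overlap. Returns a list of (start, end) pairs,
    end exclusive.
    """
    total = sum(capacities)
    # Proportional cut points between consecutive devices.
    cuts, acc = [0], 0.0
    for c in capacities[:-1]:
        acc += n_rows * c / total
        cuts.append(round(acc))
    cuts.append(n_rows)
    # Extend each slice by the halo, clamped to the feature map bounds.
    ranges = []
    for i in range(len(capacities)):
        start = max(0, cuts[i] - halo)
        end = min(n_rows, cuts[i + 1] + halo)
        ranges.append((start, end))
    return ranges

# Example: a 224-row feature map, one device twice as fast as the
# other two, halo of 1 row for a 3x3 convolution.
print(partition_rows(224, [2, 1, 1], 1))
# -> [(0, 113), (111, 169), (167, 224)]
```

The overlapping rows are computed redundantly on two devices; in practice a scheme like this trades that small amount of duplicated work for the elimination of per-layer boundary communication.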

    Table of Contents
    1 Introduction
    2 System Model
    3 Related Work
    4 Approach
      4.1 Response Time Analysis
      4.2 Search Device Partition
        4.2.1 Determine the Number of Layers for Distribution
        4.2.2 Split the Feature Map across Devices
      4.3 Search Core Partition
    5 Performance Evaluation
      5.1 Experimental Environment
      5.2 Experimental Results
    6 Conclusion
    References


    Full text available from 2028/08/30 (campus network)
    Full text available from 2028/08/30 (off-campus network)
    Full text available from 2028/08/30 (National Central Library: Taiwan Theses and Dissertations System)