
Graduate Student: Chia-Yi Chen (陳家儀)
Thesis Title: CORE-MAP: Feature Map Distributed Processing for Accelerating CNN Inference on Multi-Core Edge Devices
Advisor: Ya-Shu Chen (陳雅淑)
Committee Members: Jen-Wei Hsieh (謝仁偉), Chin-Hsien Wu (吳晉賢), Hsueh-Wen Tseng (曾學文)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of Publication: 2023
Academic Year of Graduation: 111
Language: English
Pages: 40
Keywords (Chinese): 邊緣運算, 分散式推理, 神經網路
Keywords (English): Edge computing, Distributed inference, Neural networks
Views: 134; Downloads: 0
Abstract

    Distributed edge computing is becoming popular because it offers lower transmission overhead and better privacy than cloud computing. However, each edge device's limited computing power and the high computational demand of neural networks make distributed edge inference difficult. In this study, we explore layer partitioning, feature map partitioning, and computing resource partitioning in distributed edge inference. We then propose CORE-MAP, which distributes a given neural network across a set of edge devices while accounting for data dependencies and resource utilization. We evaluate CORE-MAP, and the experimental results show that it achieves a performance improvement of up to 283% over the non-distributed approach.
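The feature-map partitioning idea the abstract describes can be illustrated with a small sketch. This is a hypothetical, simplified implementation (the function name `partition_rows` and its parameters are assumptions, not the thesis's actual method): it splits the rows of a feature map across edge devices in proportion to each device's compute capacity, and extends each slice by overlap ("halo") rows at the internal boundaries so that a convolution can be computed locally without fetching neighboring rows from another device — one way to respect the data dependency the abstract mentions.

```python
def partition_rows(n_rows, capacities, halo):
    """Split n_rows of a feature map across devices.

    Each device receives a contiguous row range whose size is roughly
    proportional to its entry in `capacities`. Ranges at internal
    boundaries are extended by `halo` rows to cover a convolution's
    receptive-field overlap. Returns a list of (start, end) pairs,
    end exclusive.
    """
    total = sum(capacities)
    # Proportional cut points between consecutive devices.
    cuts, acc = [0], 0.0
    for c in capacities[:-1]:
        acc += n_rows * c / total
        cuts.append(round(acc))
    cuts.append(n_rows)
    # Extend each slice by the halo, clamped to the feature map bounds.
    ranges = []
    for i in range(len(capacities)):
        start = max(0, cuts[i] - halo)
        end = min(n_rows, cuts[i + 1] + halo)
        ranges.append((start, end))
    return ranges

# Example: a 224-row feature map, one device twice as fast as the
# other two, halo of 1 row for a 3x3 convolution.
print(partition_rows(224, [2, 1, 1], 1))
# -> [(0, 113), (111, 169), (167, 224)]
```

The overlapping rows are computed redundantly on two devices; in practice a scheme like this trades that small amount of duplicated work for the elimination of per-layer boundary communication.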

    Table of Contents
    1 Introduction
    2 System Model
    3 Related Work
    4 Approach
      4.1 Response Time Analysis
      4.2 Search Device Partition
        4.2.1 Determine the Number of Layers for Distribution
        4.2.2 Split the Feature Map across Devices
      4.3 Search Core Partition
    5 Performance Evaluation
      5.1 Experimental Environment
      5.2 Experimental Results
    6 Conclusion
    References


    Full text available from 2028/08/30 (campus network)
    Full text available from 2028/08/30 (off-campus network)
    Full text available from 2028/08/30 (National Central Library: Taiwan Theses and Dissertations System)