研究生: |
黃竟鎧 Jing-Kai Huang |
---|---|
論文名稱: |
多圖形處理器之高速離散元素模擬引擎 Multi-GPU High-Speed Simulation Engine based on DEM |
指導教授: |
謝佑明
Yo-Ming Hsieh |
口試委員: |
陳鴻銘
Hung-Ming Chen 楊元森 Yuan-Sen Yang 王國隆 Kuo-Lung Wang 謝佑明 Yo-Ming Hsieh |
學位類別: |
碩士 Master |
系所名稱: |
工程學院 - 營建工程系 Department of Civil and Construction Engineering |
論文出版年: | 2020 |
畢業學年度: | 108 |
語文別: | 中文 |
論文頁數: | 111 |
中文關鍵詞: | 離散元素法 、碰撞偵測 、GPGPU 、平行運算 、多GPU 、雜湊搜尋法 |
外文關鍵詞: | Discrete Element Method, Contact Detection, GPGPU, Parallel Computing, Multi-GPU, Hashing Search |
相關次數: | 點閱:339 下載:2 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
離散元素法(Discrete Element Method, DEM)是一種大地工程領域應用於岩石與砂土材料的微觀力學性質探討,以及山崩土石流模擬等問題的數值分析方法。DEM透過採用精細的時間步驟並計算物理關係,可求得離散元素於不同時間點之狀態,屬於科學計算領域(scientific computing)。然而,採用極短的時間步驟求取高準確度以及離散元素法中重複執行碰撞偵測演算法,造成了計算量大增、運算時間過長之窘境。此外,當採用GPU加速運算時,為了進行更精密或更大規模的模擬任務,離散元素數量需求之上升造成了記憶體不足的問題。
本研究為了解決運算時間過長、記憶體不足之問題,採用以下四種方法進行改善,其中包含:(1)採用動態時間步驟合理調整時間分割以減少循環步驟、(2)使用較高效率之碰撞偵測演算法、(3)採用友善記憶體之平行化雜湊搜尋法進行資料存取、(4)利用多個GPU之平行運算以擴增模擬規模。進行成果分析之後,證明了本研究採用之方法成功的提升了運算效率,並大幅的提升本研究程式可運算之離散元素數量。
除了提升本研究程式整體效能之外,本研究亦進行了程式功能之擴充,以協助實務上的應用。包含:(1)透過球體叢集製作不規則離散元素模型、(2)匯入Tiff格式資料迅速建構地形模型、(3)透過Google提供之Protocol Buffers進行二進制資料格式輸出。
Discrete Element Method(DEM)is a numerical method in scientific computing. It is used in the field of geotechnical engineering to explore the micromechanical properties of rock and soil materials, and to simulate landslide and debris flow. By using fine time-step and calculating the mechanical behavior of discrete elements, the physical state of discrete elements at different time can be obtained. However, using fine time discretization to perform DEM for a high accuracy results in repeated contact detection and thus a significant increase in calculation time. In addition, when using GPU for speed-up, high accuracy and large scale are necessary for high-fidelity simulations. They increase the demand for the number of discrete elements and cause the problem of out-of-memory.
This study uses the following methods to improve our research DEM program to solve the problems aforementioned. First, a dynamic time-step algorithm is adapted to adjust time division reasonably in order to reduce the number of time steps. Second, contact detection algorithms found in physics engine is employed for efficient contact detection. Third, a memory-friendly hashing search algorithm is developed for data access. Finally, multi-GPU computing is developed to expand the scale of simulation. It is shown that these methods have successfully improved the efficiency and greatly increased the number of discrete elements computable by our research program.
In addition to improving the overall performance, this study further added functions necessary for practical applications. First, irregular discrete element is now possible through the clump mechanism. Second, the program can now import TIFF formatted digital surface models to conveniently build terrain models. Finally, the simulation data is now saved using Google’s Protocol Buffers to decrease time for data serialization.
[1] 吳騏宇 (2018)。 超高速離散元素分析引擎之研發。碩士論文。國立臺灣科技大學, 台北市。
[2] Eberly, D. H. (2010). Game Physics: CRC Press.
[3] Jones, M. T. (2011). Open source physics engines. Retrieved May 8, 2020, from https://www.ibm.com/developerworks/opensource/library/os-physicsengines/
[4] Boeing, A., Bräunl, T. (2007). Evaluation of real-time physics simulation systems. Paper presented at the Proceedings of the 5th international conference on Computer graphics and interactive techniques in Australia and Southeast Asia, Perth, Australia.
[5] Oak Ridge National Laboratory. EARTH SCIENCE. Retrieved July 5, 2020, from https://www.olcf.ornl.gov/leadership-science/earth-science/
[6] 張賀翔 (2012)。應用分離元素法探討順向坡失穩歷程-以國道三號崩塌事件為例。碩士論文。國立臺北科技大學,台北市。
[7] Feldman, M. (2018). Summit Up and Running at Oak Ridge, Claims First Exascale Application. Retrieved July 5, 2020, from https://www.top500.org/news/summit-up-and-running-at-oak-ridge-claims-first-exascale-application/
[8] NVIDIA. (2018). NVIDIA PhysX SDK 4.0 Documentation. Retrieved December 22, 2019, from https://gameworksdocs.nvidia.com/PhysX/4.0/documentation/PhysXGuide/Index.html
[9] Goldsmith, J., Salmon, J. (1987). Automatic Creation of Object Hierarchies for Ray Tracing. IEEE Computer Graphics and Applications, 7(5), 14–20.
[10] Cohen, J. D., Lin, M. C., Manocha, D., Ponamgi, M. (1995). I-COLLIDE: an interactive and exact collision detection system for large-scale environments. Paper presented at the Proceedings of the 1995 symposium on Interactive 3D graphics, Monterey, California, USA.
[11] Karras, T. (2012). Thinking Parallel, Part II: Tree Traversal on the GPU. Retrieved November 25, 2019, from https://devblogs.nvidia.com/thinking-parallel-part-ii-tree-traversal-gpu/
[12] Karras, T. (2012). Thinking Parallel, Part III: Tree Construction on the GPU. Retrieved November 25, 2019, from https://devblogs.nvidia.com/thinking-parallel-part-iii-tree-construction-gpu/
[13] Liu, F., Harada, T., Lee, Y., Kim, Y. J. (2010). Real-time collision culling of a million bodies on graphics processing units. ACM Trans. Graph., 29(6), Article 154.
[14] Karras, T. (2012). Thinking Parallel, Part I: Collision Detection on the GPU. Retrieved November 25, 2019, from https://devblogs.nvidia.com/thinking-parallel-part-i-collision-detection-gpu/
[15] Wald, I., Mark, W. R., Günther, J., Boulos, S., Ize, T., Hunt, W., Parker, S. G., Shirley, P. (2009). State of the Art in Ray Tracing Animated Scenes. 28(6), 1691-1722.
[16] Wong, T. H., Leach, G., Zambetta, F. (2014). An adaptive octree grid for GPU-based collision detection of deformable objects. The Visual Computer, 30(6), 729-738.
[17] Nguyen, H. (2007). GPU gems 3: Addison-Wesley Professional.
[18] Kloss, C., Goniva, C., Hager, A., Amberger, S., Pirker, S. (2012). Models, algorithms and validation for opensource DEM and CFD-DEM. Progress in Computational Fluid Dynamics, 12, 140-152.
[19] CFDEM®project. (2013). Open-Source DEM Simulations in an HPC Environment. Retrieved May 9, 2020, from https://www.cfdem.com/system/files/slides_for_dcs_users_group_meeting_no_ci.pdf
[20] Sandia. LAMMPS Molecular Dynamics Simulator. Retrieved July 7, 2020, from https://lammps.sandia.gov/
[21] Plimpton, S. (1995). Fast Parallel Algorithms for Short-Range Molecular Dynamics. Journal of Computational Physics, 117(1), 1-19.
[22] Sandia. (2020). LAMMPS Documentation. Retrieved May 10, 2020, from https://lammps.sandia.gov/doc/Manual.html
[23] Sandia. (2014). Accelerator benchmarks for CPU, GPU, KNL. Retrieved May 9, 2020, from https://lammps.sandia.gov/bench.html#accelerator
[24] ESSS. Rocky DEM. Retrieved July 7, 2020, from https://rocky.esss.co/
[25] Bharadwaj, R. (2017). Rocky 4 with Multi-GPU: which hardware is best for you? Retrieved February 12, 2020, from https://rocky.esss.co/blog/rocky-4-with-multi-gpu-which-hardware-is-best-for-you/
[26] EDEM. (2016). EDEM 2017 User Guide. Copyright © 2016.
[27] Bruce, C. (2019). 10 common questions about EDEM GPU answered. Retrieved July 15, 2020, from https://www.edemsimulation.com/blog-and-news/blog/10-common-questions-about-edem-gpu-answered/
[28] Kendig, K. (2019). THE NEED FOR SPEED: NEW IFORGE GPUS ENABLE FASTER MODELS, SIMULATIONS. Retrieved July 16, 2020, from http://www.ncsa.illinois.edu/news/story/the_need_for_speed_new_iforge_gpus_enable_faster_models_simulations
[29] Heroux, M. A., Raghavan, P., Simon, H. D. (2006). Parallel Processing for Scientific Computing (Software, Environments and Tools): Society for Industrial and Applied Mathematics.
[30] Barney, B. Introduction to Parallel Computing. Retrieved May 8, 2020, from https://computing.llnl.gov/tutorials/parallel_comp/#Whatis
[31] NVIDIA. (2009). Whitepaper NVIDIA’s Next Generation CUDATM Compute Architecture: FermiTM: NVIDIA Corporation.
[32] Feng, Z., Li, P. (2008). Multigrid on GPU: tackling power grid analysis on parallel SIMT platforms. Paper presented at the Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design, San Jose, California.
[33] NVIDIA. (2020). NVIDIA CUDA programming guide. Retrieved May 10, 2020, from https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#simt-architecture
[34] Maurer, W. D., Lewis, T. G. (1975). Hash Table Methods. ACM Comput. Surv., 7(1), 5–19.
[35] Garg, P. Basics of Hash Tables. Retrieved May 10, 2020, from https://www.hackerearth.com/practice/data-structures/hash-tables/basics-of-hash-tables/tutorial/
[36] Kulukundis, M. (2017). Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step. Paper presented at the CppCon, Bellevue, Washington. https://www.youtube.com/watch?v=ncHmEUmJZf4&fbclid=IwAR2ABofR5Z2SoaMe6WjUQt-rMP6Z9qBcEZDfcPGPYDlR-KDcbEMh9tVwOX0
[37] Gouy, I. (2020). Which programming language is fastest? Retrieved May 6, 2020, from https://benchmarksgame-team.pages.debian.net/benchmarksgame/
[38] Khronos™_Group. (2008). The Khronos Group Releases OpenCL 1.0 Specification. Retrieved July 7, 2020, from https://web.archive.org/web/20100713014204/http://www.khronos.org/news/press/releases/the_khronos_group_releases_opencl_1.0_specification/
[39] NVIDIA. (2018). PhysX SDK. from https://developer.nvidia.com/physx-sdk
[40] Wikipedia. (6 May 2020 13:11 UTC). Physics processing unit. Retrieved July 7, 2020, from https://en.wikipedia.org/w/index.php?title=Physics_processing_unit&oldid=955193337
[41] NVIDIA. (2018). PhysX SDK 4.0 Available Now. Retrieved July 7, 2020, from https://news.developer.nvidia.com/announcing-physx-sdk-4-0-an-open-source-physics-engine/
[42] Coumans, E. (2013) Bullet physics library. Open source: bulletphysics. org Retrieved July 7, 2020, from https://pybullet.org/wordpress/
[43] Wikipedia. (2020). Bullet (software). Retrieved July 7 2020, from https://en.wikipedia.org/w/index.php?title=Bullet_(software)&oldid=956186268
[44] Google. Bullet Physics SDK. Retrieved July 9, 2020, from https://opensource.google/projects/bullet3
[45] Lutz, K. (2013). Boost.Compute. Retrieved July 10, 2020, from https://www.boost.org/doc/libs/1_65_1/libs/compute/doc/html/index.html
[46] Szuppe, J. (2016). Boost.Compute: A parallel computing library for C++ based on OpenCL. Paper presented at the Proceedings of the 4th International Workshop on OpenCL, Vienna, Austria.
[47] Benoît, J., Guennebaud, G. Eigen v3. Retrieved July 7, 2020, from http://eigen.tuxfamily.org
[48] CGAL. Computational Geometry Algorithms Library. Retrieved July 7, 2020, from https://www.cgal.org/
[49] The CGAL Project. (2020). CGAL User and Reference Manual. CGAL Editorial Board , 5.0.3 edition, 2020.
[50] Google. Protocol Buffers. Google Inc. Retrieved July 10, 2020, from https://developers.google.com/protocol-buffers
[51] Itasca Consulting Group. (2020). Itasca’s Particle Flow Code Documentation: USA: Itasca Consulting Group Inc.
[52] NVIDIA. NVIDIA System Management Interface. Retrieved May 11, 2020, from https://developer.nvidia.com/nvidia-system-management-interface
[53] Taghavi, R. (2011). Automatic clump generation based on mid-surface. Paper presented at the Continuum and Distinct Element Numerical Modeling in Geomechanics-2011, Melbourne, Australia.
[54] Wikipedia. (2020). Delaunay triangulation. Retrieved July 6, 2020, from https://en.wikipedia.org/w/index.php?title=Delaunay_triangulation&oldid=951514115
[55] Jonathan, B., Atman, J. B. (2004). How to find the inertia tensor (or other mass properties) of a 3D solid body represented by a triangle mesh.
[56] Fogleman, M. Delatin demo. Retrieved July 7, 2020, from https://mapbox.github.io/delatin/
[57] Fogleman, M. (2019). Heightmap meshing utility. Retrieved July 7, 2020, from https://github.com/fogleman/hmm
[58] Agafonkin, M. F. V. (2019). A fast JavaScript terrain mesh generation tool based on Delaunay triangulation. Retrieved July 7, 2020, from https://github.com/mapbox/delatin
[59] Bathe, K. J., Wilson, E. L. (1976). Numerical Methods in Finite Element Analysis: Prentice-Hall.