簡易檢索 / 詳目顯示

研究生: 黃竟鎧
Jing-Kai Huang
論文名稱: 多圖形處理器之高速離散元素模擬引擎
Multi-GPU High-Speed Simulation Engine based on DEM
指導教授: 謝佑明
Yo-Ming Hsieh
口試委員: 陳鴻銘
Hung-Ming Chen
楊元森
Yuan-Sen Yang
王國隆
Kuo-Lung Wang
謝佑明
Yo-Ming Hsieh
學位類別: 碩士
Master
系所名稱: 工程學院 - 營建工程系
Department of Civil and Construction Engineering
論文出版年: 2020
畢業學年度: 108
語文別: 中文
論文頁數: 111
中文關鍵詞: 離散元素法碰撞偵測GPGPU平行運算多GPU雜湊搜尋法
外文關鍵詞: Discrete Element Method, Contact Detection, GPGPU, Parallel Computing, Multi-GPU, Hashing Search
相關次數: 點閱:339下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 離散元素法(Discrete Element Method, DEM)是一種大地工程領域應用於岩石與砂土材料的微觀力學性質探討,以及山崩土石流模擬等問題的數值分析方法。DEM透過採用精細的時間步驟並計算物理關係,可求得離散元素於不同時間點之狀態,屬於科學計算領域(scientific computing)。然而,採用極短的時間步驟求取高準確度以及離散元素法中重複執行碰撞偵測演算法,造成了計算量大增、運算時間過長之窘境。此外,當採用GPU加速運算時,為了進行更精密或更大規模的模擬任務,離散元素數量需求之上升造成了記憶體不足的問題。
    本研究為了解決運算時間過長、記憶體不足之問題,採用以下四種方法進行改善,其中包含:(1)採用動態時間步驟合理調整時間分割以減少循環步驟、(2)使用較高效率之碰撞偵測演算法、(3)採用友善記憶體之平行化雜湊搜尋法進行資料存取、(4)利用多個GPU之平行運算以擴增模擬規模。進行成果分析之後,證明了本研究採用之方法成功的提升了運算效率,並大幅的提升本研究程式可運算之離散元素數量。
    除了提升本研究程式整體效能之外,本研究亦進行了程式功能之擴充,以協助實務上的應用。包含:(1)透過球體叢集製作不規則離散元素模型、(2)匯入Tiff格式資料迅速建構地形模型、(3)透過Google提供之Protocol Buffers進行二進制資料格式輸出。


    Discrete Element Method(DEM)is a numerical method in scientific computing. It is used in the field of geotechnical engineering to explore the micromechanical properties of rock and soil materials, and to simulate landslide and debris flow. By using fine time-step and calculating the mechanical behavior of discrete elements, the physical state of discrete elements at different time can be obtained. However, using fine time discretization to perform DEM for a high accuracy results in repeated contact detection and thus a significant increase in calculation time. In addition, when using GPU for speed-up, high accuracy and large scale are necessary for high-fidelity simulations. They increase the demand for the number of discrete elements and cause the problem of out-of-memory.
    This study uses the following methods to improve our research DEM program to solve the problems aforementioned. First, a dynamic time-step algorithm is adapted to adjust time division reasonably in order to reduce the number of time steps. Second, contact detection algorithms found in physics engine is employed for efficient contact detection. Third, a memory-friendly hashing search algorithm is developed for data access. Finally, multi-GPU computing is developed to expand the scale of simulation. It is shown that these methods have successfully improved the efficiency and greatly increased the number of discrete elements computable by our research program.
    In addition to improving the overall performance, this study further added functions necessary for practical applications. First, irregular discrete element is now possible through the clump mechanism. Second, the program can now import TIFF formatted digital surface models to conveniently build terrain models. Finally, the simulation data is now saved using Google’s Protocol Buffers to decrease time for data serialization.

    論文摘要 I ABSTRACT II 誌謝 III 目錄 IV 圖目錄 VIII 表目錄 XI 第壹章 緒論 1 1.1 研究動機與目的 1 1.2 研究流程 2 1.3 論文架構 3 第貳章 文獻回顧 4 2.1 物理引擎 4 2.2 碰撞偵測 5 2.2.1 Broad-phase 6 2.2.1.1 Bounding Volume Hierarchy 7 2.2.1.2 Sweep-And-Prune 8 2.2.1.3 Spatial Subdivision 9 2.3 離散元素法模擬軟體 10 2.4 GPGPU平行運算 13 2.5 雜湊 15 2.5.1高效雜湊表 16 第參章 研究方法與工具 18 3.1 程式開發工具 18 3.1.1 OpenCL 19 3.1.2開放原始碼程式庫 19 3.1.2.1 PhysX 4.0 19 3.1.2.2 Bullet 20 3.1.2.3 Boost.Compute 20 3.1.2.4 Eigen 21 3.1.2.5 CGAL 21 3.1.3資料存取工具 22 3.1.3.1 Protocol Buffers 22 3.1.3.2 Protocol Buffers與文本格式之比較 24 3.2 系統驗證方法 25 3.3 效能評估與剖析方法 28 第肆章 系統分析與實作 29 4.1程式整體架構與執行流程 29 4.1.1 程式架構 29 4.1.2 程式執行流程 31 4.1.2.1 整體架構流程 31 4.1.2.2 模擬任務流程 33 4.2程式架構 34 4.2.1 離散元素模型 35 4.2.1.1 球體叢集三維模型製作 35 4.2.1.2 球體叢集物理參數計算 37 4.2.2 地形模型 38 4.2.3 動態時間步驟 39 4.2.3.1 實作及演算法 39 4.2.3.2 驗證 39 4.2.4 碰撞偵測演算法 42 4.2.4.1 CPU實作 42 4.2.4.2 GPU實作 42 4.2.5 碰撞平面搜尋演算法 43 4.2.5.1 碰撞平面容器之縮減 44 4.2.5.2 雜湊表之壓縮 45 4.2.5.3 效能比較 47 4.2.6 Multi-GPU協作運算 48 4.2.6.1 Prepare GPU context 48 4.2.6.2 Initialize GPU engine module 49 4.2.6.3 Produce engine data 49 4.2.6.4 模擬任務 50 4.2.7 輕量化資料格式 51 4.2.8 小結 52 第伍章 系統驗證 53 5.1 球體角速度作用於牆 53 5.2 球體角速度作用於球體所組成之牆 55 5.3 懸臂梁 55 5.4 抗壓試驗 57 5.5 球體叢集於平面之滑動 58 5.6 球體叢集於空間之旋轉 61 5.7 球體叢集於斜坡之滾動 63 5.8 小結 65 第陸章 效能評估與剖析 66 6.1 效能評估案例與方法 66 6.2 CPU進行離散元素分析之效能評估 68 6.3 GPU進行離散元素分析之效能評估 69 6.3.1 單精度浮點數測試 71 6.3.2 雙精度浮點數測試 74 6.3.3 GPU運算效能分析 77 6.4 GPU效能剖析 79 6.4.1 Naïve演算法 79 6.4.2 BVH演算法 80 6.5整體運算效率比較 82 6.6小結 84 第柒章 案例展示 85 7.1 抗壓試驗 85 7.2 地形及不規則模型模擬 87 第捌章 結論與建議 89 8.1 結論 89 8.2 建議與未來展望 90 參考文獻 92

    [1] 吳騏宇 (2018)。 超高速離散元素分析引擎之研發。碩士論文。國立臺灣科技大學, 台北市。
    [2] Eberly, D. H. (2010). Game Physics: CRC Press.
    [3] Jones, M. T. (2011). Open source physics engines. Retrieved May 8, 2020, from https://www.ibm.com/developerworks/opensource/library/os-physicsengines/
    [4] Boeing, A., Bräunl, T. (2007). Evaluation of real-time physics simulation systems. Paper presented at the Proceedings of the 5th international conference on Computer graphics and interactive techniques in Australia and Southeast Asia, Perth, Australia.
    [5] Oak Ridge National Laboratory. EARTH SCIENCE. Retrieved July 5, 2020, from https://www.olcf.ornl.gov/leadership-science/earth-science/
    [6] 張賀翔 (2012)。應用分離元素法探討順向坡失穩歷程-以國道三號崩塌事件為例。碩士論文。國立臺北科技大學,台北市。
    [7] Feldman, M. (2018). Summit Up and Running at Oak Ridge, Claims First Exascale Application. Retrieved July 5, 2020, from https://www.top500.org/news/summit-up-and-running-at-oak-ridge-claims-first-exascale-application/
    [8] NVIDIA. (2018). NVIDIA PhysX SDK 4.0 Documentation. Retrieved December 22, 2019, from https://gameworksdocs.nvidia.com/PhysX/4.0/documentation/PhysXGuide/Index.html
    [9] Goldsmith, J., Salmon, J. (1987). Automatic Creation of Object Hierarchies for Ray Tracing. IEEE Computer Graphics and Applications, 7(5), 14–20.
    [10] Cohen, J. D., Lin, M. C., Manocha, D., Ponamgi, M. (1995). I-COLLIDE: an interactive and exact collision detection system for large-scale environments. Paper presented at the Proceedings of the 1995 symposium on Interactive 3D graphics, Monterey, California, USA.
    [11] Karras, T. (2012). Thinking Parallel, Part II: Tree Traversal on the GPU. Retrieved November 25, 2019, from https://devblogs.nvidia.com/thinking-parallel-part-ii-tree-traversal-gpu/
    [12] Karras, T. (2012). Thinking Parallel, Part III: Tree Construction on the GPU. Retrieved November 25, 2019, from https://devblogs.nvidia.com/thinking-parallel-part-iii-tree-construction-gpu/
    [13] Liu, F., Harada, T., Lee, Y., Kim, Y. J. (2010). Real-time collision culling of a million bodies on graphics processing units. ACM Trans. Graph., 29(6), Article 154.
    [14] Karras, T. (2012). Thinking Parallel, Part I: Collision Detection on the GPU. Retrieved November 25, 2019, from https://devblogs.nvidia.com/thinking-parallel-part-i-collision-detection-gpu/
    [15] Wald, I., Mark, W. R., Günther, J., Boulos, S., Ize, T., Hunt, W., Parker, S. G., Shirley, P. (2009). State of the Art in Ray Tracing Animated Scenes. 28(6), 1691-1722.
    [16] Wong, T. H., Leach, G., Zambetta, F. (2014). An adaptive octree grid for GPU-based collision detection of deformable objects. The Visual Computer, 30(6), 729-738.
    [17] Nguyen, H. (2007). GPU gems 3: Addison-Wesley Professional.
    [18] Kloss, C., Goniva, C., Hager, A., Amberger, S., Pirker, S. (2012). Models, algorithms and validation for opensource DEM and CFD-DEM. Progress in Computational Fluid Dynamics, 12, 140-152.
    [19] CFDEM®project. (2013). Open-Source DEM Simulations in an HPC Environment. Retrieved May 9, 2020, from https://www.cfdem.com/system/files/slides_for_dcs_users_group_meeting_no_ci.pdf
    [20] Sandia. LAMMPS Molecular Dynamics Simulator. Retrieved July 7, 2020, from https://lammps.sandia.gov/
    [21] Plimpton, S. (1995). Fast Parallel Algorithms for Short-Range Molecular Dynamics. Journal of Computational Physics, 117(1), 1-19.
    [22] Sandia. (2020). LAMMPS Documentation. Retrieved May 10, 2020, from https://lammps.sandia.gov/doc/Manual.html
    [23] Sandia. (2014). Accelerator benchmarks for CPU, GPU, KNL. Retrieved May 9, 2020, from https://lammps.sandia.gov/bench.html#accelerator
    [24] ESSS. Rocky DEM. Retrieved July 7, 2020, from https://rocky.esss.co/
    [25] Bharadwaj, R. (2017). Rocky 4 with Multi-GPU: which hardware is best for you? Retrieved February 12, 2020, from https://rocky.esss.co/blog/rocky-4-with-multi-gpu-which-hardware-is-best-for-you/
    [26] EDEM. (2016). EDEM 2017 User Guide. Copyright © 2016.
    [27] Bruce, C. (2019). 10 common questions about EDEM GPU answered. Retrieved July 15, 2020, from https://www.edemsimulation.com/blog-and-news/blog/10-common-questions-about-edem-gpu-answered/
    [28] Kendig, K. (2019). THE NEED FOR SPEED: NEW IFORGE GPUS ENABLE FASTER MODELS, SIMULATIONS. Retrieved July 16, 2020, from http://www.ncsa.illinois.edu/news/story/the_need_for_speed_new_iforge_gpus_enable_faster_models_simulations
    [29] Heroux, M. A., Raghavan, P., Simon, H. D. (2006). Parallel Processing for Scientific Computing (Software, Environments and Tools): Society for Industrial and Applied Mathematics.
    [30] Barney, B. Introduction to Parallel Computing. Retrieved May 8, 2020, from https://computing.llnl.gov/tutorials/parallel_comp/#Whatis
    [31] NVIDIA. (2009). Whitepaper NVIDIA’s Next Generation CUDATM Compute Architecture: FermiTM: NVIDIA Corporation.
    [32] Feng, Z., Li, P. (2008). Multigrid on GPU: tackling power grid analysis on parallel SIMT platforms. Paper presented at the Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design, San Jose, California.
    [33] NVIDIA. (2020). NVIDIA CUDA programming guide. Retrieved May 10, 2020, from https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#simt-architecture
    [34] Maurer, W. D., Lewis, T. G. (1975). Hash Table Methods. ACM Comput. Surv., 7(1), 5–19.
    [35] Garg, P. Basics of Hash Tables. Retrieved May 10, 2020, from https://www.hackerearth.com/practice/data-structures/hash-tables/basics-of-hash-tables/tutorial/
    [36] Kulukundis, M. (2017). Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step. Paper presented at the CppCon, Bellevue, Washington. https://www.youtube.com/watch?v=ncHmEUmJZf4&fbclid=IwAR2ABofR5Z2SoaMe6WjUQt-rMP6Z9qBcEZDfcPGPYDlR-KDcbEMh9tVwOX0
    [37] Gouy, I. (2020). Which programming language is fastest? Retrieved May 6, 2020, from https://benchmarksgame-team.pages.debian.net/benchmarksgame/
    [38] Khronos™_Group. (2008). The Khronos Group Releases OpenCL 1.0 Specification. Retrieved July 7, 2020, from https://web.archive.org/web/20100713014204/http://www.khronos.org/news/press/releases/the_khronos_group_releases_opencl_1.0_specification/
    [39] NVIDIA. (2018). PhysX SDK. from https://developer.nvidia.com/physx-sdk
    [40] Wikipedia. (6 May 2020 13:11 UTC). Physics processing unit. Retrieved July 7, 2020, from https://en.wikipedia.org/w/index.php?title=Physics_processing_unit&oldid=955193337
    [41] NVIDIA. (2018). PhysX SDK 4.0 Available Now. Retrieved July 7, 2020, from https://news.developer.nvidia.com/announcing-physx-sdk-4-0-an-open-source-physics-engine/
    [42] Coumans, E. (2013) Bullet physics library. Open source: bulletphysics. org Retrieved July 7, 2020, from https://pybullet.org/wordpress/
    [43] Wikipedia. (2020). Bullet (software). Retrieved July 7 2020, from https://en.wikipedia.org/w/index.php?title=Bullet_(software)&oldid=956186268
    [44] Google. Bullet Physics SDK. Retrieved July 9, 2020, from https://opensource.google/projects/bullet3
    [45] Lutz, K. (2013). Boost.Compute. Retrieved July 10, 2020, from https://www.boost.org/doc/libs/1_65_1/libs/compute/doc/html/index.html
    [46] Szuppe, J. (2016). Boost.Compute: A parallel computing library for C++ based on OpenCL. Paper presented at the Proceedings of the 4th International Workshop on OpenCL, Vienna, Austria.
    [47] Benoît, J., Guennebaud, G. Eigen v3. Retrieved July 7, 2020, from http://eigen.tuxfamily.org
    [48] CGAL. Computational Geometry Algorithms Library. Retrieved July 7, 2020, from https://www.cgal.org/
    [49] The CGAL Project. (2020). CGAL User and Reference Manual. CGAL Editorial Board , 5.0.3 edition, 2020.
    [50] Google. Protocol Buffers. Google Inc. Retrieved July 10, 2020, from https://developers.google.com/protocol-buffers
    [51] Itasca Consulting Group. (2020). Itasca’s Particle Flow Code Documentation: USA: Itasca Consulting Group Inc.
    [52] NVIDIA. NVIDIA System Management Interface. Retrieved May 11, 2020, from https://developer.nvidia.com/nvidia-system-management-interface
    [53] Taghavi, R. (2011). Automatic clump generation based on mid-surface. Paper presented at the Continuum and Distinct Element Numerical Modeling in Geomechanics-2011, Melbourne, Australia.
    [54] Wikipedia. (2020). Delaunay triangulation. Retrieved July 6, 2020, from https://en.wikipedia.org/w/index.php?title=Delaunay_triangulation&oldid=951514115
    [55] Jonathan, B., Atman, J. B. (2004). How to find the inertia tensor (or other mass properties) of a 3D solid body represented by a triangle mesh.
    [56] Fogleman, M. Delatin demo. Retrieved July 7, 2020, from https://mapbox.github.io/delatin/
    [57] Fogleman, M. (2019). Heightmap meshing utility. Retrieved July 7, 2020, from https://github.com/fogleman/hmm
    [58] Agafonkin, M. F. V. (2019). A fast JavaScript terrain mesh generation tool based on Delaunay triangulation. Retrieved July 7, 2020, from https://github.com/mapbox/delatin
    [59] Bathe, K. J., Wilson, E. L. (1976). Numerical Methods in Finite Element Analysis: Prentice-Hall.

    QR CODE