多圖形處理器之高速離散元素模擬引擎｜國立臺灣科技大學博碩士論文系統

簡易檢索 / 詳目顯示

回結果列表

研究生：	黃竟鎧 Jing-Kai Huang
論文名稱：	多圖形處理器之高速離散元素模擬引擎 Multi-GPU High-Speed Simulation Engine based on DEM
指導教授：	謝佑明 Yo-Ming Hsieh
口試委員:	陳鴻銘 Hung-Ming Chen 楊元森 Yuan-Sen Yang 王國隆 Kuo-Lung Wang 謝佑明 Yo-Ming Hsieh
學位類別：	碩士 Master
系所名稱：	工程學院 - 營建工程系 Department of Civil and Construction Engineering
論文出版年：	2020
畢業學年度：	108
語文別：	中文
論文頁數：	111
中文關鍵詞：	離散元素法、碰撞偵測、GPGPU 、平行運算、多GPU 、雜湊搜尋法
外文關鍵詞：	Discrete Element Method, Contact Detection, GPGPU, Parallel Computing, Multi-GPU, Hashing Search
相關次數：	點閱：339 下載：2
分享至:	分享至facebook 分享至twitter

查詢本校圖書館目錄查詢臺灣博碩士論文知識加值系統勘誤回報

離散元素法（Discrete Element Method, DEM）是一種大地工程領域應用於岩石與砂土材料的微觀力學性質探討，以及山崩土石流模擬等問題的數值分析方法。DEM透過採用精細的時間步驟並計算物理關係，可求得離散元素於不同時間點之狀態，屬於科學計算領域（scientific computing）。然而，採用極短的時間步驟求取高準確度以及離散元素法中重複執行碰撞偵測演算法，造成了計算量大增、運算時間過長之窘境。此外，當採用GPU加速運算時，為了進行更精密或更大規模的模擬任務，離散元素數量需求之上升造成了記憶體不足的問題。
本研究為了解決運算時間過長、記憶體不足之問題，採用以下四種方法進行改善，其中包含：（1）採用動態時間步驟合理調整時間分割以減少循環步驟、（2）使用較高效率之碰撞偵測演算法、（3）採用友善記憶體之平行化雜湊搜尋法進行資料存取、（4）利用多個GPU之平行運算以擴增模擬規模。進行成果分析之後，證明了本研究採用之方法成功的提升了運算效率，並大幅的提升本研究程式可運算之離散元素數量。
除了提升本研究程式整體效能之外，本研究亦進行了程式功能之擴充，以協助實務上的應用。包含：（1）透過球體叢集製作不規則離散元素模型、（2）匯入Tiff格式資料迅速建構地形模型、（3）透過Google提供之Protocol Buffers進行二進制資料格式輸出。

Discrete Element Method（DEM）is a numerical method in scientific computing. It is used in the field of geotechnical engineering to explore the micromechanical properties of rock and soil materials, and to simulate landslide and debris flow. By using fine time-step and calculating the mechanical behavior of discrete elements, the physical state of discrete elements at different time can be obtained. However, using fine time discretization to perform DEM for a high accuracy results in repeated contact detection and thus a significant increase in calculation time. In addition, when using GPU for speed-up, high accuracy and large scale are necessary for high-fidelity simulations. They increase the demand for the number of discrete elements and cause the problem of out-of-memory.
This study uses the following methods to improve our research DEM program to solve the problems aforementioned. First, a dynamic time-step algorithm is adapted to adjust time division reasonably in order to reduce the number of time steps. Second, contact detection algorithms found in physics engine is employed for efficient contact detection. Third, a memory-friendly hashing search algorithm is developed for data access. Finally, multi-GPU computing is developed to expand the scale of simulation. It is shown that these methods have successfully improved the efficiency and greatly increased the number of discrete elements computable by our research program.
In addition to improving the overall performance, this study further added functions necessary for practical applications. First, irregular discrete element is now possible through the clump mechanism. Second, the program can now import TIFF formatted digital surface models to conveniently build terrain models. Finally, the simulation data is now saved using Google’s Protocol Buffers to decrease time for data serialization.

論文摘要    I
ABSTRACT    II
誌謝    III
目錄    IV
圖目錄    VIII
表目錄    XI
第壹章 緒論    1
1 研究動機與目的    1
2 研究流程    2
3 論文架構    3
第貳章 文獻回顧    4
1 物理引擎    4
2 碰撞偵測    5
2.1 Broad-phase    6
2.1.1 Bounding Volume Hierarchy    7
2.1.2 Sweep-And-Prune    8
2.1.3 Spatial Subdivision    9
3 離散元素法模擬軟體    10
4 GPGPU平行運算    13
5 雜湊    15
5.1高效雜湊表    16
第參章 研究方法與工具    18
1 程式開發工具    18
1.1 OpenCL    19
1.2開放原始碼程式庫    19
1.2.1 PhysX 4.0    19
1.2.2 Bullet    20
1.2.3 Boost.Compute    20
1.2.4 Eigen    21
1.2.5 CGAL    21
1.3資料存取工具    22
1.3.1 Protocol Buffers    22
1.3.2 Protocol Buffers與文本格式之比較    24
2 系統驗證方法    25
3 效能評估與剖析方法    28
第肆章 系統分析與實作    29
1程式整體架構與執行流程    29
1.1 程式架構    29
1.2 程式執行流程    31
1.2.1 整體架構流程    31
1.2.2 模擬任務流程    33
2程式架構    34
2.1 離散元素模型    35
2.1.1 球體叢集三維模型製作    35
2.1.2 球體叢集物理參數計算    37
2.2 地形模型    38
2.3 動態時間步驟    39
2.3.1 實作及演算法    39
2.3.2 驗證    39
2.4 碰撞偵測演算法    42
2.4.1 CPU實作    42
2.4.2 GPU實作    42
2.5 碰撞平面搜尋演算法    43
2.5.1 碰撞平面容器之縮減    44
2.5.2 雜湊表之壓縮    45
2.5.3 效能比較    47
2.6 Multi-GPU協作運算    48
2.6.1 Prepare GPU context    48
2.6.2 Initialize GPU engine module    49
2.6.3 Produce engine data    49
2.6.4 模擬任務    50
2.7 輕量化資料格式    51
2.8 小結    52
第伍章 系統驗證    53
1 球體角速度作用於牆    53
2 球體角速度作用於球體所組成之牆    55
3 懸臂梁    55
4 抗壓試驗    57
5 球體叢集於平面之滑動    58
6 球體叢集於空間之旋轉    61
7 球體叢集於斜坡之滾動    63
8 小結    65
第陸章 效能評估與剖析    66
1 效能評估案例與方法    66
2 CPU進行離散元素分析之效能評估    68
3 GPU進行離散元素分析之效能評估    69
3.1 單精度浮點數測試    71
3.2 雙精度浮點數測試    74
3.3 GPU運算效能分析    77
4 GPU效能剖析    79
4.1 Naïve演算法    79
4.2 BVH演算法    80
5整體運算效率比較    82
6小結    84
第柒章 案例展示    85
1 抗壓試驗    85
2 地形及不規則模型模擬    87
第捌章 結論與建議    89
1 結論    89
2 建議與未來展望    90
參考文獻    92

                                

[1] 吳騏宇 (2018)。超高速離散元素分析引擎之研發。碩士論文。國立臺灣科技大學，台北市。
[2] Eberly, D. H. (2010). Game Physics: CRC Press.
[3] Jones, M. T. (2011). Open source physics engines. Retrieved May 8, 2020, from https://www.ibm.com/developerworks/opensource/library/os-physicsengines/
[4] Boeing, A., Bräunl, T. (2007). Evaluation of real-time physics simulation systems. Paper presented at the Proceedings of the 5th international conference on Computer graphics and interactive techniques in Australia and Southeast Asia, Perth, Australia.
[5] Oak Ridge National Laboratory. EARTH SCIENCE. Retrieved July 5, 2020, from https://www.olcf.ornl.gov/leadership-science/earth-science/
[6] 張賀翔 (2012)。應用分離元素法探討順向坡失穩歷程-以國道三號崩塌事件為例。碩士論文。國立臺北科技大學，台北市。
[7] Feldman, M. (2018). Summit Up and Running at Oak Ridge, Claims First Exascale Application. Retrieved July 5, 2020, from https://www.top500.org/news/summit-up-and-running-at-oak-ridge-claims-first-exascale-application/
[8] NVIDIA. (2018). NVIDIA PhysX SDK 4.0 Documentation. Retrieved December 22, 2019, from https://gameworksdocs.nvidia.com/PhysX/4.0/documentation/PhysXGuide/Index.html
[9] Goldsmith, J., Salmon, J. (1987). Automatic Creation of Object Hierarchies for Ray Tracing. IEEE Computer Graphics and Applications, 7(5), 14–20.
[10] Cohen, J. D., Lin, M. C., Manocha, D., Ponamgi, M. (1995). I-COLLIDE: an interactive and exact collision detection system for large-scale environments. Paper presented at the Proceedings of the 1995 symposium on Interactive 3D graphics, Monterey, California, USA.
[11] Karras, T. (2012). Thinking Parallel, Part II: Tree Traversal on the GPU. Retrieved November 25, 2019, from https://devblogs.nvidia.com/thinking-parallel-part-ii-tree-traversal-gpu/
[12] Karras, T. (2012). Thinking Parallel, Part III: Tree Construction on the GPU. Retrieved November 25, 2019, from https://devblogs.nvidia.com/thinking-parallel-part-iii-tree-construction-gpu/
[13] Liu, F., Harada, T., Lee, Y., Kim, Y. J. (2010). Real-time collision culling of a million bodies on graphics processing units. ACM Trans. Graph., 29(6), Article 154.
[14] Karras, T. (2012). Thinking Parallel, Part I: Collision Detection on the GPU. Retrieved November 25, 2019, from https://devblogs.nvidia.com/thinking-parallel-part-i-collision-detection-gpu/
[15] Wald, I., Mark, W. R., Günther, J., Boulos, S., Ize, T., Hunt, W., Parker, S. G., Shirley, P. (2009). State of the Art in Ray Tracing Animated Scenes. 28(6), 1691-1722.
[16] Wong, T. H., Leach, G., Zambetta, F. (2014). An adaptive octree grid for GPU-based collision detection of deformable objects. The Visual Computer, 30(6), 729-738.
[17] Nguyen, H. (2007). GPU gems 3: Addison-Wesley Professional.
[18] Kloss, C., Goniva, C., Hager, A., Amberger, S., Pirker, S. (2012). Models, algorithms and validation for opensource DEM and CFD-DEM. Progress in Computational Fluid Dynamics, 12, 140-152.
[19] CFDEM®project. (2013). Open-Source DEM Simulations in an HPC Environment. Retrieved May 9, 2020, from https://www.cfdem.com/system/files/slides_for_dcs_users_group_meeting_no_ci.pdf
[20] Sandia. LAMMPS Molecular Dynamics Simulator. Retrieved July 7, 2020, from https://lammps.sandia.gov/
[21] Plimpton, S. (1995). Fast Parallel Algorithms for Short-Range Molecular Dynamics. Journal of Computational Physics, 117(1), 1-19.
[22] Sandia. (2020). LAMMPS Documentation. Retrieved May 10, 2020, from https://lammps.sandia.gov/doc/Manual.html
[23] Sandia. (2014). Accelerator benchmarks for CPU, GPU, KNL. Retrieved May 9, 2020, from https://lammps.sandia.gov/bench.html#accelerator
[24] ESSS. Rocky DEM. Retrieved July 7, 2020, from https://rocky.esss.co/
[25] Bharadwaj, R. (2017). Rocky 4 with Multi-GPU: which hardware is best for you? Retrieved February 12, 2020, from https://rocky.esss.co/blog/rocky-4-with-multi-gpu-which-hardware-is-best-for-you/
[26] EDEM. (2016). EDEM 2017 User Guide. Copyright © 2016.
[27] Bruce, C. (2019). 10 common questions about EDEM GPU answered. Retrieved July 15, 2020, from https://www.edemsimulation.com/blog-and-news/blog/10-common-questions-about-edem-gpu-answered/
[28] Kendig, K. (2019). THE NEED FOR SPEED: NEW IFORGE GPUS ENABLE FASTER MODELS, SIMULATIONS. Retrieved July 16, 2020, from http://www.ncsa.illinois.edu/news/story/the_need_for_speed_new_iforge_gpus_enable_faster_models_simulations
[29] Heroux, M. A., Raghavan, P., Simon, H. D. (2006). Parallel Processing for Scientific Computing (Software, Environments and Tools): Society for Industrial and Applied Mathematics.
[30] Barney, B. Introduction to Parallel Computing. Retrieved May 8, 2020, from https://computing.llnl.gov/tutorials/parallel_comp/#Whatis
[31] NVIDIA. (2009). Whitepaper NVIDIA’s Next Generation CUDATM Compute Architecture: FermiTM: NVIDIA Corporation.
[32] Feng, Z., Li, P. (2008). Multigrid on GPU: tackling power grid analysis on parallel SIMT platforms. Paper presented at the Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design, San Jose, California.
[33] NVIDIA. (2020). NVIDIA CUDA programming guide. Retrieved May 10, 2020, from https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#simt-architecture
[34] Maurer, W. D., Lewis, T. G. (1975). Hash Table Methods. ACM Comput. Surv., 7(1), 5–19.
[35] Garg, P. Basics of Hash Tables. Retrieved May 10, 2020, from https://www.hackerearth.com/practice/data-structures/hash-tables/basics-of-hash-tables/tutorial/
[36] Kulukundis, M. (2017). Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step. Paper presented at the CppCon, Bellevue, Washington. https://www.youtube.com/watch?v=ncHmEUmJZf4&fbclid=IwAR2ABofR5Z2SoaMe6WjUQt-rMP6Z9qBcEZDfcPGPYDlR-KDcbEMh9tVwOX0
[37] Gouy, I. (2020). Which programming language is fastest? Retrieved May 6, 2020, from https://benchmarksgame-team.pages.debian.net/benchmarksgame/
[38] Khronos™_Group. (2008). The Khronos Group Releases OpenCL 1.0 Specification. Retrieved July 7, 2020, from https://web.archive.org/web/20100713014204/http://www.khronos.org/news/press/releases/the_khronos_group_releases_opencl_1.0_specification/
[39] NVIDIA. (2018). PhysX SDK. from https://developer.nvidia.com/physx-sdk
[40] Wikipedia. (6 May 2020 13:11 UTC). Physics processing unit. Retrieved July 7, 2020, from https://en.wikipedia.org/w/index.php?title=Physics_processing_unit&oldid=955193337
[41] NVIDIA. (2018). PhysX SDK 4.0 Available Now. Retrieved July 7, 2020, from https://news.developer.nvidia.com/announcing-physx-sdk-4-0-an-open-source-physics-engine/
[42] Coumans, E. (2013) Bullet physics library. Open source: bulletphysics. org Retrieved July 7, 2020, from https://pybullet.org/wordpress/
[43] Wikipedia. (2020). Bullet (software). Retrieved July 7 2020, from https://en.wikipedia.org/w/index.php?title=Bullet_(software)&oldid=956186268
[44] Google. Bullet Physics SDK. Retrieved July 9, 2020, from https://opensource.google/projects/bullet3
[45] Lutz, K. (2013). Boost.Compute. Retrieved July 10, 2020, from https://www.boost.org/doc/libs/1_65_1/libs/compute/doc/html/index.html
[46] Szuppe, J. (2016). Boost.Compute: A parallel computing library for C++ based on OpenCL. Paper presented at the Proceedings of the 4th International Workshop on OpenCL, Vienna, Austria.
[47] Benoît, J., Guennebaud, G. Eigen v3. Retrieved July 7, 2020, from http://eigen.tuxfamily.org
[48] CGAL. Computational Geometry Algorithms Library. Retrieved July 7, 2020, from https://www.cgal.org/
[49] The CGAL Project. (2020). CGAL User and Reference Manual. CGAL Editorial Board , 5.0.3 edition, 2020.
[50] Google. Protocol Buffers. Google Inc. Retrieved July 10, 2020, from https://developers.google.com/protocol-buffers
[51] Itasca Consulting Group. (2020). Itasca’s Particle Flow Code Documentation: USA: Itasca Consulting Group Inc.
[52] NVIDIA. NVIDIA System Management Interface. Retrieved May 11, 2020, from https://developer.nvidia.com/nvidia-system-management-interface
[53] Taghavi, R. (2011). Automatic clump generation based on mid-surface. Paper presented at the Continuum and Distinct Element Numerical Modeling in Geomechanics-2011, Melbourne, Australia.
[54] Wikipedia. (2020). Delaunay triangulation. Retrieved July 6, 2020, from https://en.wikipedia.org/w/index.php?title=Delaunay_triangulation&oldid=951514115
[55] Jonathan, B., Atman, J. B. (2004). How to find the inertia tensor (or other mass properties) of a 3D solid body represented by a triangle mesh.
[56] Fogleman, M. Delatin demo. Retrieved July 7, 2020, from https://mapbox.github.io/delatin/
[57] Fogleman, M. (2019). Heightmap meshing utility. Retrieved July 7, 2020, from https://github.com/fogleman/hmm
[58] Agafonkin, M. F. V. (2019). A fast JavaScript terrain mesh generation tool based on Delaunay triangulation. Retrieved July 7, 2020, from https://github.com/mapbox/delatin
[59] Bathe, K. J., Wilson, E. L. (1976). Numerical Methods in Finite Element Analysis: Prentice-Hall.

簡易檢索 / 詳目顯示

相關論文