Abstract

This paper presents an efficient strategy for performing the assembly stage of finite element analysis (FEA) on general-purpose graphics processing units (GPUs). The strategy divides the assembly task into symbolic and numeric kernels, thereby reducing the complexity of the standard single-kernel assembly approach. Two sparse storage formats based on the proposed strategy are also developed by modifying existing sparse storage formats to remove the degrees-of-freedom-based redundancies in the global matrix. The inherent race condition is resolved using coloring and atomics. The proposed strategy is compared with state-of-the-art GPU-based and central processing unit (CPU)-based assembly techniques. These comparisons show reduced storage space requirements and execution time, together with higher performance (GFLOPS). Moreover, with the proposed strategy, the coloring method is found to be more effective than the atomics-based method for both the existing and the modified storage formats.
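The split between a symbolic and a numeric phase, and the use of coloring to avoid race conditions, can be illustrated with a small serial sketch. This is not the paper's implementation: it is plain Python standing in for GPU kernels, it assumes one degree of freedom per node and a standard CSR layout, and every function name (`symbolic_phase`, `color_elements`, `numeric_phase`) is hypothetical.

```python
# Illustrative sketch only: two-phase assembly with element coloring.
# Assumptions: one degree of freedom per node, CSR storage, serial execution.

def symbolic_phase(n_nodes, elements):
    """Build the CSR sparsity pattern (row_ptr, col_idx) from connectivity."""
    cols = [set() for _ in range(n_nodes)]
    for elem in elements:
        for i in elem:
            cols[i].update(elem)
    row_ptr, col_idx = [0], []
    for row in cols:
        col_idx.extend(sorted(row))
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx

def color_elements(elements):
    """Greedy coloring: elements that share a node get different colors."""
    node_colors = {}  # node -> colors already used by elements touching it
    colors = []
    for elem in elements:
        used = set().union(*(node_colors.get(n, set()) for n in elem))
        c = 0
        while c in used:
            c += 1
        colors.append(c)
        for n in elem:
            node_colors.setdefault(n, set()).add(c)
    return colors

def numeric_phase(row_ptr, col_idx, elements, elem_matrices, colors):
    """Scatter element matrices into the CSR value array, one color at a time."""
    values = [0.0] * len(col_idx)
    for color in range(max(colors) + 1):
        # Elements of one color share no node, so on a GPU each of them could
        # be handled by its own thread without atomic additions.
        for e, elem in enumerate(elements):
            if colors[e] != color:
                continue
            ke = elem_matrices[e]
            for a, i in enumerate(elem):
                start, end = row_ptr[i], row_ptr[i + 1]
                for b, j in enumerate(elem):
                    pos = col_idx.index(j, start, end)  # locate column j in row i
                    values[pos] += ke[a][b]
    return values

# Usage: a chain of three two-node elements on nodes 0-1-2-3.
elements = [(0, 1), (1, 2), (2, 3)]
ke = [[1.0, -1.0], [-1.0, 1.0]]
row_ptr, col_idx = symbolic_phase(4, elements)
colors = color_elements(elements)
values = numeric_phase(row_ptr, col_idx, elements, [ke] * 3, colors)
```

In this toy mesh the first and third elements receive the same color because they share no node, while the middle element gets its own. An atomics-based variant would instead process all elements at once and protect each `values[pos] += ...` update with an atomic addition.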
