Title 
Acceleration of a nonlinear water wave model using a GPU 
Author

Madsen, Morten Gorm

Supervisor

Engsig-Karup, Allan Peter (Scientific Computing, Department of Informatics and Mathematical Modeling, Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark)
Dammann, Bernd (Scientific Computing, Department of Informatics and Mathematical Modeling, Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark)
Frisvad, Jeppe Revall (Image Analysis and Computer Graphics, Department of Informatics and Mathematical Modeling, Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark)

Institution 
Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark
Thesis level 
Master's thesis 
Year 
2010 
Abstract 
The primary objective of this work is to use a GPU (massively parallel hardware)
to accelerate an existing optimized sequential algorithm that solves a potential
flow problem. The potential flow problem poses an initial value problem on
a 2D surface, coupled with a 3D Laplace problem. A low-storage Defect Correction
method with a multigrid preconditioner is used to solve a flexible-order
approximation of the Laplace problem. The widely used explicit RK4 method
is applied for time integration.
The primary reason for porting this particular solver is that both Defect Correction
and the preconditioner are expected to be well suited for GPUs, provided
that the right discretization is used. The work focuses on both the analysis and
the implementation of the multigrid method, and on understanding how it should be
configured in order to be an efficient preconditioner for the Defect Correction
algorithm. Little attention is given to the standard four-stage Runge-Kutta
method.
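As a rough illustration of the preconditioned Defect Correction idea described above, the following sketch applies the iteration x <- x + M^{-1}(b - A x) to a small 1D model problem. Everything here is an assumption made for illustration and is not taken from the thesis: the 1D setting, the function names, the boundary closure, and the direct solve of the second-order operator M, which stands in for the multigrid preconditioner.

```python
import numpy as np

def low_order_operator(n, h):
    """Second-order finite-difference matrix for -u'' with u = 0 at both ends."""
    T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return T / h**2

def high_order_operator(n, h):
    """Higher-order matrix built from the five-point fourth-order stencil.

    Written as T + (h^2 / 12) T^2, which gives the interior stencil
    (1, -16, 30, -16, 1) / (12 h^2) and an odd-reflection closure at the
    boundaries (an illustrative choice, not the thesis discretization).
    """
    T = low_order_operator(n, h)
    return T + (h**2 / 12.0) * (T @ T)

def defect_correction(A, M, b, iterations):
    """Iterate x <- x + M^{-1} (b - A x), starting from x = 0.

    M is solved directly here; this direct solve is a stand-in for an
    approximate inverse such as a multigrid cycle.
    """
    x = np.zeros_like(b)
    residual_norms = []
    for _ in range(iterations):
        r = b - A @ x                      # defect of the high-order system
        residual_norms.append(np.linalg.norm(r))
        x = x + np.linalg.solve(M, r)      # low-order correction
    return x, residual_norms

n = 64
h = 1.0 / (n + 1)
A = high_order_operator(n, h)
M = low_order_operator(n, h)
b = np.random.default_rng(0).standard_normal(n)  # arbitrary right-hand side
x, history = defect_correction(A, M, b, iterations=25)
x_ref = np.linalg.solve(A, b)
print("relative error:", np.linalg.norm(x - x_ref) / np.linalg.norm(x_ref))
```

In this model problem the high-order matrix is never inverted; only products with A and solves with the cheaper low-order operator are needed, which is the property that makes the approach attractive when storage is limited.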
The most significant result of the work is that rethinking the memory layout
both allows a significantly larger problem size and reduces the solution
time, even for a naive CUDA implementation. In particular, the program
developed can hold a Laplace problem of up to 100,000,000 degrees of freedom
in 4 GB of RAM. For problems of this size, the iterative solution to the Laplace
problem is improved by one decimal digit within a matter of seconds. This is up to
10 times faster than the existing CPU implementation. Although the target
platform is the Compute Capability 1.3 Tesla architecture, it is also shown that
moving the program to a Fermi architecture GPU accelerates the code even
further, with a resulting speedup of up to 42 times over the existing CPU
code. Remarkably, the speedup on the Fermi architecture is achieved with the
naive implementation of the program.
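To put the storage figure above in perspective, the short calculation below estimates how many bytes per degree of freedom are available and how many full-size arrays would fit. The helper names are hypothetical, "4 GB" is read as 4 GiB, and the abstract does not state the floating-point precision used, so both common widths are shown.

```python
def bytes_per_dof(memory_bytes, dofs):
    """Average number of bytes available per degree of freedom."""
    return memory_bytes / dofs

def arrays_that_fit(memory_bytes, dofs, bytes_per_value):
    """Number of full-length arrays (one value per unknown) that fit in memory."""
    return int(memory_bytes // (dofs * bytes_per_value))

GIB = 2**30
memory_budget = 4 * GIB        # assumption: "4 GB" interpreted as 4 GiB
dofs = 100_000_000             # problem size quoted in the abstract

print(bytes_per_dof(memory_budget, dofs))        # roughly 43 bytes per unknown
print(arrays_that_fit(memory_budget, dofs, 8))   # double precision (assumed width)
print(arrays_that_fit(memory_budget, dofs, 4))   # single precision (assumed width)
```

Under these assumptions only a handful of solution-sized vectors fit on the device, which suggests why a low-storage formulation and a carefully chosen memory layout matter at this problem size.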
Imprint 
Technical University of Denmark (DTU) : Kgs. Lyngby, Denmark 
Series 
IMM-M.Sc.-2010-94
Admin 
Creation date: 2010-12-20
Update date: 2010-12-20
Source: dtu
ID: 271613
