Title Acceleration of a non-linear water wave model using a GPU
Author Madsen, Morten Gorm
Supervisor Engsig-Karup, Allan Peter (Scientific Computing, Department of Informatics and Mathematical Modeling, Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark)
Dammann, Bernd (Scientific Computing, Department of Informatics and Mathematical Modeling, Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark)
Frisvad, Jeppe Revall (Image Analysis and Computer Graphics, Department of Informatics and Mathematical Modeling, Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark)
Institution Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark
Thesis level Master's thesis
Year 2010
Abstract The primary objective of this work is to use a GPU (massively parallel hardware) to accelerate an existing optimized sequential algorithm that solves a potential flow problem. The potential flow problem poses an initial value problem on a 2D surface, coupled with a 3D Laplace problem. A low-storage Defect Correction method with a multigrid preconditioner is used to solve a flexible-order approximation of the Laplace problem. The widely used explicit RK4 method is applied for time integration. The primary reason for porting this particular solver is that both Defect Correction and the preconditioner are expected to be well suited for GPUs, provided that the right discretization is used. The work focuses on both the analysis and the implementation of the multigrid method, and on understanding how it should be configured in order to be an efficient preconditioner for the Defect Correction algorithm. Little attention is given to the standard four-stage Runge-Kutta method. The most significant result of the work is that rethinking the memory layout both significantly increases the attainable problem size and reduces the solution time, even for a naive CUDA implementation. In particular, the program developed can hold a Laplace problem of up to 100,000,000 degrees of freedom in 4 GB of RAM. For problems of this size, the iterative solution to the Laplace problem gains one decimal digit of accuracy within a matter of seconds. This is up to 10 times faster than the existing CPU implementation. Although the target platform is the Compute Capability 1.3 Tesla architecture, it is also shown that moving the program to a Fermi architecture GPU accelerates the code even further, with a resulting speedup of up to 42 times over the existing CPU code. Remarkably, the speedup on the Fermi architecture is achieved with the naive implementation of the program.
Imprint Technical University of Denmark (DTU) : Kgs. Lyngby, Denmark
Series IMM-M.Sc.-2010-94
Fulltext
Original PDF ep10_94_net.pdf (2.95 MB)
Admin Creation date: 2010-12-20    Update date: 2010-12-20    Source: dtu    ID: 271613
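
Note The Defect Correction iteration named in the abstract can be illustrated with a small sketch. The code below is not taken from the thesis: it is a 1D toy problem (-u'' = f with zero boundary values) in plain Python, where a fourth-order finite-difference operator A is the one to be solved and a second-order operator M acts as the preconditioner. An exact tridiagonal solve stands in for the multigrid preconditioner purely to keep the example short, and all function names (apply_high_order, solve_low_order, defect_correction) are hypothetical.

```python
import math


def apply_high_order(u, h):
    """Apply A: 4th-order stencil for -u'' in the interior, 2nd-order next to the boundary."""
    n = len(u)

    def at(i):
        # Zero Dirichlet data: anything outside the interior points is 0.
        return u[i] if 0 <= i < n else 0.0

    out = []
    for i in range(n):
        if i == 0 or i == n - 1:
            # The 5-point stencil would reach outside the domain here.
            out.append((-at(i - 1) + 2.0 * at(i) - at(i + 1)) / h**2)
        else:
            out.append((at(i - 2) - 16.0 * at(i - 1) + 30.0 * at(i)
                        - 16.0 * at(i + 1) + at(i + 2)) / (12.0 * h**2))
    return out


def solve_low_order(r, h):
    """Solve M e = r exactly, with M = tridiag(-1, 2, -1) / h^2 (Thomas algorithm).

    Assumption: this exact solve is a stand-in for a multigrid preconditioner.
    """
    n = len(r)
    a, b, c = -1.0 / h**2, 2.0 / h**2, -1.0 / h**2
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c / b, r[0] / b
    for i in range(1, n):
        denom = b - a * cp[i - 1]
        cp[i] = c / denom
        dp[i] = (r[i] - a * dp[i - 1]) / denom
    e = [0.0] * n
    e[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        e[i] = dp[i] - cp[i] * e[i + 1]
    return e


def defect_correction(f, h, iterations):
    """Iterate x <- x + M^{-1} (f - A x); return x and the max-norm residual history."""
    x = [0.0] * len(f)
    history = []
    for _ in range(iterations):
        r = [fi - ai for fi, ai in zip(f, apply_high_order(x, h))]
        history.append(max(abs(v) for v in r))
        e = solve_low_order(r, h)
        x = [xi + ei for xi, ei in zip(x, e)]
    r = [fi - ai for fi, ai in zip(f, apply_high_order(x, h))]
    history.append(max(abs(v) for v in r))
    return x, history


if __name__ == "__main__":
    n = 63
    h = 1.0 / (n + 1)
    grid = [(i + 1) * h for i in range(n)]
    f = [math.pi**2 * math.sin(math.pi * xi) for xi in grid]  # exact u = sin(pi x)
    u, history = defect_correction(f, h, iterations=20)
    for k, res in enumerate(history):
        print(k, res)
```

Only the high-order operator is applied and only the low-order operator is inverted, so the high-order matrix never has to be stored or factorized; that is one way to read the "low storage" property mentioned in the abstract. In this toy setting the residual shrinks by at least a factor of three per iteration, a property of this particular pair of stencils rather than a result reported for the thesis solver.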