Scientific Computing on GPUs

The Graphics Processing Unit (GPU) is used more and more for Scientific Computing. For a relatively low cost one can obtain supercomputer performance (on the order of 1 Teraflop). However, some work has to be done to make an ordinary program suitable for the GPU. One of the important tools is CUDA (Compute Unified Device Architecture), an extension of the C programming language that makes it relatively easy to program the GPU. Furthermore, in many cases algorithms have to be adapted to make them suitable for GPU computing. Finally, optimization of the algorithm and its implementation, and careful use of the GPU memory hierarchy, are needed to obtain really high speedups.

Please find below some items which are important for the Numerical Analysis Group of the TU Delft.

GPU Teaching

Delft University of Technology is a recognized NVIDIA GPU Education Center.

We regularly teach the course "Introduction/advanced course Programming on the GPU with CUDA". The teachers are C. Vuik, Ir. C.W.J. Lemmens, and Dr. M. Möller.

The next edition of the course takes place on April 8-9, 2024 (registration). Please consult the flyer for more details.

GPU Research

The core of our research is the invention and efficient implementation of algorithms to solve systems arising from discretized partial differential equations. Below we list some of the work that has been done and some new Bachelor, Master, and PhD thesis projects. I am a member of the NIRICT Reconnaissance Topic: Performance and Correctness of GPGPU Applications team.
Bachelor Projects
Master Projects
PhD Projects

GPU Software

This zip file contains software to solve a linear system Ax = b with the Deflated Preconditioned Conjugate Gradient method on the GPU, under certain assumptions. Please save the file and unzip it. The final directory contains two readme files, which should be read in the following order:


Deflation is also used in the PARALUTION library, which provides various sparse iterative solvers and preconditioners on multi/many-core CPU and GPU devices. The open-source variant of this code is available on GitHub.

GPU Hardware

The Little GREEN Machine II

A description of the machine is given here.

Press bulletins

The Little GREEN Machine I

Press bulletins

64-bit Linux clusters

Besides this, DIAM also has two 64-bit Linux clusters, one with 8 nodes and the other with 16. Both have state-of-the-art dual- or quad-core Intel processors with about 16 GByte of internal memory per node. These systems are mainly used for heavy computations that cannot be done on an ordinary desktop. These applications run either standalone or in an MPI-based cluster environment.

GPU processors

The most recent (Nov 2010) GPU processor is the "Nvidia Tesla C2070", also known as the Fermi, with 6 GByte of internal memory.

Recently, all cluster nodes were equipped with GPU processors by Nvidia, which are known to give a performance boost of a factor 20-100 for several mathematical operations that do not involve recurrences. We also acquired one of the fastest architectures available at the time (Feb 2010): the "Nvidia Tesla C1060", which will be used in the near future for advanced mathematical computations.

Student lab facility

DIAM also has a student lab facility with 16 simple Linux desktops (now also equipped with a simple GPU). This lab room is used for instruction connected with our math courses, but also for courses where we teach our students and new researchers how to use, e.g., MPI on the clusters and the GPUs.

Contact information: Kees Vuik

Back to the home page of Kees Vuik