NVIDIA CUDA on Ubuntu Karmic Koala and GCC 4.4

NVIDIA’s currently released version of CUDA does not work directly with Ubuntu 9.10, which uses GCC 4.4 as the default compiler. However, it is quite easy to make it work, and the following guide goes through the entire installation process in some detail.

I use the 64-bit version of CUDA; go here to get it:
http://www.nvidia.com/object/cuda_get.html
(I selected Linux 64-bit and Ubuntu 9.04.)

This lets you download the CUDA Drivers, the CUDA Toolkit and the CUDA SDK, and you will end up with three files:

cudadriver_2.3_linux_64_190.18.run
cudatoolkit_2.3_linux_64_ubuntu9.04.run
cudasdk_2.3_linux.run

Installing the NVIDIA driver:

CUDA needs at least version 190 of the Linux driver.

1. Uninstall existing NVIDIA drivers and nvidia-glx (use the GUI or aptitude).

2. Quit X: log out of your desktop, switch to a virtual terminal (CTRL+ALT+F1), log in, and run

$ sudo service gdm stop

3. Run

$ sudo sh cudadriver_2.3_linux_64_190.18.run

to install the driver and follow the prompts.
Say yes to installing the 32-bit compatibility libraries.
If you want, you can let the installer overwrite the Xorg configuration file.

4. Reboot and log back in.

5. Run

$ nvidia-settings

to verify that your driver version is at least 190.18. Look for the driver version in the window:

The 190.xx NVIDIA Driver for use with CUDA.
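
Alternatively, you can check the driver version from the command line; this file exists once the NVIDIA kernel module is loaded:

$ cat /proc/driver/nvidia/version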

Installing the CUDA Toolkit:

With the driver installed, we now need to install the CUDA Toolkit itself.

1. Run:

$ sudo sh cudatoolkit_2.3_linux_64_ubuntu9.04.run

2. Press Enter to accept the default install location, /usr/local/cuda.

3. Register the new library files with the dynamic linker. Run:

$ sudo gedit /etc/ld.so.conf.d/cuda.conf

and add

/usr/local/cuda/lib64

to the otherwise empty file and save it. Then run:

$ sudo ldconfig

You can also add

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

to the end of your ~/.bashrc file.

(You should then log out and in again, or start a new bash with

bash --

to update your environment.)
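
To check that everything is picked up, open a new shell and run:

$ nvcc --version
$ ldconfig -p | grep cuda

nvcc should report release 2.3, and ldconfig should list the libraries in /usr/local/cuda/lib64.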

Installing the CUDA SDK and Compiling the Example Programs

We will now install the CUDA SDK to your own home directory so you can play with the supplied demos:

1. Run:

$ sh cudasdk_2.3_linux.run

I chose to install it at the default location.

2. As CUDA does not yet work with GCC 4.4, you will have to install gcc-4.3:

$ sudo aptitude install gcc-4.3 g++-4.3

3. Go to the SDK source directory:

$ cd ~/NVIDIA_GPU_Computing_SDK/C

4. Create a directory with symlinks to gcc-4.3/g++-4.3:

$ mkdir mygcc
$ cd mygcc
$ ln -s $(which g++-4.3) g++
$ ln -s $(which gcc-4.3) gcc
$ cd ..
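
You can quickly verify that the symlinks resolve to the right compilers:

$ mygcc/gcc --version
$ mygcc/g++ --version

Both should report version 4.3.x.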

5. Edit the common makefile:

$ gedit common/common.mk

Find the lines that specify CC, CXX and LINK and change them to:

CXX        := g++-4.3
CC         := gcc-4.3
LINK       := g++-4.3 -fPIC

Add

#use gcc-4.3
NVCCFLAGS+=--compiler-bindir=${HOME}/NVIDIA_GPU_Computing_SDK/C/mygcc

before the line that says “ifeq ($(nvcc_warn_verbose),1)”. This tells the nvcc compiler to look in mygcc for gcc, which makes it pick up our gcc-4.3 compiler.

6. You should now be able to compile all the examples in the SDK without errors by running

$ make

Verify Installation

We can now verify that everything is working:

1. Run:

$ bin/linux/release/deviceQuery

On my machine I get the following output (depending on your hardware, your output may differ):

CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "GeForce 8800 Ultra"
CUDA Driver Version:                           2.30
CUDA Runtime Version:                          2.30
CUDA Capability Major revision number:         1
CUDA Capability Minor revision number:         0
Total amount of global memory:                 804585472 bytes
Number of multiprocessors:                     16
Number of cores:                               128
Total amount of constant memory:               65536 bytes
Total amount of shared memory per block:       16384 bytes
Total number of registers available per block: 8192
Warp size:                                     32
Maximum number of threads per block:           512
Maximum sizes of each dimension of a block:    512 x 512 x 64
Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
Maximum memory pitch:                          262144 bytes
Texture alignment:                             256 bytes
Clock rate:                                    1.51 GHz
Concurrent copy and execution:                 No
Run time limit on kernels:                     Yes
Integrated:                                    No
Support host page-locked memory mapping:       No
Compute mode:                                  Default (multiple host threads can use this device simultaneously)
Test PASSED
Press ENTER to exit...
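
As an extra sanity check you can also run the bandwidthTest example from the SDK, which should likewise end with “Test PASSED”:

$ bin/linux/release/bandwidthTest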

You should now be good to go. Here’s a screenshot of the volumeRender demo:

$ bin/linux/release/volumeRender

The volumeRender application from the CUDA SDK.

Remember to use gcc-4.3/g++-4.3 and the --compiler-bindir option to nvcc when compiling your own CUDA source.
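
For example, here is a minimal CUDA program (the file name add.cu, the kernel name and the array size are just for illustration) together with the matching nvcc invocation:

// add.cu: add 1.0f to each element of an array on the GPU
#include <stdio.h>

__global__ void addOne(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // global thread index
    if (i < n)
        a[i] += 1.0f;
}

int main(void)
{
    const int n = 256;
    float h[n];
    float *d;

    for (int i = 0; i < n; ++i)
        h[i] = (float)i;

    // copy input to the device, run the kernel, copy the result back
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    addOne<<<(n + 127) / 128, 128>>>(d, n);
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    printf("h[0] = %f, h[%d] = %f\n", h[0], n - 1, h[n - 1]);
    return 0;
}

Compile and run it with:

$ nvcc --compiler-bindir=$HOME/NVIDIA_GPU_Computing_SDK/C/mygcc -o add add.cu
$ ./add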

ISAAC 2009 Accepted Papers

The list of accepted papers for ISAAC 2009 in Hawaii has now been posted.
I’m happy to see that my paper “Counting in the Presence of Memory Faults” is on the list (number 136). It is co-authored with Gerth Brodal, Allan Jørgensen and Gabriel Moruz. The paper presents counting algorithms designed in the faulty-memory RAM model proposed by Finocchi and Italiano.
A resilient counting algorithm, as defined in our paper, is a data structure with an increment operation and a query operation. The query operation returns the number of increment operations that have been performed on the data structure, within an additive error.

It is simple to design such a data structure in the standard RAM using a single integer variable, but this is not possible in the faulty-memory RAM, where arbitrary memory cells can get corrupted at any point in time. In the paper we present upper and lower bounds, as well as tradeoffs between the additive error and the time used per operation.
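
To make the model concrete, here is a small sketch (my own illustration, not the algorithm from the paper) of the classic replication trick: keep 2δ+1 copies of the counter, so that as long as at most δ cells are corrupted, the median of the copies is still the true count. The constant DELTA is an assumed corruption bound.

/* Sketch only: a counter tolerating up to DELTA corrupted cells. */
#include <stdio.h>
#include <stdlib.h>

#define DELTA  2                    /* assumed bound on corruptions */
#define COPIES (2 * DELTA + 1)

static long counter[COPIES];        /* replicated counter cells */

void increment(void)
{
    int i;
    for (i = 0; i < COPIES; ++i)    /* update every replica */
        counter[i]++;
}

static int cmp(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

long query(void)
{
    long tmp[COPIES];
    int i;
    for (i = 0; i < COPIES; ++i)
        tmp[i] = counter[i];
    qsort(tmp, COPIES, sizeof(long), cmp);
    return tmp[DELTA];              /* median survives <= DELTA corruptions */
}

int main(void)
{
    int i;
    for (i = 0; i < 10; ++i)
        increment();
    counter[0] = 999;               /* simulate a memory corruption */
    printf("count = %ld\n", query()); /* still prints 10 */
    return 0;
}

This replication spends Θ(δ) time per increment to get zero error; the tradeoffs in the paper concern how much additive error you must accept if you want faster operations.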

Computing Nationwide Flood Maps

A few months ago I made the following video and uploaded it to YouTube (an HD version is available on YouTube):

From the youtube description:

This video shows what parts of Denmark will flood if the ocean rises. The flood maps were computed using the TerraSTREAM software package and the finished flood maps were visualized by importing the resulting rasters into the GRASS open source GIS.

The algorithm used by TerraSTREAM takes dikes and natural features into account and works directly on the original input terrain. The terrain has been down-sampled afterwards for visualization purposes; the few seemingly disconnected flat areas in the video are artifacts of this down-sampling.

The input grid DEM used in this computation was around 100 gigabytes. The video was used in a recent article (in Danish) about TerraSTREAM’s ability to compute flood maps like the one shown in the video, even on very big terrains. This type of flood map can be used in an initial screening phase for computing flood risks.
In my opinion TerraSTREAM is a good example of I/O-efficient (external-memory) algorithms put to use in efficiently solving real-world problems on massive datasets.


Update: A similar, interactive example can be seen at our global flood map site.

Live from the Nordic Collegiate Programming Competition 2008

I am currently managing the Aarhus University (and only Danish!) site for the Nordic Collegiate Programming Competition (NCPC 2008).

We have 5 teams here, which is a 250% increase from last year (which was itself an increase from the year before that). We are currently two hours into the competition, and all the teams here are doing well, having each solved at least two problems. Right now, “MADALGO Men” is in the lead in the student class, and only one team in the open class has solved more problems! In total, about 140 teams are competing.

Take a look at the Live Scoreboard.

Participate in the ACM Collegiate Programming Contest

This is probably most interesting for students at Aarhus University. I’m coaching the teams from Aarhus that’ll go to the North-West European Regional Finals (NWERC 2008) in Utrecht. If you are interested in trying out for the teams you can read more here.

We have some talented people participating this year; I think they’ll do well at NWERC :) This will be my fifth NWERC event. I participated in 2003, 2004 and 2005, and coached in 2006. I was at Duke during NWERC 2007, so I couldn’t go that year.

Remember: programming is fun!