Tuesday, March 15, 2011

Getting CUDA up and running on Red Hat Enterprise Linux (CentOS)

Let's get right into it! Start by grabbing the latest CUDA toolkit (3.2.16 at the time of writing) from:


http://developer.nvidia.com/object/cuda_3_2_downloads.html#Linux

under "CUDA Toolkit for RedHat Enterprise Linux 5.5" (make sure to pick the architecture that matches your machine!)
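If you're unsure which architecture to pick, `uname -m` will tell you. The sketch below maps that output to the 64/32 tag in the installer filename. The helper name `cuda_arch_suffix` is made up for illustration, and the 32-bit tag is an assumption based on the 64-bit filename used in this post.

```shell
# Hypothetical helper: map a machine name (as printed by `uname -m`)
# to the architecture tag used in the installer filename.
cuda_arch_suffix() {
    case "$1" in
        x86_64)              echo 64 ;;
        i386|i486|i586|i686) echo 32 ;;
        *)                   echo unknown ;;
    esac
}

# Usage: look up the tag for the machine you are on.
cuda_arch_suffix "$(uname -m)"
```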

Also grab the latest SDK, labeled "GPU Computing SDK code samples".

Now, open a shell and cd into your Downloads directory, and run the CUDA toolkit installation like so:

> chmod u+x cudatoolkit_3.2.16_linux_64_rhel5.5.run
> ./cudatoolkit_3.2.16_linux_64_rhel5.5.run

Make sure to specify an installation directory that makes sense: either the default or, if you're doing a test install, something like "~/local".

Same goes for the SDK:

> chmod u+x gpucomputingsdk_3.2.16_linux.run
> ./gpucomputingsdk_3.2.16_linux.run

Again, specify where to put the SDK. I thought ~/local/cuda/sdk was nice, but nesting the SDK inside the toolkit directory has a drawback: you may later want to update your CUDA toolkit while keeping your SDK the same. Also, when prompted, make sure to point the SDK installer to the directory you just specified for the CUDA toolkit installation (e.g. /usr/local/cuda or ~/local/cuda).
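Before building anything, the toolkit's bin and library directories typically need to be on your paths so that `nvcc` and the CUDA runtime libraries can be found. A minimal sketch, assuming the toolkit went into ~/local/cuda (`CUDA_HOME` is just a variable name chosen here; point it at your own install directory):

```shell
# Assumed install location; change this to wherever you put the toolkit.
CUDA_HOME="$HOME/local/cuda"

# Put nvcc and the other toolkit binaries on the command search path.
export PATH="$CUDA_HOME/bin:$PATH"

# Let the dynamic linker find the CUDA runtime libraries
# (lib64 on 64-bit installs, lib on 32-bit ones).
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$CUDA_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Adding these lines to your ~/.bashrc makes them stick across logins.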

Now, the first thing you want to build is "deviceQuery", which will tell you whether your device driver matches your toolkit. It's really important that these two match; otherwise most of the simple "hello world" examples will quietly run only the host code and skip everything meant for the device! So, starting from your SDK root installation directory (e.g. ~/local/cuda/sdk):

> cd shared
> make

This builds a shared library needed by most of the SDK tools. Next,

> cd ../C/src/deviceQuery
> make

Now deviceQuery should be ready to use.
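If you expect to rebuild often, the two build steps above can be wrapped in a small function. This is only a sketch: `build_device_query` is a made-up name, and it assumes the SDK directory layout described above.

```shell
# Hypothetical helper: build the SDK shared library, then deviceQuery.
# $1 is the SDK root, e.g. ~/local/cuda/sdk
build_device_query() {
    sdk_root="$1"
    # Work in a subshell so the caller's working directory is left alone,
    # and stop at the first step that fails.
    (
        cd "$sdk_root" || exit 1
        (cd shared && make) || exit 1
        (cd C/src/deviceQuery && make)
    )
}

# Usage:
#   build_device_query ~/local/cuda/sdk
```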

> cd ../../bin/linux/release
> ./deviceQuery

If your driver is up to date this should output something like:

CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: "Quadro FX 3800"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.20
  CUDA Capability Major/Minor version number:    1.3
  Total amount of global memory:                 1073020928 bytes
  Multiprocessors x Cores/MP = Cores:            24 (MP) x 8 (Cores/MP) = 192 (Cores)
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.20 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     Yes
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   No
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 1, Device = Quadro FX 3800


PASSED

Press <ENTER> to Quit...
-----------------------------------------------------------
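A quick way to check the result without reading the whole listing is to compare the two version numbers in the summary line. The sketch below does that with sed and awk. The function name is made up, and it assumes the summary line has the format shown above. It also accepts a driver that is newer than the runtime, on the assumption that a newer driver still runs code built with an older toolkit.

```shell
# Hypothetical helper: succeed if the driver version reported in a
# deviceQuery summary line is at least the runtime version.
driver_covers_runtime() {
    line="$1"
    driver=$(printf '%s\n' "$line" |
        sed -n 's/.*CUDA Driver Version = \([0-9][0-9.]*\).*/\1/p')
    runtime=$(printf '%s\n' "$line" |
        sed -n 's/.*CUDA Runtime Version = \([0-9][0-9.]*\).*/\1/p')
    # Fail if either number could not be found.
    [ -n "$driver" ] && [ -n "$runtime" ] || return 1
    # Versions look like 3.20, so a numeric comparison is enough here.
    awk -v d="$driver" -v r="$runtime" 'BEGIN { exit !(d + 0 >= r + 0) }'
}

# Usage, from the directory containing deviceQuery
# (stdin is redirected so the "Press <ENTER>" prompt does not block):
#   driver_covers_runtime "$(./deviceQuery < /dev/null | grep 'CUDA Driver Version =')" && echo "driver OK"
```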


If your output looks different, you may have to install an updated device driver. These can be found at:

http://www.nvidia.com/Download/index.aspx

NB! You need root permissions to install updated drivers. If you're not comfortable dealing with things that may break your X graphics settings, you may want to consult a systems expert.

That being said, the installer asks very few questions and generally seems well-behaved. You will, however, need to shut down X first, which will obviously close this browser window along with every other piece of X interface!


Closing X can be done in a myriad of ways, one of which is (as root):

> init 3

Once you're in the shell, cd to your Downloads directory and (as usual):

> chmod u+x NVIDIA-Linux-x86_64-260.19.44.run
> sudo ./NVIDIA-Linux-x86_64-260.19.44.run

Once the driver upgrade has finished, start X again:

> startx

and you should now be able to build and run CUDA SDK samples as well as tutorials such as these:

http://llpanorama.wordpress.com/2008/05/21/my-first-cuda-program/
http://llpanorama.wordpress.com/2008/06/11/threads-and-blocks-and-grids-oh-my/

(note that the latter requires you to build 'cutil' which can be found in sdk/C/common)
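If something still refuses to run on the device, it's worth confirming that the new driver is the one actually loaded. On Linux the NVIDIA kernel module reports its version in /proc/driver/nvidia/version, and the sketch below pulls the version number out of that text. The function name is made up, and the sed pattern assumes the number follows the words "Kernel Module", which is an assumption about that file's format.

```shell
# Hypothetical helper: read /proc/driver/nvidia/version-style text on
# stdin and print the driver version found after "Kernel Module".
nvidia_driver_version() {
    sed -n 's/.*Kernel Module *\([0-9][0-9.]*\).*/\1/p'
}

# Usage:
#   nvidia_driver_version < /proc/driver/nvidia/version
```

After the upgrade described above, the printed number should match the version in the installer's filename.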

Good luck!