
[Commit-gnuradio] r9939 - gnuradio/branches/developers/nldudok1/gpgpu-wip


From: nldudok1
Subject: [Commit-gnuradio] r9939 - gnuradio/branches/developers/nldudok1/gpgpu-wip
Date: Wed, 5 Nov 2008 16:49:32 -0700 (MST)

Author: nldudok1
Date: 2008-11-05 16:49:32 -0700 (Wed, 05 Nov 2008)
New Revision: 9939

Added:
   gnuradio/branches/developers/nldudok1/gpgpu-wip/README.cuda
Log:
added README.cuda

Added: gnuradio/branches/developers/nldudok1/gpgpu-wip/README.cuda
===================================================================
--- gnuradio/branches/developers/nldudok1/gpgpu-wip/README.cuda	(rev 0)
+++ gnuradio/branches/developers/nldudok1/gpgpu-wip/README.cuda	2008-11-05 23:49:32 UTC (rev 9939)
@@ -0,0 +1,85 @@
+GNURadio plus CUDA
+
+Assuming all the default (install) locations of things...
+
+At least on my system, I had to build the nvidia kernel module from scratch, which required the Linux kernel source to be present and semi-built. Something like this did the trick:
+
+sudo apt-get install linux-source
+cd /usr/src
+sudo tar xvjf linux-source-2.6.24.tar.bz2
+sudo ln -s linux-source-2.6.24 linux
+cd linux
+make mrproper && make defconfig && make
+
+Install CUDA, its driver, and the SDK:
+
+sudo bash NVIDIA-Linux-x86-177.73-pkg1.run
+sudo bash NVIDIA_CUDA_Toolkit_2.0_rhel5.1_x86.run
+bash NVIDIA_CUDA_SDK_2.02.0807.1535_linux.run
+
+You might want to put these in your .bashrc too...
+
+export PATH=$PATH:/usr/local/cuda/bin
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib
+
+Build the SDK cruft:
+
+sudo ldconfig
+cd ~/NVIDIA_CUDA_SDK/
+make && make dbg=1
+
+Check out the CUDA-enabled GNURadio branch and build it:
+
+svn co http://gnuradio.org/svn/gnuradio/branches/developers/nldudok1/gpgpu-wip
+cd gpgpu-wip
+./bootstrap && ./configure --with-cuda=/usr/local/cuda --with-cudasdk=/path/to/NVIDIA_CUDA_SDK && make && sudo make install
+cd gr-cuda
+./bootstrap && ./configure --with-cuda=/usr/local/cuda --with-cudasdk=/path/to/NVIDIA_CUDA_SDK && make && sudo make install
+
+At the moment CUDA conflicts with gcc-4.2. Use 4.1 or below; 3.4 works fine.
+If you want to use the nvidia debugging and profiling tools for CUDA, you should use the same version of gcc as they do. For the current CUDA-2.0 code this is gcc-3.4.
+If you have multiple gcc versions installed, you can force the use of gcc-3.4 during the configure step like this:
+CC=gcc-3.4 CXX=g++-3.4 CPP=cpp-3.4 F77=g77-3.4 ./configure --with-cuda=/usr/local/cuda --with-cudasdk=/path/to/NVIDIA_CUDA_SDK
+ 
+There are lots and lots of dependencies. Good luck with that.
+If everything works, you should be able to:
+
+cd ../testbed
+time ./gr_benchmark10_test.py -c && time ./gr_benchmark10_test.py
+
+The GPU is slow! Yay overhead.
+
+The overhead could be reduced by doing more work per CUDA call.
+One of the tricks to do this is changing GR_FIXED_BUFFER_SIZE in gnuradio-core/src/lib/runtime/gr_flat_flowgraph.h:
+// 32Kbyte buffer size between blocks
+#define GR_FIXED_BUFFER_SIZE (32*(1L<<10))
+
+You could change this to a larger value, like:
+//1Mbyte buffer size between blocks (large for CUDA performance)
+#define GR_FIXED_BUFFER_SIZE (1024*(1L<<10))
+
+But then, while the CUDA blocks run faster, you get huge latency, and the USRP driver is not called for long periods of time in the single_threaded scheduler.
+This results in choppy sound.
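+(Some rough arithmetic, assuming 8-byte complex float samples: a 1 Mbyte buffer holds 128k samples, so at 1 Msps each buffer represents about 0.13 seconds, and at 250 ksps over half a second during which the USRP is not serviced.)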
+
+Some optimum has to be found.
+A better solution would be to run all cuda code in one thread (with large buffers) and all other GNURadio host blocks (especially sources and sinks) in another thread.
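+This split is not implemented here; below is a minimal pthreads sketch of the idea (run_cuda_blocks and the hand-off queue are hypothetical, just to show the decoupling):
+
+#include <pthread.h>
+#include <queue>
+
+static void run_cuda_blocks(float *buf)    // hypothetical: the CUDA half of the flowgraph
+{
+    (void)buf;                             // stub; the real work would launch kernels here
+}
+
+static std::queue<float *> g_ready;        // large buffers filled by the source/sink thread
+static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
+static pthread_cond_t  g_cond = PTHREAD_COND_INITIALIZER;
+
+static void *cuda_worker(void *)
+{
+    for (;;) {
+        pthread_mutex_lock(&g_lock);
+        while (g_ready.empty())
+            pthread_cond_wait(&g_cond, &g_lock);   // sleep until a buffer arrives
+        float *buf = g_ready.front();
+        g_ready.pop();
+        pthread_mutex_unlock(&g_lock);
+        run_cuda_blocks(buf);              // crunch a large buffer on the GPU
+    }
+    return 0;
+}
+
+int main()
+{
+    pthread_t tid;
+    pthread_create(&tid, 0, cuda_worker, 0);
+    // ... the main thread would run the source/sink half of the flowgraph here,
+    // pushing filled buffers onto g_ready and signalling g_cond, so the USRP
+    // is never starved while the GPU works ...
+    pthread_join(tid, 0);
+    return 0;
+}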
+
+Still, you have to be aware of latency and big memory requirements when using large buffers.
+
+Another reason for a lot of overhead is that CUDA device memory is not memory-mappable as a circular buffer.
+The current workaround is to copy all memory after each operation.
+This is also a real performance killer.
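+Schematically (my illustration, not code from this branch; the actual copies may differ), every operation ends up paying for an extra copy:
+
+#include <cuda_runtime.h>
+
+__global__ void scale(float *out, const float *in, float k, int n)
+{
+    int i = blockIdx.x * blockDim.x + threadIdx.x;
+    if (i < n)
+        out[i] = k * in[i];
+}
+
+int main()
+{
+    const int n = 1 << 20;
+    float *d_in, *d_out;
+    cudaMalloc((void **)&d_in,  n * sizeof(float));
+    cudaMalloc((void **)&d_out, n * sizeof(float));
+
+    scale<<<(n + 255) / 256, 256>>>(d_out, d_in, 2.0f, n);
+    // The extra device-to-device copy after every operation, standing in for
+    // the wrap-around a circular buffer would give us for free on the host:
+    cudaMemcpy(d_in, d_out, n * sizeof(float), cudaMemcpyDeviceToDevice);
+
+    cudaFree(d_in);
+    cudaFree(d_out);
+    return 0;
+}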
+
+A lot of speed improvement could be had by switching to pinned memory.
+When your cuda card has compute capability 1.1 or higher, host-to-device and device-to-host copies can then execute in the background while the host processor is doing other work.
+Some restructuring of the scheduler is also needed for this.
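+A minimal sketch of what that looks like with the CUDA runtime API (not from this branch; assumes a compute capability 1.1+ card):
+
+#include <cuda_runtime.h>
+
+int main()
+{
+    const int n = 1 << 20;
+    float *h_buf, *d_buf;
+    cudaMallocHost((void **)&h_buf, n * sizeof(float));   // pinned (page-locked) host memory
+    cudaMalloc((void **)&d_buf, n * sizeof(float));
+
+    cudaStream_t stream;
+    cudaStreamCreate(&stream);
+
+    // With pinned memory this copy is asynchronous: it returns immediately and
+    // the DMA transfer overlaps with whatever the host does next.
+    cudaMemcpyAsync(d_buf, h_buf, n * sizeof(float), cudaMemcpyHostToDevice, stream);
+    /* ... host work here while the transfer runs ... */
+    cudaStreamSynchronize(stream);   // wait before touching d_buf
+
+    cudaStreamDestroy(stream);
+    cudaFree(d_buf);
+    cudaFreeHost(h_buf);
+    return 0;
+}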
+
+
+Success,
+Martin Dudok van Heel
+
+
+




