[Discuss-gnuradio] Re: CUDA GPU Vs CELL BE
From: Yu Pan
Subject: [Discuss-gnuradio] Re: CUDA GPU Vs CELL BE
Date: Thu, 16 Jul 2009 04:50:10 +0200
> Eric Blossom wrote:
> advantage of it. Again from reading, it appears that you need at
> least 64 elements that you can apply an instruction to, to be in its
> target zone. For certain parts of our graphs, this is probably OK
> (e.g., FEC decode, FIRs, FFTs), but I'm kind of dubious about
> anything with a dependency chain (IIRs, PLLs, equalizers, etc.)
32 threads in a so-called "warp" execute together in a Single
Instruction, Multiple Threads (SIMT) manner on a particular Streaming
Multiprocessor (SM). Control flow among the 32 threads can diverge,
but when that happens, each divergent path is executed serially.
Your observations are correct. At least for now, CUDA's strength is
still largely restricted to compute-intensive data-parallel
processing, which is where the other 99% of nVidia's business lies
(graphics processing, of course). But once GPGPU processing takes
off, things could change.
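To make the serialization concrete, here is a minimal hypothetical kernel
(not from this thread) in which even and odd lanes of a single warp take
different branches; under SIMT the SM runs the two paths one after the
other, with the inactive lanes masked off:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Lanes of one 32-thread warp diverge on (tid % 2), so the two
// branch bodies below are executed serially, not concurrently.
__global__ void diverge(float *out)
{
    int tid = threadIdx.x;
    if (tid % 2 == 0)
        out[tid] = tid * 2.0f;      // even lanes: first pass
    else
        out[tid] = tid + 100.0f;    // odd lanes: second, serialized pass
}

int main()
{
    const int N = 32;               // exactly one warp
    float h_out[N];
    float *d_out;
    cudaMalloc(&d_out, N * sizeof(float));
    diverge<<<1, N>>>(d_out);
    cudaMemcpy(h_out, d_out, N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("lane 0 -> %g, lane 1 -> %g\n", h_out[0], h_out[1]);
    cudaFree(d_out);
    return 0;
}
```

The result is correct either way; divergence costs only throughput,
which is why it matters for the dependency-chain blocks (IIRs, PLLs)
mentioned above.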
> I'm also not sure if you can launch multiple kernels simultaneously
> (CUDA-speak). If you could launch multiple kernels, we'd have a
> better chance of using the parallelism.
Currently no. But it is possible to execute several parallel tasks
within the same kernel by diverging the control flow, while at the
same time grouping each task (each variant of the control flow) into
whole warps of 32 threads (padding if necessary) to avoid intra-warp
divergence. The nvcc compiler will at least take care of register
allocation, so running multiple tasks won't use more registers than
the maximum a single task requires.
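A hedged sketch of that idea (the kernel and task bodies are invented
for illustration): two independent tasks are packed into one kernel,
and the branch is taken on the warp index rather than the thread index,
so every warp is uniform and no intra-warp divergence occurs:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define WARP_SIZE 32

// Two tasks in one kernel, partitioned on whole warps: the first half
// of the warps runs task A, the second half runs task B. Within any
// single warp the branch condition is uniform, so no lanes are masked.
__global__ void two_tasks(float *a, float *b, int n_per_task)
{
    int tid    = blockIdx.x * blockDim.x + threadIdx.x;
    int nwarps = (gridDim.x * blockDim.x) / WARP_SIZE;
    int warp   = tid / WARP_SIZE;

    if (warp < nwarps / 2) {
        // Task A (hypothetical): scale the first buffer
        int i = tid;
        if (i < n_per_task)
            a[i] = a[i] * 2.0f;
    } else {
        // Task B (hypothetical): offset the second buffer
        int i = tid - (nwarps / 2) * WARP_SIZE;
        if (i < n_per_task)
            b[i] = b[i] + 1.0f;
    }
}

int main()
{
    const int N = 32;               // 32 elements per task
    float *d_a, *d_b;
    cudaMalloc(&d_a, N * sizeof(float));
    cudaMalloc(&d_b, N * sizeof(float));
    cudaMemset(d_a, 0, N * sizeof(float));
    cudaMemset(d_b, 0, N * sizeof(float));

    // One block of 64 threads = 2 warps: warp 0 -> task A, warp 1 -> task B.
    two_tasks<<<1, 2 * WARP_SIZE>>>(d_a, d_b, N);
    cudaDeviceSynchronize();

    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```

If a task's element count is not a multiple of 32, the guard
`i < n_per_task` pads it out to warp granularity, which is the
"considering padding" caveat above.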
> Eric
-Yu
--
Posted via http://www.ruby-forum.com/.