
Re: [Bug-gnubg] Re: Getting gnubg to use all available cores


From: Michael Petch
Subject: Re: [Bug-gnubg] Re: Getting gnubg to use all available cores
Date: Thu, 06 Aug 2009 22:55:09 -0600
User-agent: Microsoft-Entourage/12.20.0.090605

Howdy Louis,

I think that MAX_NUMTHREADS was an artificial limit set by the hardware of
the day. Christian can likely tell you why it is 16 specifically, but I am
assuming that it was a somewhat arbitrary (and reasonable) value based on
the number of cores available on most systems.
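
To illustrate how a compile-time cap like this typically behaves, here is a
minimal sketch. It is not gnubg's actual code: clamp_thread_count() is a
hypothetical helper made up for this example, and the default of 16 is
simply the value discussed in this thread.

```c
#include <assert.h>

/* Compile-time cap on worker threads. 16 is the value discussed in this
 * thread; building with -DMAX_NUMTHREADS=64 would raise it. */
#ifndef MAX_NUMTHREADS
#define MAX_NUMTHREADS 16
#endif

/* Hypothetical helper (not part of gnubg): clamp a user-requested thread
 * count into the range [1, MAX_NUMTHREADS]. */
unsigned int clamp_thread_count(unsigned int requested)
{
    assert(MAX_NUMTHREADS >= 1);   /* a cap below 1 would make no sense */

    if (requested < 1)
        return 1;
    if (requested > MAX_NUMTHREADS)
        return MAX_NUMTHREADS;
    return requested;
}
```

With the default cap, a request for 64 threads would be reduced to 16.
Raising the constant only changes how many threads may be created; it says
nothing about how the OS schedules them across cores.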

On to your OS/X issue. I did a bit of research, and my original suggestion
of waiting for Snow Leopard may actually be all that is required.

Nehalem processors diverge from the previous generation of Intel processors
because they are no longer based on SMP (Symmetric MultiProcessor) designs.
In an SMP system, generally all processors have access to main memory (RAM)
via a single data bus. The problem, of course, is that the more cores you
have, the more contention there is for the memory reads and writes that
have to occur on that one bus.

Intel decided that SMP designs likely will not scale properly in the future
when dealing with large core counts (32, 64, 128 cores etc.), so they moved
their Nehalem design to NUMA-type systems instead of SMP. NUMA stands for
Non-Uniform Memory Access. In this type of design, cores may not necessarily
be able to share memory with other processors without some help. I'm not
going to get into the gory details, but the bus system Intel is pushing is
the QPI (QuickPath Interconnect) bus. This literally replaces the good old
FSB (Front Side Bus).

NUMA architectures introduce the concept of "Remote" and "Local" data.
Shared data may not be directly available to a processor; it can still be
retrieved (remotely), but access will be slower. Operating system kernels
need NUMA support in order for shared data access across different buses to
work properly.

So you're asking, why tell me all this? Well, the answer is simple. Apple,
in its infinite wisdom, started using new QPI/NUMA hardware without actually
fully implementing NUMA in its current kernel! This hasn't been well
documented by Apple, but it was discovered when companies started running
Xserve on the new QPI/Nehalem systems.

Without proper NUMA support, processors can't arbitrarily share memory with
all other processors, which seems to be the case here with Gnubg. Gnubg
launches in a single process and then asks OS/X to create threads (with
shared memory requirements). It appears that by default each processor is
treated as a separate entity without sharing (on OS/X Leopard). The
exception is that each core appears as 2 virtual cores. Virtual cores are on
the same processor, and thus on the same bus, so memory can be shared
across them.
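
For anyone unfamiliar with the "single process, many threads, shared
memory" model described above, here is a minimal POSIX threads sketch. It
is an illustration only and not taken from the gnubg source; the names
run_shared_workers(), shared_state and SKETCH_MAX_THREADS are all
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_THREADS 64   /* hypothetical cap for this sketch only */

/* State that every thread in the process can see and modify. */
struct shared_state {
    pthread_mutex_t lock;
    long total;
};

struct worker_arg {
    struct shared_state *shared;
    long iterations;
};

/* Each worker increments the same counter, which only works because all
 * threads of a process share one address space. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;

    assert(arg != NULL && arg->shared != NULL);
    for (long i = 0; i < arg->iterations; i++) {
        pthread_mutex_lock(&arg->shared->lock);
        arg->shared->total++;
        pthread_mutex_unlock(&arg->shared->lock);
    }
    return NULL;
}

/* Start nthreads workers inside the calling process, wait for them, and
 * return the final shared total, or -1 on bad input or a failed create. */
long run_shared_workers(int nthreads, long iterations)
{
    pthread_t tids[SKETCH_MAX_THREADS];
    struct shared_state shared;
    struct worker_arg arg;
    int started = 0;

    if (nthreads < 1 || nthreads > SKETCH_MAX_THREADS || iterations < 0)
        return -1;

    pthread_mutex_init(&shared.lock, NULL);
    shared.total = 0;
    arg.shared = &shared;
    arg.iterations = iterations;

    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, &arg) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&shared.lock);
    return started == nthreads ? shared.total : -1;
}
```

Nothing in this code chooses which core a thread runs on. Placement is left
entirely to the operating system scheduler, which is why the behaviour
discussed in this thread is an OS question rather than a gnubg one.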

It seems that when Gnubg launches, all the threads are created on one
processor (originally chosen by OS/X) and are accessible by its 2 virtual
cores (using Hyper-Threading). It seems Apple did this so they could put out
new equipment before the next OS (Snow Leopard) was released.

So what does Snow Leopard have that Leopard doesn't? NUMA support.

My guess is that if you got your hands on Snow Leopard, you may find that
what you are seeing changes. Apparently this very problem exists for people
using CS4 (Adobe's Creative Suite 4).

Linux supports NUMA. If you are feeling adventurous, you might try
installing Linux on your Apple hardware and see what happens.

Your chess program may work because of the way it splits up tasks (it may
even use a combination of POSIX threads and separate process spaces). I
haven't seen the source code, so it's very hard to say.
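
Purely as an illustration of what "a combination of POSIX threads and
separate process spaces" could look like, here is a small sketch. It is a
guess at the general pattern, not a description of the chess program, and
every name in it (run_hybrid(), child_work(), search_slice(),
THREADS_PER_PROCESS) is hypothetical.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define THREADS_PER_PROCESS 2   /* hypothetical value for this sketch */

/* Stand-in for a unit of work: sum 1..1000 and store the result. */
static void *search_slice(void *p)
{
    long *result = p;
    long sum = 0;

    for (long i = 1; i <= 1000; i++)
        sum += i;
    *result = sum;
    return NULL;
}

/* Runs in a child process: start a few threads that share only this
 * child's memory. Returns 0 if every thread produced the expected sum. */
static int child_work(void)
{
    pthread_t tids[THREADS_PER_PROCESS];
    long results[THREADS_PER_PROCESS] = {0};
    int started = 0;
    int ok = 1;

    for (; started < THREADS_PER_PROCESS; started++)
        if (pthread_create(&tids[started], NULL, search_slice,
                           &results[started]) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    if (started != THREADS_PER_PROCESS)
        ok = 0;
    for (int i = 0; i < started; i++)
        if (results[i] != 500500)
            ok = 0;
    return ok ? 0 : 1;
}

/* Fork nprocs children, each with its own address space and its own
 * threads. Returns how many children exited successfully. */
int run_hybrid(int nprocs)
{
    int forked = 0;
    int succeeded = 0;

    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(child_work());   /* child never returns to the caller */
        if (pid > 0)
            forked++;
    }
    for (int i = 0; i < forked; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0)
            succeeded++;
    }
    return succeeded;
}
```

In a design like this, each process has its own memory, so nothing needs to
be shared between processes. That would be consistent with your observation
that two separate copies of gnubg each reached 200% CPU.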

Michael Petch

On 06/08/09 10:29 AM, "Louis Zulli" <address@hidden> wrote:

> Hi,
> 
> I put
> 
> #define MAX_NUMTHREADS 64
> 
> in multithread.h and rebuilt.
> 
> In Settings-->Options-->Other, I put Eval Threads to 64.
> 
> I then let gnubg analyze a game using 4-ply analysis.
> 
> According to my unix top command, gnubg had 69 threads and was using
> 188%CPU. So apparently all the threads were running (into each other!)
> in one physical core.
> 
> In any case, increasing the max number of threads above 16 seems
> trivial to do, unless I'm missing something.
> 
> Louis
> 
> 
> On Aug 6, 2009, at 11:34 AM, Ingo Macherius wrote:
> 
>> Do you use the calibrate command or a batch analysis of matchfiles? The
>> former was shown to be of no value for benchmarks, see here:
>> http://lists.gnu.org/archive/html/bug-gnubg/2009-08/msg00006.html
>> 
>> With calibrate I had the very same effect of high idle times during
>> benchmarks, unless I used at least 8 threads per physical core.
>> 
>> I am doing a benchmark on a 4 core machine which iterates over #thread
>> (1..6) and cache size (2^1 .. 2^27). Should be posted in say 3 hours, it
>> literally is still running :)
>> 
>> Ingo
>> 
>>> -----Original Message-----
>>> From: address@hidden
>>> [mailto:address@hidden On
>>> Behalf Of Louis Zulli
>>> Sent: Thursday, August 06, 2009 3:21 PM
>>> To: Michael Petch
>>> Cc: address@hidden
>>> Subject: [Bug-gnubg] Re: Getting gnubg to use all available cores
>>> 
>>> 
>>> 
>>> On Aug 5, 2009, at 4:02 PM, Michael Petch wrote:
>>> 
>>>> I'm unsure how the architecture is deployed and how OS/X handles the
>>>> physical cores, but it almost sounds like one physical core is being
>>>> used (using Hyperthreads to run 2 threads simultaneously). I wonder if
>>>> the memory is shared across all the cores? A friend of mine was
>>>> suggesting that people may have to wait for Snow Leopard to come out
>>>> before OS/X properly utilizes the Nehalem architecture (whether that
>>>> is true or not, I don't know).
>>>> 
>>>> Anyway, as an experiment: if you run 2 copies of Gnubg at the same
>>>> time (using multiple threads), do you get 400% CPU usage?
>>>> 
>>> 
>>> 
>>> Hi Mike,
>>> 
>>> Sorry for the delay. I just had two copies of gnubg analyze the same
>>> game, using 3 ply analysis. Each instance of gnubg used 200% CPU. Each
>>> copy was set to use 4 evaluation threads.
>>> 
>>> So what's the verdict here? Is Leopard simply not directing threads
>>> correctly?
>>> 
>>> Louis
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Bug-gnubg mailing list
>>> address@hidden http://lists.gnu.org/mailman/listinfo/bug-gnubg
>> 
> 





