Re: [Bug-gnubg] redesign gnubg to master/slave

bug-gnubg

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-gnubg] redesign gnubg to master/slave

From:	Olivier Baur
Subject:	Re: [Bug-gnubg] redesign gnubg to master/slave
Date:	Tue, 23 Dec 2003 00:38:46 +0100

Hi everybody!

I'm finally back from Africa -- back to the gnubg world!

Le 21 déc. 03, à 20:54, Joern Thyssen a écrit :

One of the recent topics has been multi-threading. Instead of

multi-threading Jim suggested splitting gnubg into a GUI (master) andan

engine (slave) which communicates over sockets.

Besides avoiding the problem of making the engine of gnubg thread-safe
it also follows the Unix design principles
(http://www.catb.org/~esr/writings/taoup/html), i.e.,
http://www.catb.org/~esr/writings/taoup/html/ch11s06.html.

As Holger pointed out, I have written a thread-safe version of gnubg;more precisely, a version of gnubg that can compute rollouts inmultiple threads, taking advantage of multi-processor hosts. It canalso communicate with other gnubg's on a TCP/IP network, extending thenotion of multi-threaded "gnubg computing units" to "remote gnubgcomputing units".


The main parts of the work here:

1/ make gnubg thread-safe (at least in its rollout part and GUI part)-- that is , mainly hunting down "bad" behaved globals and other staticvariables that would mix up between different thread contexts;2/ rewrite the rollout function so it now divides its rollout job insmaller 1-game rollout tasks, with are then handled and dispatched by a"task engine", which collects results back to the rollout function forresult accumulation; the bad news is this is several-month old, and Ithink some new features have been added to rollouts, like rolloutinterrupting/resuming.3/ write TCP/IP exchange mechanism (and network slave/masternotification and automatic joining)

4/ write a GTK GUI (and get a running thread-safe GTK version of gnubg

For those who will have a look at cvs branch-multi, I have set up myimplementation notes at the following address (plain ascii text file):http://mapage.noos.fr/gnubgosx/processingunits.txt (different than theuser-oriented HTML version that Holger already mentioned:http://mapage.noos.fr/gnubgosx/procunits.html)It describes more technically the concepts I used, how I made gnubgthread-safe, and how I implemented multi-threading and task handling.



Le 22 déc. 03, à 19:42, Jim Segrave a écrit :

A single instance of gnubg should be capable of doing all of play,
analyse and rollout without having to have any external
processes. This is conceptually easiest for users - there's one
program and it does everything they expect.


I also agree with this

There should be a stripped down version of gnubg, call it
gnubg-analyser, which should be able to do the following:

1) Given a position, a set of dice rolls and the evaluation rules,
   select the best move, if any exists, for each roll - sort of a
   FindnSaveBestMove() over a more limited set.

Or

2) given a position, a rollout setting, and a trial number, rollout a
   single game and return the results - more or less
   BasicCubefulRollout()

Or

3) given a position, a dice roll and a move or just a cube decision,
   and an analysis context, do an AnalyseMove()

This is what I called "slave processing units" ; processing units arethreads of gnubg (typically, one per processor, on one or severalhosts) ; and there's the master gnubg thread, that drives the GUI (GTK)and uses the available slave processing units, either local -- runningon the host, one per proc --, or remote -- running on remote hosts, oneper processor on remote host --) . The version of gnubg I wrote startsup in master mode (the usual gnubg), but can be switched to "slave"mode, ready to join a master. This can be done either in TTY or GTK.

Please note there are also some non-trivial data to transmit along,like random number generator info (like seed roots or RNG state) --which game will be rolled out with wich dice is of utmost importance tovariance reduction.

gnubg itself should allow the user to identify client gnubg-analysers
which it can use to help with evaluations. These would be
interconnected with TCP/IP sockets. They could be on localhost or on
remote machines. At startup, gnubg would attempt to open connections
to the gnubg-analysers so that it would know how many were available
at any given time. An analyser would only server a single gnubg, once
it accepts a connection from that gnubg, any other gnubg will be
unable to connect. Users with multiple machines who wanted to have two
gnubg's going and 4 analysers would have to choose which analysers
would be available to which machines.


I've also written this.

Slaves can notifiy their presence either to all available master hostson the local TCP/IP network, or to a single, user-chosen master. When amaster spots a self-notifying slave, it automatically adds it asco-operating gnubg to its list of processing units.A master can also be set to connect to an arbitrary list of gnubgslaves, for exemple by feeding it with a startup file containing gnubg"pu" instructions (type "help pu" , "help get pu", "help set pu" ingnubg to get the details, or some older mail I posted about it on thelist some months ago; I need to find the references back)

During startup, the analysers should inform gnubg about their
approximate computing power, this allows the master to make some
guesses about load balancing between machines. Say someone has a
couple of old P-300s on a home network and a 2.4G main machine. You
would not want to try to split the work 33%-33%-33%, you'd probably
want to do something like 10% - 10% - 80% when doing analysis or
rollouts.

I didn't go to these ends, but for now you can define, on the master,different queue sizes for each remote slave; the queue size basicallyrelates to the number of tasks that are sent at the same time to aremote host; so if you're using 2 remote hosts, one being twice as fastas the other one, you would set the queue size of the faster to 2 andthe slower to 1, or maybe 8 and 4 if they happen to compute so fastthat network overhead becomes too important as compared to realcomputations) ; basically, it's hard to precompute how you are going tosplit the job: maybe your 10%-10%-80% ratio is not correct; maybe onehost will crash and you'll have to split the remaining; maybe some newhosts might join in the middle of a long rollout; I think the bestpolicy is "serve a soon as finished": feed a slave as soon as it hasreturned the results of all the tasks it was given; work on the fly.The "ratio" issue only pertains when there are only a few tasksremaining to compute in the whole job, but then it gets a very hardwork to make a good job splitting decision: who do you give theseremaining tasks to? do you give them all to process A who is ready tocompute? or do you keep some for process B, which runs very fast, and"should" be ready soon? I chose the simple solution (feed slave as soonas available, with some refinment in the "end of job near" case), withsome "simple" queue parameter (which the user can adjust according toremote host processing speed --increase queue size if host is fast--and network overhead --don't use queue size so small that networkcommunication gets noticeable--)

The connections should have keep-alives as part of their protocol, so
that the loss of one of the gnubg-analysers would be noticed before
the main gnubg has waited too long for answers that won't be
forthcoming.


I still have to add better "timeout" handling.

For now, gnubg masters can safely handle slave disappearing from thenetwork (network problem or slave crash/killed/returned to master mode)

I can see a few different modes of working:

During ordinary play, if gnubg is set to some fairly demanding
standard - I often play against gnubg on 2 ply supremo settings, then
individual moves could be shared among analysers. I'm not sure how
cube decisions would be partitioned, but given a dice roll, I'd think
you could do something like:

gnubg builds a list of legal moves. It then divides that list up and
lets the gnubg-analysers pick the best moves out of a subset of the
legal ones, while gnubg does some for itself. When it completes it's
selection, it then waits for the others to complete, gathers the
results and makes the final decision.
 For 0 ply (and quite possibly 1 play), there's
probably no point in using any remote analysis. For 2 ply you assume
that each legal move will take the same amount of time to analyse, so
in the situation postulated before of 2 slow analysers, given 12 legal
moves, gnubg would do 10 and pass 1 to each to the other machines.

I didn't write anything for n-ply evaluation yet (only rollouts, seebelow)I agree with you that it's no use doing anything with 0-ply and 1-plyevals.I don't think it's a good idea to split the work before starting tocompute. Rather, divide the work (eg, quickly said, divide a 2-ply evalinto, say, 18 1-ply evals if there are 18 legal moves, and have these18 tasks given to available threads/processes (processing units in myterminlogy) -- you can't know beforehand the time that n-plyevaluations will take (it depends on number of available moves indeeper plies), so it's hard to fairly dispatch the tasks. One mainthread should do the "job management" (this I called the "task engine"in my implementation, run by the master thread: send tasks, collectresults), and one more "computing thread" per processor crunchesnumbers; plus processes running on remote hosts and communicating withmaster thread (actually with other threads that run locally on themaster host and handle the individual TCP connections and exchangeprotocol)

The main disadvantage to this is
that caching is largely defeated, since the forward evaluations of the
selected move are likely to have been made on a remote machine. Here
gnubg must decide how much work to hand to each analyser and how much
to keep for itself.

I'm not sure cache impact would be so great; I think that cache can bemostly useful as you "travel down" the tree of possibilities (plies),since positions might reappear on deeper layers of the tree; I'd thinkthere is less to share among different (parallel) parts of the treewhen the first node is already different. I'm not sure if I what I meanis clear...

A different option for play would be to let gnubg handle all aspects
of play. But it could run a background task, particularly when it's
waiting for user input, where it passes any unanalysed moves to
gnubg-analysers and installes their results in the moverecords. If a
user then goes back in the game list and changes a play, this is no
major issue. If one of the moves which has now been undone, as it
were, has been passed to a gnubg-analyser, then when the answer is
returned, gnubg checks and finds that the moverecord is now gone and
simply discards the answer. The idea is that gnubg will have a match
analysed not long after it has been completed - it may even be
possible for the user to see analysis of earlier moves while the game
is still in progress.

Yes, this would be a particularily nice feature to implemented, whichwould benefit all users, not only those who have multi-processormachines or several computers available on a TCP/IP network.

When anlaysing a match or game, the same procedure as the above
paragraph would be followed - gnubg would build the moverecords, then
begin handing them out to the remote analysers. The only difference
would be that the user would have given the analyse game/session/match
command, so gnubg itself would know that it could also do some of the
analysis itself - it would include itself in the list of available
analysers.


Yes, agreed.

As soon as an analyser completes a move, it is given the next
unprocessed one. In this case there's no need to guess at which
machine is the most powerful, they will each work to full capacity as
long as there are uncompleted moves.

Agreed too. This meets the "feed slave as soon as ready" strategy Ipropose for position evals and that I implemented in rollouts.

But I think cache will be important here; that is, if you have alreadyanalysed play N in the match with one processing unit, then you'dbetter give it play N+1 to analyse, because there surely will be plentyof useful data in the cache! I think we'll need some task dispatchalgorithm that strive to give processing units as many consequent playsto evaluate as possible, without precomputing the job splitting.Interesting problem... :-)

Rollouts are the most interesting problem, because we want to be able
to interrupt and resume rollouts. Gnubg begins by deciding it will do
the first trial, then passing the next n trials to the
gnubg-analysers. During it's rollout, it checks for analysers having
completed. If they have, it stores the results of that one rollout and
issues the next trial to the analyser. When gnubg completes a rollout,
it then takes the results of that rollout and any following ones in
the series (not including gaps) and builds the cumulative result.


Agreed, this is also my "feed as soon as ready" philosophy.

here's a picture or a rollout where gnubg is faster than analyser 1
and much faster than analyser 2. Trial 1 would be the first rollout a
single gnubg would do, trial 1 would be the second, etc. So, for a
rollout of a single move, these would be the same as the games rolled
out. When rolling out two moves, or a cube decisions, trial 1 is move
1, game 1, trial 2 is move 2, game 1, trial 3 is move 1, game 2, etc.



        Time
      0         1         2         3         4         5         6

0.........0.........0.........0.........0.........0.........0.....

gnubg | trial 1  |  trial 4  | trial 7  | trial 9  | trial 11 |

anal1 | trial 2      | trial 5       | trial 8       | trial 12      |

anal2 | trial 3          | trial 6          | trial 10         |


Time    action
0       initiate the rollouts
11      gnubg completes trial 1. It can put this into the cumulative
        results and begin trial 4
15      anal1 completes a result. This is stored and anal1 begins
        trial 5

19 anal2 completes trial 3. This is stored and anal2 begins trial6

23      gnubg completes trial 4. It can put trial 2, 3, and 4 into the
        cumulative results and begin trial 7

31 anal1 completes trial 5. This is stored and anal1 begins trial8

34      gnubg complets trial 7. Trial 5 can be added to the cumulative
        results, but trial 7 must be stored. gnubg begins trial 9

38 anal2 completes trial 6, this is stored and anal2 begins trial10

45      gnubg completes trial 9. Trial 6 and 7 are added to the
        cumulative results, gnubg stores trial 9 and begins trial 11

47 anal1 completes trial 8. This is stored and anal1 begins trial1256 gnubg completes trial 11. Trials 8 and 9 are added to thecumulative

        results.

...
If the rollout were interrupted between 45 and 47, then we'd only have
usable results for trials 1..7, if it were between 48 and 56, then
we'd have trials 8 and 9 as well. In general, this would keep all the
processes working more or less flat out (it should even be possible to
add or subtract servers during a rollout).

That's what my "task engine" is responsible for: assigning tasks (andreassigning failed tasks ) and collecting results.

The potential for people with a home network or friends who are
willing to lend their machines for overnight rollouts or whatever

Right. This I implemented too: slaves joining or quitting a longrollout without disturbing the whole job -- it basically comes freewith the "serve as soon as..." strategy :-)One thing: this can also be done on the Internet (why limit to a homenetwork?)One can even imagine "gnubg slave farms" to compute big rollouts ormaybe help in training new neural nets (I don't know if trainingalgorithms can be easily split down into subtasks to dispatch amongcooperating slaves, like the SETI screensaver)

Other thoughts:

To deal with firewall issues, it should be possible to specify the
ports to be used for both outbound connections from gnubg and inbound
to the analyser.
It would be a good idea to link to libwrap

The analyser should be allowed to work either from stdin/stdout (so it
can be invoked from inetd or with a command line to listen on a
specific port.

All the exchanges should be done in ASCII, numerical values can be
exchanged in %.18g or similar format.

It's probably easiest, though not the most efficient, to exchange the
full analysis or rollout contexts with each service request and to
pass them back again as part of the results.

Agreed to. I've also implemented the above, to some degree. Details canbe found in the txt document I pointed at the beginning of this mail.



Bottom line

I'm not sure it is necessary to go through a full rewrite of gnubg inmaster/slave at the GUI / engine level, as I've already done it atmaster thread / slave thread level, with only the master thread havingaccess to GUI, and a single gnubg process being able to be switchedto/from master/slave state.Please have a look at branch-multi, and try and compile it on yourplatforms. I know that it works on Macs (multi-threading and remoteprocessing), on PC's (multithreading only, still got problems withwinsock for remote processing), and I think it should work with mostPOSIX-compliant flavours of UNIX/Linux.



Best

-- Olivier

[Prev in Thread]

Current Thread

[Next in Thread]

[Bug-gnubg] redesign gnubg to master/slave, Joern Thyssen, 2003/12/21
- Re: [Bug-gnubg] redesign gnubg to master/slave, Holger, 2003/12/22
  - Re: [Bug-gnubg] redesign gnubg to master/slave, Jim Segrave, 2003/12/22
    - Re: [Bug-gnubg] redesign gnubg to master/slave, Holger, 2003/12/22
- Re: [Bug-gnubg] redesign gnubg to master/slave, Olivier Baur <=
  - Re: [Bug-gnubg] redesign gnubg to master/slave, Jim Segrave, 2003/12/23

Prev by Date: Re: [Bug-gnubg] redesign gnubg to master/slave
Next by Date: [Bug-gnubg] friday entertainment
Previous by thread: Re: [Bug-gnubg] redesign gnubg to master/slave
Next by thread: Re: [Bug-gnubg] redesign gnubg to master/slave
Index(es):
- Date
- Thread