
[Bug-gnubg] Measuring performance levels


From: Douglas Zare
Subject: [Bug-gnubg] Measuring performance levels
Date: Wed, 23 Oct 2002 02:36:52 -0400
User-agent: Internet Messaging Program (IMP) 3.1

In addition to measuring the correctness of absolute evaluations or of
individual decisions, I think it would be nice to measure a bot's ability to
execute a game plan. This is hard to measure objectively in many important
situations, but not in positions where only one side can realistically make
errors. I described this in more detail in example 3 of my latest (October
22nd) column in GammonVillage.

Here is the position. In the column, it was rolled out 10 times with different
settings.

--------------------------------------------------------------------
|                     zare (X) vs. Snowie (O)                      |
--------------------------------------------------------------------

 Money session. Score X-O: 0-0

           X on roll, cube action
           +24-23-22-21-20-19-------18-17-16-15-14-13-+
           | O  O  O     X    |   |                 X |
           | O  O  O     X    |   |                   |
           | O  O  O          |   |                   | S
           | O  O             |   |                   | n
           | 6  O             |   |                   | o
           |                  |BAR|                   | w
           |                  |   |                   | i
           |                  |   |                   | e
           |          X  X  X |   |  X                |
           |          X  X  X |   |  X                |
           |       O  X  X  X |   |  X                |
           +-1--2--3--4--5--6--------7--8--9-10-11-12-+
           Pipcount  X: 119  O:  47  X-O: 0-0/Money (1)
           CubeValue:  1

          Rollout      Money equity: 0.505
               0.1%   3.6%  77.0%    23.0%   7.2%   0.0%
               95% confidence interval:
                  - money cubeless eq.: 0.505 ±0.013.
               Rollout settings:
                  Full rollout,
                  21600 games (equiv. 24650 games),
                  played 1-ply,
                  seed 11, with race database.
                1.  Double, take      0.846  
                2.  No double         0.721  (-0.126)
                3.  Double, pass      1.000  (+0.154)
          Proper cube action: Double, take     

 ------------------------------ End ----------------------------------
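The two summary numbers above can be checked by hand. What follows is a minimal sketch built on a few assumptions.

- **Probability columns.** The six percentages under "Money equity" are read as X's backgammon wins, gammon wins, total wins, total losses, gammon losses and backgammon losses, with gammon figures including backgammons. This ordering is my reading of the output, not something the post states, but it does reproduce the reported 0.505.
- **Cube decision.** The standard money-play rule is assumed: the opponent answers a double with whichever response is worse for the doubler. Doubling is therefore worth the smaller of "Double, take" and "Double, pass", and that value is compared with "No double".
- **Names.** `Outcome` and `proper_cube_action` are hypothetical names chosen for illustration. They are not part of gnubg.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Cubeless outcome probabilities for the player on roll (X).

    Assumption: gammon figures include backgammons, matching this reading
    of the rollout line 0.1% 3.6% 77.0% 23.0% 7.2% 0.0%.
    """
    win: float
    win_gammon: float
    win_backgammon: float
    lose_gammon: float
    lose_backgammon: float

    @property
    def lose(self) -> float:
        return 1.0 - self.win

    def cubeless_equity(self) -> float:
        # Single = 1 point, gammon = 2, backgammon = 3. Since gammons include
        # backgammons, each higher level of win or loss adds one more point.
        return (self.win - self.lose
                + self.win_gammon - self.lose_gammon
                + self.win_backgammon - self.lose_backgammon)


def proper_cube_action(no_double: float, double_take: float,
                       double_pass: float = 1.0) -> str:
    """Money cube decision from cubeful equities normalized to the current cube.

    The opponent picks the response that is worse for the doubler, so the
    double is worth min(double_take, double_pass).
    """
    value_if_doubled = min(double_take, double_pass)
    if value_if_doubled <= no_double:
        return "No double"
    return "Double, take" if double_take < double_pass else "Double, pass"


# Figures taken from the rollout above.
x = Outcome(win=0.770, win_gammon=0.036, win_backgammon=0.001,
            lose_gammon=0.072, lose_backgammon=0.000)
print(round(x.cubeless_equity(), 3))          # 0.505
print(proper_cube_action(0.721, 0.846, 1.0))  # Double, take
```

Here 0.846 is below 1.000, so O takes, and the take still beats the no-double equity of 0.721. That is why the rollout reports "Double, take".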

Since O can very rarely make an error in this position in cubeless money play,
the result of the rollout is a good indication of the strength of the bot: a
higher equity for X means that the bot plays this position better. The rollouts
indicated that, for this position, Jellyfish Level 6 plays worse than Snowie 3
1-ply, and that Snowie 4 2-ply (medium) plays worse than Snowie 3 3-ply. So how
does gnubg fare on different settings? (It is important that the rollout be
cubeless and untruncated, with checker play according to the usual cubeless
money gammon price.)
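Comparing two settings this way comes down to comparing X's mean cubeless result over two rollouts, with enough games that the gap exceeds the noise. What follows is a minimal sketch under several assumptions.

- **Input data.** Per-game results are available for each setting. Each entry is X's signed points in one game: +-1, +-2 or +-3.
- **Statistics.** Games are independent, and a normal approximation is used.
- **Variance reduction.** The sketch does not model it. gnubg's rollout above reports "equiv. 24650 games" from 21600, so its confidence interval is tighter than this naive estimate would give from the same number of games.
- **Names and data.** `summarize`, `compare_settings` and the toy data are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence

Z95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def summarize(points: Sequence[int]) -> tuple[float, float]:
    """Mean cubeless equity for X and its standard error over a rollout."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two games")
    return mean(points), stdev(points) / math.sqrt(n)


def compare_settings(a: Sequence[int],
                     b: Sequence[int]) -> tuple[float, float, float]:
    """Difference in X's mean equity (a - b) and its 95% confidence interval.

    A positive difference suggests setting `a` plays the position better,
    assuming both rollouts are cubeless, untruncated and independent.
    """
    mean_a, se_a = summarize(a)
    mean_b, se_b = summarize(b)
    diff = mean_a - mean_b
    half_width = Z95 * math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, diff - half_width, diff + half_width


# Toy per-game results for two hypothetical settings (not real rollouts).
setting_a = [1, 1, -1, 2, 1, -2, 1, 1]
setting_b = [1, -1, -1, 1, -2, 1, 1, -1]
diff, low, high = compare_settings(setting_a, setting_b)
print(f"difference {diff:+.3f}, 95% CI [{low:+.3f}, {high:+.3f}]")
```

Only when the whole interval lies above (or below) zero does the comparison say which setting handles the position better. With a handful of games, as in the toy data, it will not.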

Douglas Zare