[Bug-gnubg] Measuring performance levels
From: Douglas Zare
Subject: [Bug-gnubg] Measuring performance levels
Date: Wed, 23 Oct 2002 02:36:52 -0400
User-agent: Internet Messaging Program (IMP) 3.1
In addition to measuring the correctness of absolute evaluations or of
individual decisions, I think it would be nice to measure the ability to
execute a game plan. This is hard to measure objectively in many important
situations, but not in positions where only one side can make errors. I
described this in more detail in example 3 of my latest (October 22nd) column
in GammonVillage.
Here is the position (in the column it is rolled out 10 times with different
settings).
--------------------------------------------------------------------
| zare (X) vs. Snowie (O) |
--------------------------------------------------------------------
Money session. Score X-O: 0-0
X on roll, cube action
+24-23-22-21-20-19-------18-17-16-15-14-13-+
| O O O X | | X |
| O O O X | | |
| O O O | | | S
| O O | | | n
| 6 O | | | o
| |BAR| | w
| | | | i
| | | | e
| X X X | | X |
| X X X | | X |
| O X X X | | X |
+-1--2--3--4--5--6--------7--8--9-10-11-12-+
Pipcount X: 119 O: 47 X-O: 0-0/Money (1)
CubeValue: 1
Rollout Money equity: 0.505
0.1% 3.6% 77.0% 23.0% 7.2% 0.0%
95% confidence interval:
- money cubeless eq.: 0.505 ±0.013.
Rollout settings:
Full rollout,
21600 games (equiv. 24650 games),
played 1-ply,
seed 11, with race database.
1. Double, take 0.846
2. No double 0.721 (-0.126)
3. Double, pass 1.000 (+0.154)
Proper cube action: Double, take
------------------------------ End ----------------------------------
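As a cross-check on the figures in the rollout output above, here is a minimal
Python sketch. It assumes the six percentages are listed in the order win
backgammon, win gammon, win, lose, lose gammon, lose backgammon, and that the
gammon figures already include backgammons; under that assumption they
reproduce the reported cubeless equity of 0.505. The function names are
illustrative only and belong to no program mentioned here, and the cube-action
rule is a simplification that does not separate "too good to double" from an
ordinary "no double".

```python
def cubeless_money_equity(win_bg, win_g, win, lose, lose_g, lose_bg):
    """Cubeless money equity from outcome probabilities (as fractions).

    Assumes the gammon probabilities already include backgammons, so each
    extra level of win adds one more point on top of the previous one.
    """
    return (win - lose) + (win_g - lose_g) + (win_bg - lose_bg)


def proper_cube_action(no_double, double_take, double_pass=1.0):
    """Pick the cube action from the three equities in the rollout summary.

    Simplified: the opponent chooses whichever reply is cheaper, and the
    player doubles only if that beats holding the cube.
    """
    opponent_takes = double_take < double_pass
    after_double = min(double_take, double_pass)
    if after_double <= no_double:
        return "No double"
    return "Double, take" if opponent_takes else "Double, pass"


# Percentages from the rollout above: 0.1% 3.6% 77.0% 23.0% 7.2% 0.0%
equity = cubeless_money_equity(0.001, 0.036, 0.770, 0.230, 0.072, 0.000)
print(f"{equity:.3f}")                          # 0.505
print(proper_cube_action(0.721, 0.846, 1.000))  # Double, take
```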
Since O can very rarely make an error in cubeless money play, the result of the
rollout is a good indication of the strength of the bot. A higher equity for X
means that the bot plays X's side of this position better. The rollouts
indicated that, for this position, Jellyfish Level 6 plays worse than Snowie 3
1-ply, and that Snowie 4 2-ply (medium) plays worse than Snowie 3 3-ply. So,
how does gnubg fare on different settings? (It is important that the rollout be
cubeless and untruncated, with checker play according to the usual cubeless
money gammon price.)
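To judge whether one setting really plays this position better than another,
the difference between two rollout equities has to be weighed against the
rollout noise. The sketch below assumes the two rollouts are independent and
that each reported "±" is a 95% half-width of roughly 1.96 standard errors. The
first pair of numbers is the rollout quoted above; the second pair is a made-up
placeholder, not a measured result, and the function name is illustrative.

```python
import math

Z_95 = 1.96  # assumed multiplier behind a 95% confidence half-width


def compare_rollouts(eq_a, half_width_a, eq_b, half_width_b):
    """Return (equity difference, z-score) for two independent rollouts."""
    se_a = half_width_a / Z_95
    se_b = half_width_b / Z_95
    diff = eq_a - eq_b
    # Standard error of a difference of two independent estimates.
    z = diff / math.hypot(se_a, se_b)
    return diff, z


# 0.505 +/- 0.013 is the rollout above; 0.470 +/- 0.013 is HYPOTHETICAL.
diff, z = compare_rollouts(0.505, 0.013, 0.470, 0.013)
print(f"difference {diff:+.3f}, z = {z:.2f}")
print("significant at 95%" if abs(z) > Z_95 else "not significant at 95%")
```

A difference smaller than the combined noise says nothing about which setting
is stronger, so more games would be needed before ranking the two.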
Douglas Zare