Re: [Bug-gnubg] Measuring performance levels
From: Joseph Heled
Subject: Re: [Bug-gnubg] Measuring performance levels
Date: Wed, 23 Oct 2002 22:42:28 +1300
Douglas Zare wrote:
>
> In addition to measuring the correctness of absolute evaluations or
> correctness of individual decisions, I think it would be nice to measure the ability to
> execute a game plan. This is hard to measure objectively in many important
> situations, but not in positions with one-sided errors. I described this in more
> detail in example 3 of my latest (October 22nd) column in GammonVillage.
>
> Here is the position (rolled out 10 times with different settings in the
> column).
>
> --------------------------------------------------------------------
> | zare (X) vs. Snowie (O) |
> --------------------------------------------------------------------
>
> Money session. Score X-O: 0-0
>
> X on roll, cube action
> +24-23-22-21-20-19-------18-17-16-15-14-13-+
> | O O O X | | X |
> | O O O X | | |
> | O O O | | | S
> | O O | | | n
> | 6 O | | | o
> | |BAR| | w
> | | | | i
> | | | | e
> | X X X | | X |
> | X X X | | X |
> | O X X X | | X |
> +-1--2--3--4--5--6--------7--8--9-10-11-12-+
> Pipcount X: 119 O: 47 X-O: 0-0/Money (1)
> CubeValue: 1
>
> Rollout Money equity: 0.505
> 0.1% 3.6% 77.0% 23.0% 7.2% 0.0%
> 95% confidence interval:
> - money cubeless eq.: 0.505 ±0.013.
> Rollout settings:
> Full rollout,
> 21600 games (equiv. 24650 games),
> played 1-ply,
> seed 11, with race database.
> 1. Double, take 0.846
> 2. No double 0.721 (-0.126)
> 3. Double, pass 1.000 (+0.154)
> Proper cube action: Double, take
>
> ------------------------------ End ----------------------------------
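The comparison behind the "Proper cube action" line can be sketched as follows, assuming the three figures are cubeful money equities from X's side. The rule is the standard one: O answers a double with whichever response is worse for X, and X doubles only if that result beats not doubling. The function name and the label strings are hypothetical, not gnubg's or Snowie's own code.

```python
def proper_cube_action(no_double: float, double_take: float,
                       double_pass: float) -> str:
    """Money cube action for the player on roll (X).

    All three arguments are cubeful equities from X's side of the board
    (hypothetical helper for illustration only).
    """
    # O answers a double with whichever response leaves X less equity.
    o_takes = double_take <= double_pass
    after_double = min(double_take, double_pass)

    if after_double > no_double:
        # Doubling gains equity, so X should double.
        return "Double, take" if o_takes else "Double, pass"
    if not o_takes:
        # O would pass, yet cashing is no better than playing on.
        return "Too good, no double"
    return "No double, take"
```

With the figures above, `proper_cube_action(0.721, 0.846, 1.000)` returns `"Double, take"`, matching the reported proper cube action. The printed differences are measured against that best action; -0.126 rather than -0.125 presumably reflects rounding of the displayed equities.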
>
> Since O can very rarely make an error in cubeless money play, the result of
> the rollout is a good indication of the strength of the bot. A higher equity for X
> means that the bot plays this position better. The rollouts indicated that for
> this position, Jellyfish Level 6 plays worse than Snowie 3 1-ply, and that
> Snowie 4 2-ply (medium) plays worse than Snowie 3 3-ply. So, how does gnubg
> fare on different settings? (It is important that the rollout be cubeless and
> untruncated, with checker play according to the usual cubeless money gammon
> price.)
>
> Douglas Zare
>
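Ranking bots this way ("plays worse than") only means something when two rollout equities differ by more than their statistical noise. One common check, sketched here under the assumptions that each reported ± value is a 95% half-width from an approximately normal estimate and that the two rollouts are independent, is a two-sample z-test. The function name is hypothetical.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z95 = 1.959963984540054


def rollouts_differ(eq_a: float, half_a: float,
                    eq_b: float, half_b: float) -> tuple[float, bool]:
    """Compare two independent rollout equities given 95% half-widths.

    Returns the z statistic of (eq_a - eq_b) and whether the difference
    is significant at the two-sided 5% level (hypothetical helper).
    """
    # Convert each 95% half-width back to a standard error.
    se_a = half_a / Z95
    se_b = half_b / Z95
    # Standard error of the difference of two independent estimates.
    se_diff = math.hypot(se_a, se_b)
    z = (eq_a - eq_b) / se_diff
    return z, abs(z) > Z95
```

For two rollouts that each carry a ±0.013 interval, the equities must differ by more than about √2 × 0.013 ≈ 0.018 before this test calls the difference significant.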
At 0-ply, with 12960 games, I got
0.07% 3.6% 77.06% 23% 7.08% 0.0%
So this matches the result above (Snowie 3?). I'll leave higher plies to someone
with a stronger machine.
-Joseph
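To check that the two result lines agree, it helps to turn the six percentages into a single equity. The columns appear to be, in order, X's backgammon, gammon and win rates followed by O's win, gammon and backgammon rates, with gammons counting backgammons too. That reading is an inference, but it reproduces the quoted 0.505 exactly. A minimal sketch under that assumption (the function name is hypothetical):

```python
def cubeless_money_equity(x_bg: float, x_gammon: float, x_win: float,
                          o_win: float, o_gammon: float, o_bg: float) -> float:
    """Cubeless money equity for X from six rollout percentages.

    Assumes gammon rates include backgammons, so a backgammon is counted
    in all three columns and is worth 3 points in total.
    """
    # Inputs are percentages; scale the point total back to a fraction.
    points = (x_win - o_win) + (x_gammon - o_gammon) + (x_bg - o_bg)
    return points / 100.0
```

Under that reading, the quoted line gives 0.505 and the 0-ply line gives about 0.5065, a gap well inside the ±0.013 interval quoted for the longer rollout.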