bug-gnubg

Re: [Bug-gnubg] Measuring performance levels 2


From: Douglas Zare
Subject: Re: [Bug-gnubg] Measuring performance levels 2
Date: Mon, 28 Oct 2002 02:06:28 -0500
User-agent: Internet Messaging Program (IMP) 3.1

Quoting Joern Thyssen <address@hidden>:

> On Sat, Oct 26, 2002 at 07:17:11PM +0000, Joern Thyssen wrote
> > On Sat, Oct 26, 2002 at 06:05:13AM -0400, Douglas Zare wrote
> > > 
> > > > How about 1 closed out, extras on the 23, 10, and 7, versus 13 off with a
> > > > blot on the 1? 
> > 
> > Just to make sure:
> > 
> >  GNU Backgammon  Position ID: AQAA2rYtAoAAAA
> >                  Match ID   : cAkAAAAAAAAA
> >  +13-14-15-16-17-18------19-20-21-22-23-24-+     O: gnubg
> >  |                  | O |             X  O | OOO 0 points
> >  |                  |   |                  | OOO
> >  |                  |   |                  | OOO
> >  |                  |   |                  | OO
> >  |                  |   |                  | OO
> > v|                  |BAR|                  |     (Cube: 1)
> >  |                  |   |                  |
> >  |                  |   |                  |
> >  |                  |   |                  |
> >  |                  |   | X  X  X  X  X  X |     On roll
> >  |       X        X |   | X  X  X  X  X  X |     0 points
> >  +12-11-10--9--8--7-------6--5--4--3--2--1-+     X: jth
> > 
> > > 
> > > In 90,000 trials, Snowie 3 1-ply wins 15.5% (-0.691 +-0.004). 
> > 
> gnubg 0-ply 93312 trials:
> 19.1% (+- 0.0005) or equity -0.6184 +- 0.001
> 
> gnubg 1-ply (16 cand, 0.32 tol.) 9076 trials:
> 19.11% (+- 0.00015) or equity -0.6183 +- 0.003

Snowie 3 3-ply Huge 33% 4500 trials:
19.8% (+- 0.3%), equity -0.603 +- 0.006 (confidence interval).

I'm surprised by how well gnu's variance reduction works. If those figures are 
accurate, the luck estimate is off by only about 1.5% on a single trial! 
Snowie's is off by about 10% (sqrt(4500)*0.3%/2). Perhaps that is because gnu 
understands the absolute evaluations much better on 0-ply. Do the 0-ply 
evaluations match the 0-ply rollouts to within a percent?
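
As a rough check of the arithmetic above, here is a minimal Python sketch that 
backs out the per-trial error from a rollout's reported error. The helper name 
per_trial_sd is hypothetical, and the reading of each "+-" figure is an 
assumption: the Snowie 0.3% is treated as a 2-sigma interval (matching the 
division by 2 above), while the gnubg 1-ply 0.00015 is read as a fraction 
(0.015%) and as a single standard error.

```python
import math


def per_trial_sd(n_trials, reported_error, z=1.0):
    """Back out the per-trial standard deviation from a reported error.

    Assumes reported_error = z * sd / sqrt(n_trials) with independent trials.
    Units of the result match the units of reported_error (percent here).
    """
    return math.sqrt(n_trials) * reported_error / z


# Snowie 3 3-ply: 4500 trials, +-0.3%, assumed to be a 2-sigma interval,
# i.e. the sqrt(4500)*0.3%/2 expression from the message.
snowie_sd = per_trial_sd(4500, 0.3, z=2.0)

# gnubg 1-ply: 9076 trials, +-0.00015 assumed to mean 0.015% and to be one
# standard error (z=1). Both readings are assumptions, not confirmed figures.
gnubg_sd = per_trial_sd(9076, 0.015, z=1.0)

print(f"Snowie per-trial sd: {snowie_sd:.1f}%")
print(f"gnubg  per-trial sd: {gnubg_sd:.1f}%")
```

Under those assumptions the two values come out near 10% and 1.4%. If the 
gnubg figure is really a 2-sigma interval, or is already in percent, the gnubg 
number changes accordingly.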

I look forward to seeing gnu 2-ply rollout data, to see whether gnu 2-ply plays 
this one better than Snowie 3. It's impressive that gnu 0-ply does better than 
Snowie 3 2-ply here, and almost as well as Snowie 3 3-ply, but it is strange 
that gnu 1-ply plays at the same level as 0-ply.

Douglas Zare




