
Re: [Bug-gnubg] 0ply doubles early


From: Robert-Jan Veldhuizen
Subject: Re: [Bug-gnubg] 0ply doubles early
Date: Sat, 16 Dec 2006 16:49:23 +0100



On 12/15/06, Joseph Heled <address@hidden> wrote:

I think 0-ply cube is awesome and that is due to 0-ply's awesome ability
to get the cubeless equities right. You keep saying it would not be
hard to improve by "fine-tuning the volatility estimate" but I have
not yet seen a sign you checked that the code contains such a knob or
how hard it would be to "fine-tune" it. I don't find those comments
constructive when not based on facts.

I'm not a coder. I'm suggesting ideas, and I can assure you they are based on facts or rational arguments. I thought suggestions and ideas for gnubg were one of the goals of this mailing list. I also thought the idea was to cooperate and help, rather than to tell others what they should do or should have done.

If you're suggesting I'm just coming up with some random stuff and don't know at all what I'm talking about, I find that an offensive comment. My experience with gnubg is quite extensive, I'd say, and I've been testing its evaluations and rollout capabilities on an almost daily basis for the last four years, including discussions with other gnubg users doing similar testing (mostly on the gammonline forum). Many ideas have come forth from that about certain gnubg weaknesses and peculiarities. Stick's idea that 0-ply cubes too early might be one of them and seems worth looking at as one area where gnubg might be improved. I think that is constructive.

I'd also like to point out that I'm not criticising any of the gnubg developers or the program itself, quite the contrary in fact.

Now, back to the issue at hand. It is obvious that gnubg uses some kind of volatility estimate, even if only implicitly, to decide when to double at 0-ply. A cubeless equity or breakdown by itself is not enough to make a cube decision. I'm not sure how gnubg does it; I thought it was part of Janowski's cubeful adjustments. That's why I asked how it was done.

> > Yet I do not think the situation is simple. race is
> > simpler than contact, still someone may make great improvements if she
> > is willing to do the proper research.
>
> Yes and I think Christian made a good start, suggesting strongly that 0-ply
> doubles too early.

And people have been saying for years gnubg plays badly at "close to race"
positions, yet when pressed were not even able to define a criterion
for categorizing them.

I don't see how that is relevant to this discussion. I'm just looking at Christian's data and I'm thankful to him for presenting it here.

(...)

I think you fail to understand the basics of the problem. The method
used to train gnubg works great for cubeless evaluations since you have
a very firm starting point, and that is the bearoff which can be
solved quite accurately by brute force. This is not the case for
cubeful evaluations. If you want to do the same you have to start by
doing the same thing - i.e. build a base you can be sure of. Only
then can you start incremental improvements which are based first on
rollouts, then possibly on higher plies. Without such a base you have
nothing to stand on. A net based on random 0ply moves will generate
random 2ply moves, no matter how much you wish it to be otherwise.

I think you're looking at a completely different idea than I am. I was not discussing a new neural net (possibly based on cubeful rather than cubeless play) and how to build and train it. Rather, I'm looking at how to possibly improve the way gnubg's 0-ply evaluations make cube decisions.

A cube decision made at 0-ply is basically a combination of cubeless equity and 2-ply volatility. Since gnubg doesn't look ahead at 0-ply, it can't truly determine the 2-ply volatility of the position, so it must be using some formula instead, probably implicitly defined in the Janowski cubeless-to-cubeful adjustments.

Another way of looking at it: 0-ply comes up with a cubeless breakdown; the associated cubeless equity is easy to determine. But to make a cube decision, gnubg applies different adjustments to that equity depending on the cube position: center cube vs. opponent owns cube for an initial double, or player owns cube vs. opponent owns cube for a redouble.

If it is indeed true that 0-ply doubles too early on average, then it seems like these adjustments need to be fine-tuned. Not through the neural net, but probably through some parameters in this cubeless-to-cubeful adjustment. That's the idea I'm trying to discuss here.

So, it might be that the adjustments give too little weight to owning the cube, therefore coming up with cubeful equities for "no double" that are too low. It could also be that the adjustment gives too little weight to the opponent owning the cube (assuming less efficient redoubles than there will be in practice), thereby coming up with an equity for the doubler after (re)double/take that is too high.

To test this, I think a good way might be to get a large sample of somewhat close 0-ply cube decisions (no double versus double) and roll them out at appropriate settings.

Then the cubeless breakdown of each rollout should be compared with that of its initial 0-ply evaluation, to see whether the breakdowns themselves are biased. With such a bias, things would be harder to analyze, but it should still be possible, I think. Hopefully there is no significant bias. In that case, we can compare the 0-ply cube decisions to the rollout results and determine with reasonable accuracy whether 0-ply's cube decisions are biased towards doubling too early (as has been suggested) or perhaps too late. If there is any such bias, then fine-tuning the cubeless-to-cubeful adjustment should help to make 0-ply's cube decisions better. This could be tested by re-evaluating the same sample at 0-ply with a different cubeless-to-cubeful adjustment, and seeing whether the cube decisions now agree with the original rollouts more often.

That's just a basic procedure I came up with; I'm sure many improvements and refinements are possible, or perhaps a different approach is more viable.
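The decision-comparison step of this procedure could be sketched in Python roughly as follows. This is only an illustration: the `Sample` record, the `doubles` rule (double when the doubler's equity after doubling, capped at 1.0 for a pass, exceeds the no-double equity) and the numbers in `samples` are all hypothetical placeholders, not gnubg output.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Cubeful equities for the player on roll (hypothetical record layout).
    eval_no_double: float       # 0-ply: no double
    eval_double_take: float     # 0-ply: double/take
    rollout_no_double: float    # rollout: no double
    rollout_double_take: float  # rollout: double/take


def doubles(no_double: float, double_take: float) -> bool:
    """Simplified rule: double if doubling gains equity.

    The doubler cannot get more than 1.0 (opponent passes).
    "Too good to double" positions are ignored in this sketch.
    """
    return min(double_take, 1.0) > no_double


def tally(samples):
    """Count how often 0-ply agrees with the rollout, doubles too early, or too late."""
    counts = {"agree": 0, "too_early": 0, "too_late": 0}
    for s in samples:
        eval_says = doubles(s.eval_no_double, s.eval_double_take)
        rollout_says = doubles(s.rollout_no_double, s.rollout_double_take)
        if eval_says == rollout_says:
            counts["agree"] += 1
        elif eval_says:
            counts["too_early"] += 1   # 0-ply doubles, rollout does not
        else:
            counts["too_late"] += 1    # rollout doubles, 0-ply does not
    return counts


# Placeholder numbers purely to exercise the code; not real evaluations.
samples = [
    Sample(0.60, 0.65, 0.66, 0.62),
    Sample(0.40, 0.30, 0.41, 0.28),
    Sample(0.70, 0.85, 0.72, 0.90),
    Sample(0.55, 0.52, 0.54, 0.58),
    Sample(0.62, 0.66, 0.65, 0.61),
]
counts = tally(samples)
print(counts)
```

A clear excess of "too_early" over "too_late" on a large, real sample would support the suggestion that 0-ply doubles too early; re-running the tally after changing the adjustment would show whether agreement improves.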

> > In addition I am
> > not sure I agree with the doubling code in gnubg. I always used my own
> > code which is part of the fibs2html or gnubg-nn, which I think is
> > better (but I may be wrong). If someone want to take this code and
> > integrate it into gnubg, where one can choose which method to use
> > would be a great start as well.
 
My code is based on Tom Keith's ideas in "How to Compute a Match Equity
Table" (http://www.bkgm.com/articles/met.html) and "Match Play
Doubling Strategy" (http://www.bkgm.com/articles/mpd.html).

Those articles seem to be based on match play specifically, I couldn't find anything there about how to make a double decision based on just a cubeless w/g/bg breakdown and perhaps a position type characteristic, which is essentially what 0-ply tries to do and which is what I'm looking at.

So as an example, gnubg 0-ply queries the neural net and gets 68% wins, 16% gammon wins, 32% losses, 2% gammon losses (cubeless equity: 0.50). Does it double a center cube for money?
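For reference, the 0.50 figure can be checked with a few lines of Python. The function name is made up for illustration, and I'm assuming the gammon figures are cumulative (they include backgammons) and that backgammons are zero here, since the example doesn't mention them.

```python
def cubeless_equity(win, win_g, lose_g, win_bg=0.0, lose_bg=0.0):
    """Cubeless money equity from a win/gammon/backgammon breakdown.

    All arguments are probabilities. Gammon figures are assumed to be
    cumulative (gammons include backgammons), so each win counts 1 point,
    each gammon adds 1 more, and each backgammon adds 1 more again.
    """
    lose = 1.0 - win
    return (win - lose) + (win_g - lose_g) + (win_bg - lose_bg)


# The example from the text: 68% wins, 16% gammon wins, 2% gammon losses.
equity = cubeless_equity(win=0.68, win_g=0.16, lose_g=0.02)
print(f"{equity:.2f}")  # 0.50
```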

As I understand it, gnubg uses formulae to approximate the cubeful equities, one for center cube and one for opponent owns cube. There must be some parameters in this adjustment that could perhaps be fine-tuned. That's the idea I'm suggesting.
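To make that concrete, here is a rough Python sketch of Janowski's money-game formulas as I understand them from his cube article, with a single cube-efficiency parameter x (0 = dead cube, 1 = fully live cube). Whether gnubg implements exactly this, and which value of x it uses, are assumptions on my part; the function names and the x values in the loop are only illustrative.

```python
def janowski(p, W, L, x, cube):
    """Approximate cubeful equity per unit of cube value (Janowski).

    p: cubeless winning probability
    W, L: average cubeless points won when winning / lost when losing
    x: cube efficiency, 0.0 (dead cube) .. 1.0 (live cube)
    cube: "own", "center" or "opp"
    """
    base = p * (W + L + 0.5 * x) - L
    if cube == "own":
        return base
    if cube == "center":
        return 4.0 / (4.0 - x) * (base - 0.25 * x)
    if cube == "opp":
        return base - 0.5 * x
    raise ValueError(f"unknown cube position: {cube}")


def center_cube_equities(win, win_g, lose_g, x):
    """Return (no double, double/take) equities for a centered 1-cube.

    Backgammons are assumed to be zero in this sketch.
    """
    lose = 1.0 - win
    W = (win + win_g) / win      # average value of a win
    L = (lose + lose_g) / lose   # average value of a loss
    no_double = janowski(win, W, L, x, "center")
    double_take = 2.0 * janowski(win, W, L, x, "opp")  # cube on 2, opponent owns
    return no_double, double_take


def should_double(win, win_g, lose_g, x):
    """Simplified decision: ignores 'too good to double' positions."""
    no_double, double_take = center_cube_equities(win, win_g, lose_g, x)
    return min(double_take, 1.0) > no_double  # opponent can pass at 1.0


# The example position, evaluated for several illustrative values of x.
for x in (0.0, 0.5, 0.68, 0.8, 0.9, 1.0):
    nd, dt = center_cube_equities(0.68, 0.16, 0.02, x)
    print(f"x={x:.2f}  no double={nd:+.3f}  double/take={dt:+.3f}  "
          f"double={should_double(0.68, 0.16, 0.02, x)}")
```

In this sketch the example position is a double for low values of x and stops being one as x approaches 1, so a parameter of this kind would directly move the point at which 0-ply starts doubling.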

Greetings,
--
Robert-Jan Veldhuizen