Re: [Bug-gnubg] Some idle musings re. ratings
From: kvandoel
Subject: Re: [Bug-gnubg] Some idle musings re. ratings
Date: Thu, 11 Sep 2003 18:57:22 +0200 (CEST)
On Thu, 11 Sep 2003, Joern Thyssen wrote:
> > This is not a realistic model in that no-one ever plays with a
> > consistent MWC, but I was wondering if we could use the luck adjusted
> > result as an indicator of MWC to model FIBS ratings and to compare them
> > to the current function which Kees van Doel generated.
>
> I think that is exactly what Kees did.
>
> He calculated relative rating estimates of gnubg 0-ply versus gnubg
> 0-ply with noise and found a relationship between error rates and the
> relative rating estimates.
That's an excellent summary. I'm glad at least one person has read my
writeup!
> Rethinking, I would have preferred to relate the error rates and MWCs,
> and then apply the FIBS rating formulae afterwards.
Actually that IS what I did; I must have forgotten to mention it. I will
correct that. It is impossible to work with the relative rating
directly, as it is quite often undefined: the luck-adjusted
probability estimates can be negative.
So I do all the averaging on the luck-adjusted results, including the
negative probability estimates. (Would it be better to clamp
probabilities <0 and >1 to 0 and 1? That is a question I've pondered;
any ideas?) I also get a variance and confidence interval from this.
Then, at the very end, I translate the estimated MWC and confidence
interval into an Elo interval.
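
A minimal sketch of that last step, for readers who want to try it. It
assumes a normal-approximation 95% interval (z = 1.96) on the mean of the
luck-adjusted results, and it uses the FIBS formula quoted later in this
message. The function names and the sample numbers are made up for
illustration; they are not taken from the writeup.

```python
import math
import statistics


def fibs_rating_diff(p, n):
    """FIBS rating difference D for match-winning chance p at match length n."""
    if not 0.0 < p < 1.0:
        # log10(1/p - 1) has no real value here, which is why the averaging
        # is done on the luck-adjusted results and not on ratings.
        raise ValueError("rating difference is undefined for p outside (0, 1)")
    return -2000.0 / math.sqrt(n) * math.log10(1.0 / p - 1.0)


def rating_interval(results, n, z=1.96):
    """Average luck-adjusted results, then convert mean and CI to ratings.

    Individual results may lie outside [0, 1]; only the mean and the
    interval end points are pushed through the rating formula.
    """
    mean = statistics.fmean(results)
    sem = statistics.stdev(results) / math.sqrt(len(results))
    low, high = mean - z * sem, mean + z * sem
    return (fibs_rating_diff(low, n),
            fibs_rating_diff(mean, n),
            fibs_rating_diff(high, n))


# Hypothetical luck-adjusted results for eight 7-point matches.
sample = [0.62, -0.05, 1.10, 0.48, 0.71, 0.33, 0.90, 0.55]
d_low, d_mid, d_high = rating_interval(sample, n=7)
```

If an interval end point itself falls outside (0, 1), the sketch raises an
error instead of guessing; whether to clamp in that case is the open
question raised above.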
> One of the issues raised by Douglas is if the experiment of 0-ply versus
> 0-ply noise really represents a good model of human play.
My working assumption is that it is. So far, data on real people shows
that it usually is, but there is an outlier: Mr Albert Silver. I'd like
to analyse more human matches, but I don't think anyone is going to help
me get them except Albert, so I'll probably drop the ball on this at
some point.
> Also, what relationship do we expect between error rates and lost MWC?
> My guess would be linear for some fixed match length, but it depends on
> if you make the errors in the beginning or towards the end of a match.
> Using 0-ply with noise will roughly give constant error rates throughout
> the match, so I'd expect to see a linear relationship between error
> rates and lost MWC for a fixed match length. I'm not sure if this holds
> for human play -- I guess you would need to investigate a large number
> of human matches.
Yes, as you can see from the data at
http://www.cs.ubc.ca/~kvdoel/tmp/ratings
the rating data looks linear.
> The FIBS rating formula is:
>
> D = -2000/sqrt(n) * log10( 1/p - 1 )
>
> For values of p around 0.5 the rating difference D is close to linear in
> p, so we would get a linear relationship between D and error rates --
> exactly what Kees found.
> Kees uses an a/N+b extrapolation formula for the match length. I would
> guess from the FIBS rating formulae that a/sqrt(N)+b would be better.
Note that this formula also popped clearly out of the data; I didn't
arbitrarily choose that extrapolation. That power is really 1, not 1/2,
and one of the conclusions you can draw from these results is that the
FIBS formula is indeed flawed, i.e., your rating depends on the
match length you play: if you play short matches against weaker players
and long ones against stronger players, you get a higher rating than the
other way around. Quite substantially higher, if you believe my data.
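
One way to compare the two candidate extrapolations on any data set is an
ordinary least-squares line fit in x = 1/N and in x = 1/sqrt(N), followed
by a look at the residuals. The sketch below does that. The helper names
are hypothetical, and the points are synthetic, generated exactly from
a/N + b only to exercise the code. They are not the measured ratings data.

```python
import math


def fit_line(xs, ys):
    """Closed-form least-squares fit y = a*x + b; returns (a, b, rss)."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Residual sum of squares: smaller means the model form fits better.
    rss = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def compare_extrapolations(lengths, values):
    """Fit values against 1/N and against 1/sqrt(N) for match lengths N."""
    return {
        "a/N+b": fit_line([1.0 / n for n in lengths], values),
        "a/sqrt(N)+b": fit_line([1.0 / math.sqrt(n) for n in lengths], values),
    }


# Synthetic illustration only: values that follow 10/N + 5 exactly.
lengths = [1, 3, 5, 7, 11, 15]
synthetic = [10.0 / n + 5.0 for n in lengths]
fits = compare_extrapolations(lengths, synthetic)
```

On real measurements neither residual will be zero, so the comparison only
says which form fits better over the match lengths actually sampled.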
Regarding modeling human error by noise:
One thing I would expect is the noise errors to be uniformly distributed
over the moves of a match, whereas the human errors would tend to clump
together when a type of position arises that is incorrectly handled by
the human. For myself, when I get to a difficult position (in the human
sense) my errors clump in that region, because it is more difficult.
I don't see however how that (clumping versus uniformity) would affect
the rating.
I understand the noise is injected in the outputs of the NN. I always
have had the feeling that it would be a better model of human error to
inject the noise into the WEIGHTS of the NN. Now that I think about it I
think this might also introduce clumping effects, like when the position
moves into a region whose processing has been damaged a lot by the
partial lobotomy.
I guess I could just externally disturb the weights file to create a
number of brain-damaged bots and experiment with those. Any pointers to
where I can find the file format for the weights files? Or is this a
stupid idea anyway?
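
A rough sketch of the weight-disturbing idea, assuming the weights have
already been read into a flat list of floats. Reading and writing the
actual gnubg weights file is left out, since its format is exactly the
open question here. The function names, the Gaussian noise model, and the
sigma values are all assumptions made for illustration.

```python
import random


def perturb_weights(weights, sigma, seed=None):
    """Return a copy of weights with independent Gaussian noise added."""
    rng = random.Random(seed)  # private generator, so runs are reproducible
    return [w + rng.gauss(0.0, sigma) for w in weights]


def make_damaged_bots(weights, sigmas, seed=0):
    """Build one perturbed weight set per noise level, keyed by sigma."""
    return {
        sigma: perturb_weights(weights, sigma, seed=seed + i)
        for i, sigma in enumerate(sigmas)
    }


# Hypothetical stand-in for weights loaded from a file.
original = [0.12, -0.87, 0.45, 1.30, -0.02]
bots = make_damaged_bots(original, sigmas=[0.01, 0.05, 0.2])
```

Each entry in bots could then be written back out and played against the
undamaged net, in the same way as the noisy 0-ply players.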
Kees