On Thu 11 Sep 2003 (18:57 +0200), address@hidden wrote:
On Thu, 11 Sep 2003, Joern Thyssen wrote:
This is not a realistic model in that no-one ever plays with a
consistent MWC, but I was wondering if we could use the luck adjusted
result as an indicator of MWC to model FIBS ratings and to compare them
to the current function which Kees van Doel generated.
I think that is exactly what Kees did.
He calculated relative rating estimates of gnubg 0-ply versus gnubg
0-ply with noise and found a relationship between error rates and the
relative rating estimates.
That's an excellent summary. I'm glad at least one person has read my
writeup!
Rethinking, I would have preferred to relate the error rates and MWCs,
and then apply the FIBS rating formulae afterwards.
Actually that IS what I did, I must have forgotten to mention it. I will
correct that. It is impossible to work with the relative rating
directly as it is undefined quite often when the luck adjusted
prob. estimates are negative.
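
A minimal sketch of the point above, assuming the commonly stated FIBS
formula in which the underdog wins an N-point match with probability
1/(10^(D*sqrt(N)/2000) + 1) for a rating difference D. The function
names and the sample luck-adjusted values are made up for illustration
and are not taken from the writeup. Inverting the formula needs
log10(p/(1-p)), which does not exist when a single luck-adjusted
estimate p falls outside (0, 1), so the sketch averages the MWC
estimates first and converts the mean to a rating difference afterwards.

```python
import math

def fibs_win_probability(rating_diff, match_length):
    """Favourite's chance of winning under the assumed FIBS formula."""
    exponent = rating_diff * math.sqrt(match_length) / 2000.0
    return 1.0 - 1.0 / (10.0 ** exponent + 1.0)

def fibs_rating_difference(p_win, match_length):
    """Rating difference implied by a match-winning chance.

    Returns None when p_win is outside (0, 1), because the log-odds
    is undefined there.
    """
    if not 0.0 < p_win < 1.0:
        return None
    return 2000.0 / math.sqrt(match_length) * math.log10(p_win / (1.0 - p_win))

# Hypothetical luck-adjusted results for five 7-point matches.
# Individual values may be negative or above 1.
samples = [0.62, -0.05, 1.10, 0.48, 0.55]

# Converting match by match fails for the out-of-range values...
per_match = [fibs_rating_difference(p, 7) for p in samples]

# ...so average the MWC estimates first, then apply the formula once.
mean_mwc = sum(samples) / len(samples)
print(per_match)
print(mean_mwc, fibs_rating_difference(mean_mwc, 7))
```

The per-match list contains None wherever the estimate is out of range,
while the averaged MWC gives a single well-defined rating difference.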
Ok - I didn't express myself well as that's (relate the error rates
and MWCs) what I was trying to express. I did read your writeup. I was
simply playing with a very simplistic model to see how the FIBS rating
behaves in the face of consistent MWC.
One of the issues raised by Douglas is if the experiment of 0-ply versus
0-ply noise really represents a good model of human play.
My working assumption is that it is. So far data on real people shows
that it usually is, but there is an outlier: Mr Albert Silver. I'd like
to analyse more human matches but I don't think anyone is going to help
me get them except Albert, so I'll probably drop the ball on this at
some point.
I suspect that if I played on FIBS I'd also be an outlier.
Regarding modeling human error by noise:
One thing I would expect is the noise errors to be uniformly distributed
over the moves of a match, whereas the human errors would tend to clump
together when a type of position arises that is incorrectly handled by
the human. For myself, when I get to a difficult position (in the human
sense) my errors clump in that region, because it is more difficult.
I don't see however how that (clumping versus uniformity) would affect
the rating.
I usually find when playing a 7 point match against gnubg that of the,
say, 6 games in the match, 4 of them will give me a consistent
low (for me) error rate and one or two games will have almost all of
the errors, not uncommonly 2 or 3 major errors within a move or two of
each other.
I understand the noise is injected in the outputs of the NN. I have
always had the feeling that it would be a better model of human error to
inject the noise into the WEIGHTS of the NN. Now that I think about it I
think this might also introduce clumping effects, like when the position
moves into a region whose processing has been damaged a lot by the
partial lobotomy.
Hmm, I sometimes think that describes my off days at bg.
I guess I could just externally disturb the weights file to create a
number of braindamaged bots and experiment with those. Any pointers to
where I can find the file format for the weights files? Or is this a
stupid idea anyways?
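
To make the output-noise versus weight-noise distinction concrete, here
is a toy sketch. It uses a made-up one-layer "network" with three
inputs, which is an assumption for illustration only: it is not gnubg's
network, and it does not read or write gnubg's weights files. Output
noise draws a fresh error on every evaluation, whereas perturbing the
weights once gives a fixed damaged bot whose error depends only on the
position it is shown.

```python
import math
import random

def evaluate(weights, position):
    """Toy evaluator: sigmoid of a weighted sum of the inputs."""
    s = sum(w * x for w, x in zip(weights, position))
    return 1.0 / (1.0 + math.exp(-s))

def noisy_output(weights, position, sigma, rng):
    """Output noise: a new random error is added on each call."""
    return evaluate(weights, position) + rng.gauss(0.0, sigma)

def damage_weights(weights, sigma, rng):
    """Weight noise: perturb the weights once and keep the result."""
    return [w + rng.gauss(0.0, sigma) for w in weights]

rng = random.Random(0)
weights = [0.5, -1.0, 0.25]          # hypothetical weights
damaged = damage_weights(weights, 0.3, rng)

pos = [1.0, 0.0, 2.0]                # hypothetical position encoding
blank = [0.0, 0.0, 0.0]              # no damaged input is active

# Output noise: two evaluations of the same position disagree.
print(noisy_output(weights, pos, 0.1, rng),
      noisy_output(weights, pos, 0.1, rng))

# Weight noise: the error is repeatable and tied to the position.
print(evaluate(damaged, pos) - evaluate(weights, pos))
print(evaluate(damaged, blank) - evaluate(weights, blank))
```

In this toy the damaged bot makes the same mistake every time the same
position recurs and no mistake where the damaged weights are not
exercised, which is one way clumping could arise. Whether a real
network behaves this way is not something the sketch can show.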
I have my doubts about this one. It would certainly be even harder to
give any justification for altering the weights as being a model of
human (mis)play, even more so than injecting random noise. And I
suspect it would be much harder to produce a controlled result.
I'd also speculate you'll have difficulty finding any general model
for real human errors - there are the ones caused by simply failing to
see a move, whether by not looking, miscounting points or
whatever. There are ones caused by not understanding a type of
game. There are ones caused by too careless play, steaming, being too
cautious and waiting too long, failing to see potential responses, you
name it, people will find a way to screw it up. And every person will
have their own mix of errors they make, which even for one person may
vary with time, alcohol, their assessment of their opponent, etc.