Re: [Bug-gnubg] The race training and benchmark datasets
From: Philippe Michel
Subject: Re: [Bug-gnubg] The race training and benchmark datasets
Date: Sun, 9 Jun 2019 23:37:59 +0200
User-agent: Mutt/1.12.0 (2019-05-25)
On Fri, Jun 07, 2019 at 08:30:10PM +0200, Øystein Schønning-Johansen wrote:
> (Of course I remove any position duplicated in the two datasets, such that
> the training and validation set are disjoint.)
Is it really important (in general)? I know one shouldn't use the same
dataset, but is some limited random overlap really an issue? I didn't
verify how limited it is in the case of gnubg's databases, though...
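One way to check how limited the overlap actually is would be to compare the position IDs of the two datasets directly. This is only a minimal sketch: the function name and the sample IDs are hypothetical, and it assumes each dataset has already been reduced to a list of position ID strings.

```python
def overlap_report(train_ids, bench_ids):
    """Return (shared_count, fraction_of_benchmark_shared).

    Both arguments are iterables of position ID strings. How the IDs are
    extracted from the database files is not shown here.
    """
    train = set(train_ids)
    bench = set(bench_ids)
    shared = train & bench
    fraction = len(shared) / len(bench) if bench else 0.0
    return len(shared), fraction


# Hypothetical position IDs, for illustration only.
train_ids = ["posA", "posB", "posC", "posD"]
bench_ids = ["posC", "posD", "posE", "posF", "posG"]

shared, fraction = overlap_report(train_ids, bench_ids)
print(f"{shared} shared positions, {fraction:.1%} of the benchmark set")
```

If the reported fraction is tiny, removing the duplicates should hardly change the validation error.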
> I train a neural network. If I validate the training with a 10% fraction of
> the training dataset itself, I get a MSE error of about 1.0e-04. But if I
> validate against the dataset generated from train.bm-1.00.bz2 I get an MSE
> error of 7e-04. About 7 times higher!
>
> This makes me believe that the rolled out positions in the race-train-data
> file are rolled out in another way (different tool, different settings,
> different neural net?) than the positions in train.bm-1.00.bz2.
Different tool and different neural net.
For the benchmark databases, it is recorded as a comment at the beginning
of the file:
s version 1.93 weights 1.00 moves2plyLimit 20 rolloutLimit 5 nRollOutGames 1296
cubeAway 7 include0Ply 1 evalPlies 2 shortCuts 1 osrGames 1296 osrInRoll 1
This is version 1.93 of the sagnubg tool, using the 1.00 weights file
(the current one). I rerolled the benchmark databases with it after the
new weights file was generated.
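To compare the settings between files, the comment can be split into name/value pairs. The following is a rough sketch, not part of sagnubg: the function name is hypothetical, and it assumes that the leading "s" is a record tag, that the remaining tokens strictly alternate between name and value, and that the comment is a single logical line (it is joined here from the two lines shown above).

```python
def parse_rollout_header(line):
    """Split a benchmark header comment into a dict of setting -> value.

    Values are kept as strings so that "1.00" is not turned into 1.0.
    """
    tokens = line.split()
    # Assumption: the first token ("s") is a record tag, not a setting name.
    fields = tokens[1:]
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating name/value tokens")
    return dict(zip(fields[0::2], fields[1::2]))


header = (
    "s version 1.93 weights 1.00 moves2plyLimit 20 rolloutLimit 5 "
    "nRollOutGames 1296 cubeAway 7 include0Ply 1 evalPlies 2 "
    "shortCuts 1 osrGames 1296 osrInRoll 1"
)

settings = parse_rollout_header(header)
print(settings["version"], settings["weights"], settings["nRollOutGames"])
```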
The training database was rolled out with a slightly modified gnubg
(merely to have gnubg -t print the rollout results in the right format).
This was done with earlier weights. I didn't keep notes, but I think I
used one intermediate weights set for the race net and possibly more than
one for the crashed net (roll out the training database with the 0.90
net, train a new net, reroll the training database with it, etc.). For
the contact net I'm not sure.
In any case, this was with different weights than the current benchmark
database.
> Joseph? Philippe? Ian? Others? Do you know how these data were generated?
> Is it maybe worth rolling these positions out again? I do remember that
> Joseph made a separate rollout tool, but I'm not sure what Philippe did?
It is likely the different errors you got have another cause: as far as
I can see, the sagnubg tool used for creating the benchmark databases
doesn't use variance reduction.
That should be enough of a reason to seriously consider rerolling them,
but we would have to implement variance reduction in sagnubg first or
use gnubg with some substantial pre- and post-processing.
> (I also remember that the original benchmark was move based, and it
> calculates the loss based on incorrect moves picked, and that it might not
> be that interesting if the rollout values are a bit wrong....)
I'm afraid they may not be just a bit wrong. It seems the standard
deviation of a 1296-trial rollout without variance reduction is larger
than the vast majority of the "errors" found when running the benchmark.
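As a rough order of magnitude, the standard error of a plain Monte Carlo average over n independent games is the per-game standard deviation divided by sqrt(n), and sqrt(1296) = 36. The sketch below assumes a per-game equity standard deviation of 1.0, which is exact only for cubeless outcomes of +1 or -1 at 50% wins with no gammons; real positions will differ, so the number is illustrative rather than measured.

```python
import math


def rollout_standard_error(per_game_sd, n_games):
    """Standard error of the mean of n_games independent game results."""
    return per_game_sd / math.sqrt(n_games)


n_games = 1296  # nRollOutGames from the benchmark header

# Assumption: per-game equity sd of 1.0 (results of +1 or -1, 50% wins,
# no gammons, no variance reduction).
se = rollout_standard_error(1.0, n_games)
print(f"standard error of the rolled out equity: {se:.4f}")
```

Under that assumption the rollout noise is about 0.028 in equity, which gives a scale against which the benchmark "errors" can be compared.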