Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question
From: Øystein Johansen
Subject: Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question
Date: Thu, 21 May 2009 10:18:57 +0200
User-agent: Thunderbird 2.0.0.21 (Windows/20090302)
boomslang wrote:
> Hi all,
>
> I have a question regarding TD(lambda) training by Tesauro (see
> http://www.research.ibm.com/massive/tdl.html#h2:learning_methodology).
>
> The formula for adapting the weights of the neural net is
>
> w(t+1)-w(t) = a * [Y(t+1)-Y(t)] * sum(lambda^(t-k) * nabla(w)Y(k);
> k=1..t).
>
> I would like to know if nabla(w)Y(k) in the formula above is the
> gradient of Y(k) to the weights of the net at time t (i.e. the
> current net) or to the weights of the net at time k. I assume the
> former.
That really doesn't matter much, I believe. My guess, like yours, is that
it is the former. You can check this in Sutton/Barto.
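As a minimal illustration of the formula in the question, the sum over past gradients can be kept as a single "eligibility trace" that is decayed by lambda and incremented by the current gradient each step. This is my own sketch, not gnubg code: a toy linear evaluator stands in for the real net (so the gradient is just the input vector), and alpha, lam, and the random position encodings are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, lam = 0.1, 0.7           # the a and lambda from the formula
w = 0.1 * rng.normal(size=4)    # toy weights (nonzero so Y is not identically 0)
trace = np.zeros_like(w)        # eligibility trace e(t) = sum_k lambda^(t-k) grad_w Y(k)

def Y(x, w):
    # Linear stand-in for the network's position evaluation.
    return float(w @ x)

xs = rng.normal(size=(5, 4))    # a short sequence of encoded positions
for t in range(len(xs) - 1):
    grad = xs[t]                                # grad_w Y(t) for the linear model
    trace = lam * trace + grad                  # fold grad into the decayed sum
    td_error = Y(xs[t + 1], w) - Y(xs[t], w)    # Y(t+1) - Y(t)
    w += alpha * td_error * trace               # w(t+1) - w(t) = a * error * trace
```

Note that the trace accumulates each gradient as evaluated at the then-current weights; since the weights drift slowly, this is why the distinction in the question makes little practical difference.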
However: this equation was never implemented in gnubg! All the TD training
that was done in gnubg (and that was a long time ago, abandoned at an
early stage) used lambda = 0. Notice how lambda = 0 simplifies the
equation: only one term survives, the one where k = t. The implementation
then only needs the previous position when updating the weights, which
can be done with plain backprop.
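To make the lambda = 0 case concrete, here is a hedged sketch (again with an illustrative linear evaluator, not gnubg's actual net): the sum collapses to the single k = t term, so each step is one backprop-style update toward the next position's evaluation.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.1
w = 0.1 * rng.normal(size=4)    # toy weights

def Y(x, w):
    # Linear stand-in for the network output.
    return float(w @ x)

xs = rng.normal(size=(5, 4))    # encoded positions along one game
for t in range(len(xs) - 1):
    td_error = Y(xs[t + 1], w) - Y(xs[t], w)    # Y(t+1) - Y(t)
    w += alpha * td_error * xs[t]               # only grad_w Y(t) remains
```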
Our experience is: TD is nice for kick-starting the training process, but
supervised training is the real thing. Build a big database of positions
together with rollout results for those positions, and train supervised
on that.
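The supervised alternative amounts to ordinary regression against the rollout values. A minimal sketch, assuming a linear model and synthetic data in place of real positions and rollout equities (the names and numbers here are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
positions = rng.normal(size=(200, 4))       # stand-in for encoded positions
true_w = np.array([0.5, -0.2, 0.1, 0.3])
targets = positions @ true_w                # stand-in for rollout equities

# Plain least-squares gradient descent on the (position, equity) database.
w = np.zeros(4)
for _ in range(500):
    pred = positions @ w
    grad = positions.T @ (pred - targets) / len(targets)
    w -= 0.1 * grad
```

With enough iterations w recovers the generating weights, which is the point: the targets are fixed, so training is a stable regression rather than a moving-target bootstrap.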
If you would still like to do TD training with your system, I really
recommend reading Sutton/Barto.
Good luck!
-Øystein