Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question
From: Øystein Johansen
Subject: Re: [Bug-gnubg] TD(lambda) training for neural networks -- a question
Date: Thu, 21 May 2009 10:18:57 +0200
User-agent: Thunderbird 2.0.0.21 (Windows/20090302)
boomslang wrote:
> Hi all,
>
> I have a question regarding TD(lambda) training by Tesauro (see
> http://www.research.ibm.com/massive/tdl.html#h2:learning_methodology).
>
> The formula for adapting the weights of the neural net is
>
> w(t+1)-w(t) = a * [Y(t+1)-Y(t)] * sum(lambda^(t-k) * nabla(w)Y(k);
> k=1..t).
>
> I would like to know if nabla(w)Y(k) in the formula above is the
> gradient of Y(k) to the weights of the net at time t (i.e. the
> current net) or to the weights of the net at time k. I assume the
> former.
That really doesn't matter much, I believe. My guess, like yours, is that
it is the former. You can check this in Sutton/Barto.
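As a minimal illustration of the formula in the question, the sum over past gradients can be kept as a single "eligibility trace" that is decayed by lambda and incremented by the current gradient each step. This is my own sketch, not gnubg code: a toy linear evaluator stands in for the real net (so the gradient is just the input vector), and alpha, lam, and the random position encodings are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, lam = 0.1, 0.7           # the a and lambda from the formula
w = 0.1 * rng.normal(size=4)    # toy weights (nonzero so Y is not identically 0)
trace = np.zeros_like(w)        # eligibility trace e(t) = sum_k lambda^(t-k) grad_w Y(k)

def Y(x, w):
    # Linear stand-in for the network's position evaluation.
    return float(w @ x)

xs = rng.normal(size=(5, 4))    # a short sequence of encoded positions
for t in range(len(xs) - 1):
    grad = xs[t]                                # grad_w Y(t) for the linear model
    trace = lam * trace + grad                  # fold grad into the decayed sum
    td_error = Y(xs[t + 1], w) - Y(xs[t], w)    # Y(t+1) - Y(t)
    w += alpha * td_error * trace               # w(t+1) - w(t) = a * error * trace
```

Note that the trace accumulates each gradient as evaluated at the then-current weights; since the weights drift slowly, this is why the distinction in the question makes little practical difference.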
However: this equation was never implemented in gnubg! All the TD training
that was done in gnubg (and that was a long time ago, abandoned at an
early stage) used lambda = 0. Notice how lambda = 0 simplifies the
equation: only one term survives, the one where k = t. The implementation
then only needs the previous position when updating the weights, which
can be done with plain backprop.
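To make the lambda = 0 case concrete, here is a hedged sketch (again with an illustrative linear evaluator, not gnubg's actual net): the sum collapses to the single k = t term, so each step is one backprop-style update toward the next position's evaluation.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.1
w = 0.1 * rng.normal(size=4)    # toy weights

def Y(x, w):
    # Linear stand-in for the network output.
    return float(w @ x)

xs = rng.normal(size=(5, 4))    # encoded positions along one game
for t in range(len(xs) - 1):
    td_error = Y(xs[t + 1], w) - Y(xs[t], w)    # Y(t+1) - Y(t)
    w += alpha * td_error * xs[t]               # only grad_w Y(t) remains
```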
Our experience is: TD is nice for kick-starting the training process, but
supervised training is the real thing. Build a big database of positions
together with rollout results for those positions, and train supervised
on that.
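The supervised alternative amounts to ordinary regression against the rollout values. A minimal sketch, assuming a linear model and synthetic data in place of real positions and rollout equities (the names and numbers here are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
positions = rng.normal(size=(200, 4))       # stand-in for encoded positions
true_w = np.array([0.5, -0.2, 0.1, 0.3])
targets = positions @ true_w                # stand-in for rollout equities

# Plain least-squares gradient descent on the (position, equity) database.
w = np.zeros(4)
for _ in range(500):
    pred = positions @ w
    grad = positions.T @ (pred - targets) / len(targets)
    w -= 0.1 * grad
```

With enough iterations w recovers the generating weights, which is the point: the targets are fixed, so training is a stable regression rather than a moving-target bootstrap.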
If you would still like to do TD training with your system, I really
recommend reading Sutton/Barto.
Good luck!
-Øystein