requires_grad is False when actor_grad is dynamics #12
I could be completely wrong, but is it possible that the assert is hard-coded for the case when actor_grad == "reinforce"?
Same problem here!
You are right, it is very likely that the assertion was written assuming actor_grad=reinforce. What if you simply remove it, does it work then? To be honest, I did far less testing with actor_grad=dynamics. The functionality did work at one point and was tested on DMC, but something could have changed since then.
Yes, ignoring it works. However, I'm still a little confused about how dynamics back-prop works given the non-differentiable value target. Setting aside the entropy loss, can you clarify how, in the code, the actor's parameters are updated?
If you use the reinforce policy gradient, then you don't back-prop through the dynamics anymore.
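To make the distinction concrete, here is a minimal sketch of how the two actor-loss variants are typically computed in Dreamer-style agents; the function and argument names (actor_loss, value_target, action_logprob) are illustrative and not taken from this repository:

```python
import torch

def actor_loss(actor_grad: str,
               value_target: torch.Tensor,    # imagined returns, e.g. lambda-returns over a rollout
               action_logprob: torch.Tensor,  # log pi(a|s) of the actions taken in imagination
               ) -> torch.Tensor:
    if actor_grad == "reinforce":
        # REINFORCE: treat the return as a constant weight (detach), so the gradient
        # reaches the actor only through the log-probabilities of its sampled actions.
        return -(action_logprob * value_target.detach()).mean()
    if actor_grad == "dynamics":
        # Dynamics back-prop: maximize the imagined return directly; this only trains
        # the actor if value_target keeps a graph back through the actions
        # (i.e. value_target.requires_grad must be True).
        return -value_target.mean()
    raise ValueError(f"unknown actor_grad: {actor_grad}")
```

Under reinforce the detach makes the return a constant weight, which is why no gradient flows back through the dynamics in that case.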
When running the code on DMC, actor_grad is dynamics, so loss_policy is -value_target. Since value_target does not depend on the actor's policy distribution, no gradient flows through loss_policy with respect to the actor's parameters. The assertion therefore evaluates as assert (False and True) or not True, because loss_policy does not require gradients, and the assertion fails. How can we fix it?
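A small, self-contained reproduction of the requires_grad behaviour described above, assuming the check has roughly the form assert (loss_policy.requires_grad and loss_value.requires_grad) or not is_training (a paraphrase, not necessarily the repository's exact line), followed by one possible way to guard the check instead of deleting it:

```python
import torch

is_training = True
loss_value = torch.tensor(0.5, requires_grad=True)   # critic loss has a gradient path -> True

# With actor_grad == "dynamics", loss_policy = -value_target. If value_target was
# built from detached tensors, it carries no graph back to the actor's parameters:
value_target = torch.tensor(2.0)                      # requires_grad == False
loss_policy = -value_target

# (False and True) or (not True) -> False, so an assertion of that form fails:
print(loss_policy.requires_grad, loss_value.requires_grad, is_training)

# One possible fix (hypothetical, not the maintainer's resolution): only require a
# gradient path through loss_policy for the REINFORCE estimator, where it must
# exist by construction.
actor_grad = "dynamics"
if actor_grad == "reinforce":
    assert loss_policy.requires_grad or not is_training
assert loss_value.requires_grad or not is_training
```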