requires_grad is False when actor_grad is dynamics #12
I could be completely wrong, but is it possible that the assert is hard-coded for the case when actor_grad == "reinforce"?
Same problem here!
You are right, it is very likely that the assertion was written assuming actor_grad=reinforce. What if you simply remove it, does it work then? To be honest, I did far less testing with actor_grad=dynamics. The functionality did work at one point and was tested on DMC, but something could have changed since then.
Yes, ignoring it works. However, I'm still a little confused about how dynamics back-prop works given the non-differentiable value target. Setting aside the entropy loss, can you clarify how, in the code, the actor's parameters are updated?
If you use the reinforce policy gradient, then you don't back-prop through the dynamics anymore.
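To make the distinction concrete, here is a minimal sketch of how the two actor-loss variants are typically computed in Dreamer-style agents; the function and argument names (actor_loss, value_target, action_logprob) are illustrative and not taken from this repository:

```python
import torch

def actor_loss(actor_grad: str,
               value_target: torch.Tensor,    # imagined returns, e.g. lambda-returns over a rollout
               action_logprob: torch.Tensor,  # log pi(a|s) of the actions taken in imagination
               ) -> torch.Tensor:
    if actor_grad == "reinforce":
        # REINFORCE: treat the return as a constant weight (detach), so the gradient
        # reaches the actor only through the log-probabilities of its sampled actions.
        return -(action_logprob * value_target.detach()).mean()
    if actor_grad == "dynamics":
        # Dynamics back-prop: maximize the imagined return directly; this only trains
        # the actor if value_target keeps a graph back through the actions
        # (i.e. value_target.requires_grad must be True).
        return -value_target.mean()
    raise ValueError(f"unknown actor_grad: {actor_grad}")
```

Under reinforce the detach makes the return a constant weight, which is why no gradient flows back through the dynamics in that case.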
When running the code on DMC, actor_grad is dynamics, so loss_policy is -value_target. Since value_target does not depend on the actor's policy distribution, no gradient flows through loss_policy with respect to the actor's parameters. The assertion therefore evaluates as assert (False and True) or not True, because loss_policy does not require gradients, and the assertion fails. How can we fix it?
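A small, self-contained reproduction of the requires_grad behaviour described above, assuming the check has roughly the form assert (loss_policy.requires_grad and loss_value.requires_grad) or not is_training (a paraphrase, not necessarily the repository's exact line), followed by one possible way to guard the check instead of deleting it:

```python
import torch

is_training = True
loss_value = torch.tensor(0.5, requires_grad=True)   # critic loss has a gradient path -> True

# With actor_grad == "dynamics", loss_policy = -value_target. If value_target was
# built from detached tensors, it carries no graph back to the actor's parameters:
value_target = torch.tensor(2.0)                      # requires_grad == False
loss_policy = -value_target

# (False and True) or (not True) -> False, so an assertion of that form fails:
print(loss_policy.requires_grad, loss_value.requires_grad, is_training)

# One possible fix (hypothetical, not the maintainer's resolution): only require a
# gradient path through loss_policy for the REINFORCE estimator, where it must
# exist by construction.
actor_grad = "dynamics"
if actor_grad == "reinforce":
    assert loss_policy.requires_grad or not is_training
assert loss_value.requires_grad or not is_training
```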