Loss function only contains instantaneous reward but not cumulated reward #109

Open
AchillesJJ opened this issue Oct 19, 2018 · 1 comment

Comments

@AchillesJJ

As shown in nnagent.py, the author uses the average return of a batch as the loss function. However, it seems that this loss function only contains the instantaneous reward, not the average cumulative reward. To be specific, suppose we have a batch of experience as follows:

$\text{mini\_batch} = (s_t, a_t, r_t, \dots, s_{t+T}, a_{t+T}, r_{t+T})$
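
For concreteness, here is a minimal sketch of the kind of loss being described (this is not the repository's actual nnagent.py code; `price_relatives` and `weights` are illustrative names for the period-wise price-relative vectors $y_t$ and portfolio weights $a_t$): the negative batch mean of the immediate log-returns, with no term that accumulates rewards over the rest of the trajectory.

```python
import numpy as np

def batch_average_immediate_reward(price_relatives, weights):
    # r_t = log(y_t . a_t): immediate log-return of period t,
    # computed row-wise over a batch of shape (T, n_assets).
    immediate_rewards = np.log(np.sum(price_relatives * weights, axis=1))
    # The loss is the negative batch mean, so each r_t enters independently;
    # nothing couples r_t to r_{t+1}, ..., r_{t+T}.
    return -np.mean(immediate_rewards)
```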

@ZhengyaoJiang
Owner

> However, it seems that this loss function only contains the instantaneous reward, not the average cumulative reward.

If there is no commission fee and the action does not affect the state transition, optimizing the immediate reward is equivalent to optimizing the long-term value: since future price movements are independent of the current portfolio weights, maximizing each period's expected reward separately also maximizes their sum.
This point, together with the differentiable reward function, gives superior sample efficiency compared with general-purpose RL.
To deal with the commission fee, we treat it as a regularization term.
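
A minimal sketch of these two points, under the stated assumption that actions do not influence state transitions (all names and the commission rate are illustrative, not the repository's actual code):

```python
import numpy as np

def regularized_loss(price_relatives, weights, prev_weights, commission_rate=0.0025):
    # Without commission, maximizing each immediate log-return independently
    # also maximizes their sum, because future price relatives are unaffected
    # by the current action a_t.
    immediate_rewards = np.log(np.sum(price_relatives * weights, axis=1))
    # The commission fee is approximated as a penalty on turnover between
    # consecutive portfolio vectors and added as a regularization term.
    turnover_penalty = commission_rate * np.sum(np.abs(weights - prev_weights), axis=1)
    return -np.mean(immediate_rewards - turnover_penalty)
```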
