Home
Welcome to the Policy-Gradient-Agent wiki!
Milestones:
✅ -25.11.2020 gain a basic knowledge of policy gradient (read blogs and papers, take courses) [2 weeks]
✅24.11.2020-01.12.2020 understand the algorithm of the chosen policy gradient method [1 week]
01.12.2020-08.12.2020 lay the foundation for implementing the policy gradient method • get familiar with the framework • understand the algorithm pseudocode in the paper [1 week]
08.12.2020-12.01.2021 • implement the policy gradient method SRVR-PG from scratch • apply it to a simple game for testing • fix bugs by referring to the source code on GitHub (at least 3 weeks) [5 weeks]
12.01.2021-26.01.2021 prepare presentation and demo video [2 weeks] • 10-minute presentation • Outline problem + solution • Difficulties encountered + how they were solved • Future work • Short demo video of the implementation
26.01.2021-05.02.2021 code documentation [1 week]
20.02.2021-23.02.2021 [3 days] write a final report using the ACL LaTeX template: • Meet the requirements detailed in the project description • Learn RL basics and how they can be applied to dialog policy • Understand the basic idea of policy gradient RL methods and one particular algorithm