MinWang1997 edited this page Dec 4, 2020 · 1 revision

Welcome to the Policy-Gradient-Agent wiki!

Milestones:

✅ -25.11.2020 gain basic knowledge of policy gradient methods (read blogs and papers, take courses) [2 weeks]

✅24.11.2020-01.12.2020 understand the algorithm of the chosen policy gradient method [1 week]

01.12.2020-08.12.2020 lay the foundation for implementing the policy gradient method • get familiar with the framework • understand the algorithm pseudocode in the paper [1 week]

08.12.2020-12.01.2021 • implement the SRVR-PG policy gradient method from scratch • apply it to a simple game for testing • fix bugs by referring to the source code on GitHub (at least 3 weeks) [5 weeks]

12.01.2021-26.01.2021 prepare presentation and demo video [2 weeks] • 10-minute presentation • Outline problem + solution • Difficulties you encountered + how you solved them • Future work • Short demo video of the implementation

26.01.2021-05.02.2021 code documentation [1 week]

20.02.2021-23.02.2021 [3 days] write the final report using the ACL LaTeX template: • Meet the requirements detailed in the project description • Learn RL basics and how they can be applied to dialog policy • Understand the basic idea of policy gradient RL methods and one particular algorithm
