Home

Welcome to the Policy-Gradient-Agent wiki!

Milestones:

✅ until 25.11.2020 gain a basic knowledge of policy gradient methods (reading blogs and papers, taking courses) [2 weeks]

✅ 24.11.2020-01.12.2020 understand the algorithm of the chosen policy gradient method [1 week]

01.12.2020-08.12.2020 lay the foundation for implementing the policy gradient method [1 week]
• get familiar with the framework
• understand the algorithm's pseudocode in the paper

08.12.2020-12.01.2021 [5 weeks]
• implement the policy gradient method SRVR-PG from scratch
• apply it to a simple game for testing
• fix bugs by referring to source code on GitHub (at least 3 weeks)

12.01.2021-26.01.2021 prepare the presentation and demo video [2 weeks]
• 10-minute presentation
• Outline problem + solution
• Difficulties you encountered + how you solved them
• Future work
• Short demo video of the implementation

26.01.2021-05.02.2021 code documentation [1 week]

20.02.2021-23.02.2021 write a final report using the ACL LaTeX template [3 days]:
• Meet the requirements detailed in the project description
• Learn RL basics and how they can be applied to dialog policy
• Understand the basic idea of policy gradient RL methods and one particular algorithm
