Full code reference: https://gitee.com/chencib/ailib/blob/master/rl/ppo_cartpole.py
Execution results:
Excerpt of the training scores (episodes 0-9 and 990-999):
(sd) D:\Dev\traditional_nn\feiai\test\rl>python ppo_cartpole_v2_succeed.py
Ep: 0 | Reward: 23.0 | Running: 23.0
Ep: 1 | Reward: 12.0 | Running: 21.9
Ep: 2 | Reward: 31.0 | Running: 22.8
Ep: 3 | Reward: 25.0 | Running: 23.0
Ep: 4 | Reward: 9.0 | Running: 21.6
Ep: 5 | Reward: 20.0 | Running: 21.5
Ep: 6 | Reward: 20.0 | Running: 21.3
Ep: 7 | Reward: 28.0 | Running: 22.0
Ep: 8 | Reward: 32.0 | Running: 23.0
Ep: 9 | Reward: 18.0 | Running: 22.5
……
Ep: 990 | Reward: 15.0 | Running: 19.7
Ep: 991 | Reward: 19.0 | Running: 19.7
Ep: 992 | Reward: 20.0 | Running: 19.7
Ep: 993 | Reward: 24.0 | Running: 20.1
Ep: 994 | Reward: 16.0 | Running: 19.7
Ep: 995 | Reward: 20.0 | Running: 19.7
Ep: 996 | Reward: 19.0 | Running: 19.7
Ep: 997 | Reward: 26.0 | Running: 20.3
Ep: 998 | Reward: 13.0 | Running: 19.6
Ep: 999 | Reward: 11.0 | Running: 18.7
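
The Running column in the log above appears to be an exponentially smoothed reward: the first ten rows are consistent with running = 0.9 * running + 0.1 * reward, seeded with the reward of episode 0. The sketch below is a minimal illustration under that assumption. The decay of 0.9 and the helper name running_rewards are inferred from the printed numbers and are hypothetical; they are not taken from the linked source.

```python
def running_rewards(rewards, decay=0.9):
    """Exponentially smoothed rewards, seeded with the first episode's reward.

    decay=0.9 is an assumption inferred from the log, not a confirmed setting.
    """
    running = None
    history = []
    for r in rewards:
        if running is None:
            running = r  # episode 0: Running equals Reward
        else:
            running = decay * running + (1 - decay) * r
        history.append(running)
    return history


# Rewards of episodes 0-9, copied from the log above.
rewards = [23.0, 12.0, 31.0, 25.0, 9.0, 20.0, 20.0, 28.0, 32.0, 18.0]

for ep, (r, run) in enumerate(zip(rewards, running_rewards(rewards))):
    print(f"Ep: {ep} | Reward: {r} | Running: {run:.1f}")
```

With these ten rewards the printed Running values match the first ten rows of the log. Because the smoothing weights each new episode by only 0.1, a single high or low episode moves the Running value by at most a tenth of its distance from the current average.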