[Reinforcement Learning] Epoch and Timestep

Preface

In reinforcement learning (RL), the term “timestep” has a specific meaning that differs from “epoch.” Understanding the distinction is essential for interpreting how an RL algorithm operates and processes data.

Timestep:

Represents one discrete interaction: action → environment response (observation, reward, done signal).
Fundamental unit of experience in RL.
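
In code, one timestep corresponds to a single env.step() call. Below is a minimal sketch, assuming the Gymnasium API (CartPole-v1 is just an illustrative environment):

import gymnasium as gym

# One timestep: the agent acts, the environment responds with
# an observation, a reward, and a done signal.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

action = env.action_space.sample()                           # the agent's action
obs, reward, terminated, truncated, info = env.step(action)  # environment response
done = terminated or truncated                               # the "done" signal

env.close()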

Epoch:

Represents a collection of timesteps, often aggregated before performing an update to the policy or value networks.
Helps in organizing the training process, especially in batch-based RL algorithms like PPO.

Why Timestep Matters:

RL algorithms rely on sequential data where each timestep’s outcome can influence future actions.
Tracking changes between consecutive timesteps (like delta_pitch) helps in understanding the dynamics and progression of the agent’s actions.
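
To make the second point concrete, here is a sketch of tracking such a per-timestep change; PITCH_IDX is a hypothetical placeholder, since where “pitch” sits in the observation vector depends on the environment:

import gymnasium as gym

PITCH_IDX = 0  # hypothetical index of the pitch component in the observation

env = gym.make("CartPole-v1")  # stand-in environment for illustration
obs, info = env.reset(seed=0)
prev_pitch = obs[PITCH_IDX]

for t in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    delta_pitch = obs[PITCH_IDX] - prev_pitch  # change between consecutive timesteps
    prev_pitch = obs[PITCH_IDX]
    if terminated or truncated:
        break
env.close()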

Epochs in RL:

While timesteps are about individual actions, epochs in RL organize these actions into manageable batches for updating the model.
For example, after collecting a certain number of timesteps, the agent may perform gradient updates to improve the policy based on the aggregated experience.
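
A minimal sketch of that collect-then-update loop, again assuming the Gymnasium API; the random action and the commented-out update call are placeholders for a real policy and its PPO-style gradient step:

import gymnasium as gym

TIMESTEPS_PER_EPOCH = 2048  # experience collected before each policy update
NUM_EPOCHS = 10

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

for epoch in range(NUM_EPOCHS):
    batch = []  # (obs, action, reward, done) tuples gathered this epoch
    for t in range(TIMESTEPS_PER_EPOCH):
        action = env.action_space.sample()  # placeholder for policy(obs)
        next_obs, reward, terminated, truncated, info = env.step(action)
        batch.append((obs, action, reward, terminated or truncated))
        obs = next_obs
        if terminated or truncated:
            obs, info = env.reset()
    # update_policy(batch)  # hypothetical gradient update on the aggregated batch
env.close()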

Visual Representation:

Epoch
│
├─ Timestep 1: Action A1 → Observation O1, Reward R1
├─ Timestep 2: Action A2 → Observation O2, Reward R2
├─ Timestep 3: Action A3 → Observation O3, Reward R3
│
└─ Policy Update based on collected timesteps
