MDP的curriculums部分

文章目录

1. isaaclab中的curriculums
- 1.1 modify_reward_weight
- - 1.1.1 函数功能
  - 1.1.2 参数详解
  - 1.1.3 函数逻辑
  - 1.1.4 如何使用
2. isaaclab_task中的curriculums
- 2.1 terrain_levels_vel
- - 2.1 功能概述
  - 2.2 函数参数
  - 2.3 函数逻辑
3. robot_lab中的curriculums
- 3.1 command_levels_vel

1. isaaclab中的curriculums

路径：IsaacLab/source/isaaclab/isaaclab/envs/mdp/curriculums.py

1.1 modify_reward_weight

# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause"""Common functions that can be used to create curriculum for the learning environment.The functions can be passed to the :class:`isaaclab.managers.CurriculumTermCfg` object to enable
the curriculum introduced by the function.
"""from __future__ import annotationsfrom collections.abc import Sequence
from typing import TYPE_CHECKINGif TYPE_CHECKING:from isaaclab.envs import ManagerBasedRLEnvdef modify_reward_weight(env: ManagerBasedRLEnv, env_ids: Sequence[int], term_name: str, weight: float, num_steps: int):"""Curriculum that modifies a reward weight a given number of steps.Args:env: The learning environment.env_ids: Not used since all environments are affected.term_name: The name of the reward term.weight: The weight of the reward term.num_steps: The number of steps after which the change should be applied."""if env.common_step_counter > num_steps: # 检查当前全局步数是否超过设定的阈值# obtain term settings 通过环境中的奖励管理器获取特定奖励项的配置对象term_cfg = env.reward_manager.get_term_cfg(term_name) # update term settingsterm_cfg.weight = weight # 直接修改配置对象的权重属性env.reward_manager.set_term_cfg(term_name, term_cfg) #将修改后的配置重新设置到奖励管理器中

1.1.1 函数功能

这是一个用于动态调整强化学习环境中特定奖励项权重的课程学习函数。该函数允许在训练过程中根据训练进度（步数）动态修改奖励函数中特定奖励项的权重值。

1.1.2 参数详解

env: ManagerBasedRLEnv当前的强化学习环境对象实例提供对奖励管理器和其他环境状态的访问env_ids: Sequence[int]需要应用此修改的环境ID序列当前实现中此参数未被使用，修改会应用到所有环境term_name: str关键参数：需要调整权重的奖励项的名称必须与奖励管理器配置中的项名匹配weight: float关键参数：要设置的新权重值可以是任何浮点数（正数表示奖励，负数表示惩罚）num_steps: int关键参数：触发此权重修改的步数阈值当环境的全局步数超过此值时应用修改

1.1.3 函数逻辑

在这里插入图片描述

1.1.4 如何使用

在环境配置中定义课程
你需要在环境配置文件中创建一个 CurriculumCfg 类，并使用 CurriculumTermCfg（别名为 CurrTerm）来配置课程项：

from isaaclab.managers import CurriculumTermCfg as CurrTerm
import isaaclab.envs.mdp as mdp@configclass
class CurriculumCfg:"""Curriculum terms for the MDP."""# 示例：在4500步后将action_rate奖励权重从-0.0001改为-0.005action_rate = CurrTerm(func=mdp.modify_reward_weight, params={"term_name": "action_rate",    # 要修改的奖励项名称"weight": -0.005,             # 新的权重值"num_steps": 4500             # 在第4500步后生效})# 另一个示例：修改关节速度惩罚joint_vel = CurrTerm(func=mdp.modify_reward_weight, params={"term_name": "joint_vel", "weight": -0.001, "num_steps": 4500})

在主环境配置中包含课程配置

@configclass
class MyEnvCfg(ManagerBasedRLEnvCfg):"""你的环境配置"""# 其他配置...rewards: RewardsCfg = RewardsCfg()curriculum: CurriculumCfg = CurriculumCfg()  # 添加课程配置

确保奖励项存在

@configclass
class RewardsCfg:"""Reward terms for the MDP."""# 这些奖励项必须存在，才能被课程修改action_rate = RewTerm(func=mdp.action_rate_l2, weight=-0.0001)joint_vel = RewTerm(func=mdp.joint_vel_l2,weight=-0.0001,params={"asset_cfg": SceneEntityCfg("robot")},)

2. isaaclab_task中的curriculums

2.1 terrain_levels_vel

路径：IsaacLab/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/mdp/curriculums.py

# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause"""Common functions that can be used to create curriculum for the learning environment.The functions can be passed to the :class:`isaaclab.managers.CurriculumTermCfg` object to enable
the curriculum introduced by the function.
"""from __future__ import annotationsimport torch
from collections.abc import Sequence
from typing import TYPE_CHECKINGfrom isaaclab.assets import Articulation
from isaaclab.managers import SceneEntityCfg
from isaaclab.terrains import TerrainImporterif TYPE_CHECKING:from isaaclab.envs import ManagerBasedRLEnvdef terrain_levels_vel(env: ManagerBasedRLEnv, env_ids: Sequence[int], asset_cfg: SceneEntityCfg = SceneEntityCfg("robot")
) -> torch.Tensor:"""Curriculum based on the distance the robot walked when commanded to move at a desired velocity.This term is used to increase the difficulty of the terrain when the robot walks far enough and decrease thedifficulty when the robot walks less than half of the distance required by the commanded velocity... note::It is only possible to use this term with the terrain type ``generator``. For further informationon different terrain types, check the :class:`isaaclab.terrains.TerrainImporter` class.Returns:The mean terrain level for the given environment ids."""# extract the used quantities (to enable type-hinting)asset: Articulation = env.scene[asset_cfg.name] # 机器人对象terrain: TerrainImporter = env.scene.terrain # 地形对象command = env.command_manager.get_command("base_velocity") # 速度命令# compute the distance the robot walked# 从环境初始位置到当前位置的水平距离 (X-Y平面)distance = torch.norm(asset.data.root_pos_w[env_ids, :2] - env.scene.env_origins[env_ids, :2], dim=1)# robots that walked far enough progress to harder terrainsmove_up = distance > terrain.cfg.terrain_generator.size[0] / 2# robots that walked less than half of their required distance go to simpler terrainsmove_down = distance < torch.norm(command[env_ids, :2], dim=1) * env.max_episode_length_s * 0.5move_down *= ~move_up# update terrain levelsterrain.update_env_origins(env_ids, move_up, move_down)# return the mean terrain levelreturn torch.mean(terrain.terrain_levels.float())

2.1 功能概述

这是一个基于机器人行走距离的地形难度自适应课程学习策略函数。它根据机器人在给定速度命令下实际行走的距离，自动调整训练环境中的地形难度级别。

2.2 函数参数

env: ManagerBasedRLEnv强化学习环境实例提供访问场景对象、命令系统和其他环境状态env_ids: Sequence[int]需要处理的环境ID序列允许针对特定子集环境应用课程学习asset_cfg: SceneEntityCfg = SceneEntityCfg("robot")场景实体配置（默认为"robot"）指定课程学习作用的机器人对象

2.3 函数逻辑

在这里插入图片描述

3. robot_lab中的curriculums

3.1 command_levels_vel

# Copyright (c) 2024-2025 Ziqi Fan
# SPDX-License-Identifier: Apache-2.0"""
机器人学习环境课程学习模块该模块包含了用于创建课程学习的通用函数。课程学习是一种渐进式训练方法，
通过逐步增加任务难度来提高机器人的学习效率和最终性能。主要功能：
1. 基于奖励的自适应课程调整
2. 速度命令范围的动态扩展
3. 任务难度的渐进式增加这些函数可以传递给 :class:`isaaclab.managers.CurriculumTermCfg` 对象，
以启用相应的课程学习功能。
"""from __future__ import annotationsimport torch
from collections.abc import Sequence
from typing import TYPE_CHECKINGif TYPE_CHECKING:from isaaclab.envs import ManagerBasedRLEnvdef command_levels_vel(env: ManagerBasedRLEnv, env_ids: Sequence[int], reward_term_name: str, max_curriculum: float = 1.0
) -> None:"""基于机器人速度跟踪奖励的课程学习函数该函数根据机器人在执行指定速度命令时的跟踪奖励来调整课程难度。当机器人的跟踪奖励超过最大值的80%时，会增加命令的范围，从而逐步提高任务的难度。Args:env: 强化学习环境对象env_ids: 需要更新课程的环境ID序列reward_term_name: 用于评估的奖励项名称max_curriculum: 课程学习的最大难度值，默认为1.0Returns:float: 线性速度命令范围的累积增量工作原理：1. 获取指定奖励项的累积值2. 计算平均奖励与最大奖励的比值3. 如果比值超过80%，则扩展速度命令范围4. 返回当前的速度增量值课程策略：- 初始阶段：较小的速度命令范围，便于机器人学习基础运动- 进阶阶段：随着性能提升，逐步扩大命令范围- 最终阶段：达到最大课程难度，测试机器人的极限性能"""# 获取指定奖励项的累积值episode_sums = env.reward_manager._episode_sums[reward_term_name]# 获取奖励项的配置信息reward_term_cfg = env.reward_manager.get_term_cfg(reward_term_name)# 获取基础速度命令的范围配置base_velocity_ranges = env.command_manager.get_term("base_velocity").cfg.ranges# 定义速度范围的增量值（对称增加）delta_range = torch.tensor([-0.1, 0.1], device=env.device)# 初始化线性速度增量（如果不存在）if not hasattr(env, "delta_lin_vel"):env.delta_lin_vel = torch.tensor(0.0, device=env.device)# 判断是否需要增加课程难度# 条件：平均奖励超过最大奖励的80%if torch.mean(episode_sums[env_ids]) / env.max_episode_length > 0.8 * reward_term_cfg.weight:# 获取当前的线性速度范围lin_vel_x = torch.tensor(base_velocity_ranges.lin_vel_x, device=env.device)lin_vel_y = torch.tensor(base_velocity_ranges.lin_vel_y, device=env.device)# 扩展X方向线性速度范围，并限制在最大课程范围内base_velocity_ranges.lin_vel_x = torch.clamp(lin_vel_x + delta_range, -max_curriculum, max_curriculum).tolist()# 扩展Y方向线性速度范围，并限制在最大课程范围内base_velocity_ranges.lin_vel_y = torch.clamp(lin_vel_y + delta_range, -max_curriculum, max_curriculum).tolist()# 更新累积的线性速度增量env.delta_lin_vel = torch.clamp(env.delta_lin_vel + delta_range[1], 0.0, max_curriculum)# 返回当前的速度增量值，用于监控课程进度return env.delta_lin_vel