Conversation

@yhnsu yhnsu commented Jan 16, 2026

Description

This document summarizes how RewardManager and ObservationManager are used in our RL training pipeline, in a simple, practical format for contributors.


Summary of Changes

  • Clarifies how RewardManager and ObservationManager are integrated into RL training.
  • Provides motivation/context for modular reward and observation handling.
  • No new dependencies required.

Motivation & Context

  • RewardManager and ObservationManager modularize reward and observation logic for RL tasks.
  • They allow flexible configuration via JSON, making it easy to add, remove, or tune reward/observation terms without changing code.
  • This design supports rapid experimentation and reproducibility in RL research.

Usage in RL Training

RewardManager

  • Reads reward configuration from the environment config (e.g., gym_config.json).
  • Each reward term is defined by a function, weight, and parameters.
  • During each RL step, RewardManager computes all active reward terms and combines them (weighted sum or replace mode); see the sketch after this list.
  • Individual reward components are logged for analysis (e.g., via wandb/tensorboard).
  • Example config:
    "rewards": {
      "distance_reward": {
        "func": "distance_to_target",
        "mode": "add",
        "weight": 0.5,
        "params": {"source_entity_cfg": {"uid": "cube"}, "target_pose_key": "goal_pose"}
      },
      "success_bonus": {
        "func": "success_reward",
        "mode": "add",
        "weight": 10.0,
        "params": {}
      }
    }
  • RL trainer logs both total reward and each component for debugging and analysis.
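
The snippet below is a minimal sketch of this compute-and-combine loop. The names (RewardTerm, compute_rewards) and functor signatures are illustrative assumptions, not the actual embodichain API:

```python
# Minimal sketch of a RewardManager-style weighted reward computation.
# RewardTerm, compute_rewards, and the functor signatures are assumptions
# made for illustration; they are not the actual embodichain classes.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

import numpy as np


@dataclass
class RewardTerm:
    func: Callable[..., np.ndarray]   # functor returning a per-env reward
    weight: float = 1.0
    mode: str = "add"                 # "add" -> weighted sum, "replace" -> overwrite running total
    params: Dict[str, Any] = field(default_factory=dict)


def compute_rewards(terms: Dict[str, RewardTerm], env_state: Dict[str, Any]):
    """Evaluate all active reward terms and return (total, per-term components)."""
    total = np.zeros(env_state["num_envs"])
    components = {}
    for name, term in terms.items():
        value = term.func(env_state, **term.params) * term.weight
        components[name] = value              # logged individually (e.g. wandb/tensorboard)
        total = value if term.mode == "replace" else total + value
    return total, components


if __name__ == "__main__":
    def distance_to_target(state, source_uid, target_pose_key):
        # toy stand-in: negative distance between the cube and the goal pose
        return -np.linalg.norm(state[source_uid] - state[target_pose_key], axis=-1)

    def success_reward(state):
        return (np.linalg.norm(state["cube"] - state["goal_pose"], axis=-1) < 0.05).astype(float)

    state = {"num_envs": 2, "cube": np.zeros((2, 3)), "goal_pose": 0.03 * np.ones((2, 3))}
    terms = {
        "distance_reward": RewardTerm(distance_to_target, weight=0.5,
                                      params={"source_uid": "cube", "target_pose_key": "goal_pose"}),
        "success_bonus": RewardTerm(success_reward, weight=10.0),
    }
    total, comps = compute_rewards(terms, state)
    print(total, comps)
```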

ObservationManager

  • Reads observation configuration from the environment config.
  • Each observation term is defined by a function and parameters.
  • During each RL step, ObservationManager collects and processes all active observation terms, producing the final observation dict; see the sketch after this list.
  • Observations are flattened for RL algorithms and logged if needed.
  • Example config:
    "observations": {
      "cube_pos": {
        "func": "get_cube_position",
        "mode": "add",
        "params": {"entity_cfg": {"uid": "cube"}}
      },
      "robot_state": {
        "func": "get_robot_state",
        "mode": "add",
        "params": {"entity_cfg": {"uid": "Manipulator"}}
      }
    }
  • RL trainer uses the flattened observation for policy and buffer.
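
Below is a minimal sketch of the collect-then-flatten step described above. The names (ObsTerm, compute_observations, flatten_observations) are assumptions for illustration, not the real embodichain API:

```python
# Minimal sketch of an ObservationManager-style collection and flattening step.
# ObsTerm, compute_observations, and flatten_observations are illustrative
# assumptions about the pattern described above, not embodichain code.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

import numpy as np


@dataclass
class ObsTerm:
    func: Callable[..., np.ndarray]
    params: Dict[str, Any] = field(default_factory=dict)


def compute_observations(terms: Dict[str, ObsTerm], env_state: Dict[str, Any]) -> Dict[str, np.ndarray]:
    """Evaluate every active observation term, producing the final observation dict."""
    return {name: term.func(env_state, **term.params) for name, term in terms.items()}


def flatten_observations(obs: Dict[str, np.ndarray]) -> np.ndarray:
    """Concatenate per-term observations into one flat vector per environment."""
    return np.concatenate([np.atleast_2d(v) for v in obs.values()], axis=-1)


if __name__ == "__main__":
    state = {"cube": np.array([[0.1, 0.2, 0.3]]), "robot_qpos": np.zeros((1, 7))}
    terms = {
        "cube_pos": ObsTerm(lambda s, uid: s[uid], params={"uid": "cube"}),
        "robot_state": ObsTerm(lambda s, uid: s[uid], params={"uid": "robot_qpos"}),
    }
    obs = compute_observations(terms, state)
    flat = flatten_observations(obs)      # shape (1, 10), fed to the policy and replay buffer
    print(flat.shape)
```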

Copilot AI left a comment

Pull request overview

This PR introduces a reward manager system for reinforcement learning training, enabling modular reward computation through configurable reward functors. The reward manager follows the same design pattern as the existing observation manager and event manager.

Changes:

  • Added RewardManager class to orchestrate reward computation with support for multiple weighted reward terms
  • Implemented 11 reusable reward functions covering distance-based rewards, penalties, and success bonuses (a hedged sketch of one such functor follows this list)
  • Integrated reward manager into EmbodiedEnv and BaseEnv for automatic reward computation
  • Refactored PushCubeEnv to use the reward manager instead of manual reward calculation
  • Added randomize_target_pose function for virtual goal poses without physical scene objects
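
As a rough illustration of what one of the distance-based functors might look like: the accessors env.get_entity_pose / env.get_target_pose and the tanh shaping below are assumptions for the sketch, not the code under review (the real functors live in embodichain/lab/gym/envs/managers/rewards.py).

```python
# Hedged sketch of a distance-based reward functor; the env accessors used
# here are assumed for illustration and are not the reviewed implementation.
import numpy as np


def distance_to_target(env, source_entity_cfg: dict, target_pose_key: str,
                       std: float = 0.1) -> np.ndarray:
    """Dense reward in [0, 1] that grows as the source entity approaches the goal pose."""
    source_pos = env.get_entity_pose(source_entity_cfg["uid"])[:, :3]   # assumed accessor
    target_pos = env.get_target_pose(target_pose_key)[:, :3]            # assumed accessor
    dist = np.linalg.norm(source_pos - target_pos, axis=-1)
    # tanh shaping keeps the reward bounded and smooth near the goal
    return 1.0 - np.tanh(dist / std)


if __name__ == "__main__":
    class DummyEnv:
        def get_entity_pose(self, uid):
            return np.array([[0.1, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]])

        def get_target_pose(self, key):
            return np.array([[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]])

    print(distance_to_target(DummyEnv(), {"uid": "cube"}, "goal_pose"))
```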

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

File summary:

  • embodichain/lab/gym/envs/managers/reward_manager.py: New reward manager class for orchestrating reward computation
  • embodichain/lab/gym/envs/managers/rewards.py: New module with 11 reward functor implementations
  • embodichain/lab/gym/envs/managers/cfg.py: Added RewardCfg configuration class
  • embodichain/lab/gym/envs/managers/__init__.py: Exported RewardCfg and RewardManager
  • embodichain/lab/gym/envs/embodied_env.py: Integrated reward manager initialization and reset
  • embodichain/lab/gym/envs/base_env.py: Added _extend_reward hook in get_reward method
  • embodichain/lab/gym/utils/gym_utils.py: Added reward parsing logic in load_gym_cfg
  • embodichain/lab/gym/envs/tasks/rl/push_cube.py: Refactored to use reward manager, removed manual reward code
  • embodichain/lab/gym/envs/managers/randomization/spatial.py: Added randomize_target_pose function
  • embodichain/lab/gym/envs/managers/observations.py: Added get_robot_ee_pose and target_position observation functions
  • configs/agents/rl/push_cube/gym_config.json: Updated to use reward manager configuration
  • configs/agents/rl/push_cube/train_config.json: Changed eval_freq from 2 to 200


@yuecideng yuecideng merged commit cf282c9 into main Jan 19, 2026
5 checks passed
@yuecideng yuecideng deleted the yhn/reward_manager branch January 19, 2026 15:41