Conversation

@yhnsu yhnsu commented Jan 16, 2026

Description

This document summarizes how RewardManager and ObservationManager are used in our RL training pipeline, in a simple, practical format for contributors.


Summary of Changes

  • Clarifies how RewardManager and ObservationManager are integrated into RL training.
  • Provides motivation/context for modular reward and observation handling.
  • No new dependencies required.

Motivation & Context

  • RewardManager and ObservationManager modularize reward and observation logic for RL tasks.
  • They allow flexible configuration via JSON, making it easy to add, remove, or tune reward/observation terms without changing code.
  • This design supports rapid experimentation and reproducibility in RL research.

Usage in RL Training

RewardManager

  • Reads reward configuration from the environment config (e.g., gym_config.json).
  • Each reward term is defined by a function, weight, and parameters.
  • During each RL step, RewardManager computes all active reward terms and combines them (weighted sum or replace mode); see the sketch after this list.
  • Individual reward components are logged for analysis (e.g., via wandb/tensorboard).
  • Example config:
    "rewards": {
      "distance_reward": {
        "func": "distance_to_target",
        "mode": "add",
        "weight": 0.5,
        "params": {"source_entity_cfg": {"uid": "cube"}, "target_pose_key": "goal_pose"}
      },
      "success_bonus": {
        "func": "success_reward",
        "mode": "add",
        "weight": 10.0,
        "params": {}
      }
    }
  • RL trainer logs both total reward and each component for debugging and analysis.
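
The snippet below is a minimal sketch of this compute-and-combine loop. The names (RewardTerm, compute_rewards) and functor signatures are illustrative assumptions, not the actual embodichain API:

```python
# Minimal sketch of a RewardManager-style weighted reward computation.
# RewardTerm, compute_rewards, and the functor signatures are assumptions
# made for illustration; they are not the actual embodichain classes.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

import numpy as np


@dataclass
class RewardTerm:
    func: Callable[..., np.ndarray]   # functor returning a per-env reward
    weight: float = 1.0
    mode: str = "add"                 # "add" -> weighted sum, "replace" -> overwrite running total
    params: Dict[str, Any] = field(default_factory=dict)


def compute_rewards(terms: Dict[str, RewardTerm], env_state: Dict[str, Any]):
    """Evaluate all active reward terms and return (total, per-term components)."""
    total = np.zeros(env_state["num_envs"])
    components = {}
    for name, term in terms.items():
        value = term.func(env_state, **term.params) * term.weight
        components[name] = value              # logged individually (e.g. wandb/tensorboard)
        total = value if term.mode == "replace" else total + value
    return total, components


if __name__ == "__main__":
    def distance_to_target(state, source_uid, target_pose_key):
        # toy stand-in: negative distance between the cube and the goal pose
        return -np.linalg.norm(state[source_uid] - state[target_pose_key], axis=-1)

    def success_reward(state):
        return (np.linalg.norm(state["cube"] - state["goal_pose"], axis=-1) < 0.05).astype(float)

    state = {"num_envs": 2, "cube": np.zeros((2, 3)), "goal_pose": 0.03 * np.ones((2, 3))}
    terms = {
        "distance_reward": RewardTerm(distance_to_target, weight=0.5,
                                      params={"source_uid": "cube", "target_pose_key": "goal_pose"}),
        "success_bonus": RewardTerm(success_reward, weight=10.0),
    }
    total, comps = compute_rewards(terms, state)
    print(total, comps)
```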

ObservationManager

  • Reads observation configuration from the environment config.
  • Each observation term is defined by a function and parameters.
  • During each RL step, ObservationManager collects and processes all active observation terms, producing the final observation dict; see the sketch after this list.
  • Observations are flattened for RL algorithms and logged if needed.
  • Example config:
    "observations": {
      "cube_pos": {
        "func": "get_cube_position",
        "mode": "add",
        "params": {"entity_cfg": {"uid": "cube"}}
      },
      "robot_state": {
        "func": "get_robot_state",
        "mode": "add",
        "params": {"entity_cfg": {"uid": "Manipulator"}}
      }
    }
  • RL trainer uses the flattened observation for policy and buffer.
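
Below is a minimal sketch of the collect-then-flatten step described above. The names (ObsTerm, compute_observations, flatten_observations) are assumptions for illustration, not the real embodichain API:

```python
# Minimal sketch of an ObservationManager-style collection and flattening step.
# ObsTerm, compute_observations, and flatten_observations are illustrative
# assumptions about the pattern described above, not embodichain code.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

import numpy as np


@dataclass
class ObsTerm:
    func: Callable[..., np.ndarray]
    params: Dict[str, Any] = field(default_factory=dict)


def compute_observations(terms: Dict[str, ObsTerm], env_state: Dict[str, Any]) -> Dict[str, np.ndarray]:
    """Evaluate every active observation term, producing the final observation dict."""
    return {name: term.func(env_state, **term.params) for name, term in terms.items()}


def flatten_observations(obs: Dict[str, np.ndarray]) -> np.ndarray:
    """Concatenate per-term observations into one flat vector per environment."""
    return np.concatenate([np.atleast_2d(v) for v in obs.values()], axis=-1)


if __name__ == "__main__":
    state = {"cube": np.array([[0.1, 0.2, 0.3]]), "robot_qpos": np.zeros((1, 7))}
    terms = {
        "cube_pos": ObsTerm(lambda s, uid: s[uid], params={"uid": "cube"}),
        "robot_state": ObsTerm(lambda s, uid: s[uid], params={"uid": "robot_qpos"}),
    }
    obs = compute_observations(terms, state)
    flat = flatten_observations(obs)      # shape (1, 10), fed to the policy and replay buffer
    print(flat.shape)
```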

Copilot AI left a comment

Pull request overview

This PR introduces a reward manager system for reinforcement learning training, enabling modular reward computation through configurable reward functors. The reward manager follows the same design pattern as the existing observation manager and event manager.

Changes:

  • Added RewardManager class to orchestrate reward computation with support for multiple weighted reward terms
  • Implemented 11 reusable reward functions covering distance-based rewards, penalties, and success bonuses (a hedged sketch of one such functor follows this list)
  • Integrated reward manager into EmbodiedEnv and BaseEnv for automatic reward computation
  • Refactored PushCubeEnv to use the reward manager instead of manual reward calculation
  • Added randomize_target_pose function for virtual goal poses without physical scene objects
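
As a rough illustration of what one of the distance-based functors might look like: the accessors env.get_entity_pose / env.get_target_pose and the tanh shaping below are assumptions for the sketch, not the code under review (the real functors live in embodichain/lab/gym/envs/managers/rewards.py).

```python
# Hedged sketch of a distance-based reward functor; the env accessors used
# here are assumed for illustration and are not the reviewed implementation.
import numpy as np


def distance_to_target(env, source_entity_cfg: dict, target_pose_key: str,
                       std: float = 0.1) -> np.ndarray:
    """Dense reward in [0, 1] that grows as the source entity approaches the goal pose."""
    source_pos = env.get_entity_pose(source_entity_cfg["uid"])[:, :3]   # assumed accessor
    target_pos = env.get_target_pose(target_pose_key)[:, :3]            # assumed accessor
    dist = np.linalg.norm(source_pos - target_pos, axis=-1)
    # tanh shaping keeps the reward bounded and smooth near the goal
    return 1.0 - np.tanh(dist / std)


if __name__ == "__main__":
    class DummyEnv:
        def get_entity_pose(self, uid):
            return np.array([[0.1, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]])

        def get_target_pose(self, key):
            return np.array([[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]])

    print(distance_to_target(DummyEnv(), {"uid": "cube"}, "goal_pose"))
```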

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

File summary:

  • embodichain/lab/gym/envs/managers/reward_manager.py: New reward manager class for orchestrating reward computation
  • embodichain/lab/gym/envs/managers/rewards.py: New module with 11 reward functor implementations
  • embodichain/lab/gym/envs/managers/cfg.py: Added RewardCfg configuration class
  • embodichain/lab/gym/envs/managers/__init__.py: Exported RewardCfg and RewardManager
  • embodichain/lab/gym/envs/embodied_env.py: Integrated reward manager initialization and reset
  • embodichain/lab/gym/envs/base_env.py: Added _extend_reward hook in get_reward method
  • embodichain/lab/gym/utils/gym_utils.py: Added reward parsing logic in load_gym_cfg
  • embodichain/lab/gym/envs/tasks/rl/push_cube.py: Refactored to use reward manager, removed manual reward code
  • embodichain/lab/gym/envs/managers/randomization/spatial.py: Added randomize_target_pose function
  • embodichain/lab/gym/envs/managers/observations.py: Added get_robot_ee_pose and target_position observation functions
  • configs/agents/rl/push_cube/gym_config.json: Updated to use reward manager configuration
  • configs/agents/rl/push_cube/train_config.json: Changed eval_freq from 2 to 200


@yuecideng yuecideng merged commit cf282c9 into main Jan 19, 2026
5 checks passed
@yuecideng yuecideng deleted the yhn/reward_manager branch January 19, 2026 15:41