
Evaluation of a RSL RL policy#333

Merged
viiik-inside merged 18 commits into main from feature/rl_policy_eval
Feb 2, 2026

Conversation

@viiik-inside (Collaborator)

Summary

RSL RL policy evaluation class

@alexmillane (Collaborator) left a comment


Looks great! Thanks for doing this!

I have some questions about the success term in the LiftObjectTask which I think that we need to sort out.

Suggestion also to add a test. It could just run the policy_runner with the default configs that we have and check that it doesn't crash.

Comment thread isaaclab_arena/metrics/lift_success.py Outdated
Comment on lines +68 to +73
return [
    LiftSuccessMetric(
        minimum_height=self.minimum_height_to_lift,
        object_name=self.lift_object.name,
    )
]
Collaborator

Suggestion also to return SuccessRateMetric(). Actually, all tasks should return that. Not sure of a good way of enforcing that.
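One possible way to enforce that every task returns it (a minimal sketch under assumptions: a shared task base class with a template method; SuccessRateMetric and the class names here are stand-ins, not Arena's real API) is to have the base class append the shared metric unconditionally, so subclasses only supply their task-specific metrics:

```python
# Sketch: enforce that every task's metric list includes a shared
# success-rate metric via a template method on a base class.
# SuccessRateMetric / TaskBase / LiftObjectTask are illustrative
# stand-ins, not Arena's actual classes.

class SuccessRateMetric:
    pass

class LiftSuccessMetric:
    def __init__(self, minimum_height, object_name):
        self.minimum_height = minimum_height
        self.object_name = object_name

class TaskBase:
    def get_metrics(self):
        # Final metric list: subclasses override _get_task_metrics(),
        # and SuccessRateMetric is appended here unconditionally.
        return self._get_task_metrics() + [SuccessRateMetric()]

    def _get_task_metrics(self):
        return []

class LiftObjectTask(TaskBase):
    def _get_task_metrics(self):
        # Task-specific metrics go here; SuccessRateMetric is added
        # by the base class, so it cannot be forgotten.
        return [LiftSuccessMetric(minimum_height=0.1, object_name="cube")]

metrics = LiftObjectTask().get_metrics()
print(any(isinstance(m, SuccessRateMetric) for m in metrics))  # True
```

With this shape, forgetting SuccessRateMetric becomes impossible rather than merely discouraged, at the cost of renaming the subclass hook.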

Collaborator

Edit: looking at the code closely, this task doesn't define a success term. This is an (apparently unenforced) assumption in Arena.

What's the reason for that? Is there anything preventing us from defining a success term? Right now the LiftObjectTask won't really work in an IL setting.

The second question is: if we define success, do we still need LiftSuccessMetric, i.e. is SuccessRateMetric the same thing? Potentially not, because here success may also entail reaching an (x, y, z) goal rather than just z > min_height. If that's the case, LiftSuccessMetric does indeed provide more information.

Collaborator Author

It's a bit tricky in this case.

  1. The LiftObjectTask actually uses the command term to get its goal location. It's not a certain height but a certain position in space. In RL training, when the object reaches the position, the episode waits until timeout to reset.

  2. As the goal location comes from the command term, it lives in the RL problem and couldn't be included in the termination config, which belongs to the IL problem.

  3. This particular problem is therefore ill-defined in the IL setting. We would need an observation term and to convert the problem into setting a random goal pose as well.

Collaborator Author

I am going to write a note here. When using a command manager, the terminations don't have a success term. That makes sense, but how we combine them in our setting is going to be a different question. I would say we need to override the termination cfg by removing the success term in the RL setting. I will implement that.

Collaborator Author

I have made some changes here, and I think it's the neatest way forward to solve this issue. I'll try to explain below.

  1. The IL base class needs success.

  2. The RL class should not have success during training, but needs success during policy evaluation (by our script).

  3. The RL env sets the goal pose from the command manager, which the IL base class does not have access to.

Solution: have a smarter success condition. It returns False during RL training, enforced with a flag; it gets the goal position from the command manager during RL evaluation; and it takes in the goal pose directly for IL training if the command manager does not exist.

def get_events_cfg(self):
    return self.events_cfg

def make_termination_cfg(self, rl_training: bool = False, use_command_goal: bool = False):
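The "smarter" success term described above could look roughly like this (a sketch only: the env layout, command name "object_pose", and the distance tolerance are assumptions for illustration, not the real Arena / Isaac Lab API):

```python
# Sketch of a success term that covers all three modes:
# RL training (always False), RL evaluation (goal from the command
# manager), and IL (goal passed in directly). DummyEnv and its
# attributes are illustrative stand-ins.
import math

def lift_success(env, object_name, goal_pose=None,
                 rl_training=False, use_command_goal=False, tol=0.05):
    if rl_training:
        # During RL training, success never terminates the episode;
        # the command term resamples goals instead.
        return False
    if use_command_goal:
        # RL evaluation: read the goal pose from the command manager.
        goal_pose = env.command_manager.get_command("object_pose")
    # IL path: goal_pose was passed in directly by the caller.
    object_pos = env.object_positions[object_name]
    return math.dist(object_pos, goal_pose) < tol

class DummyCommandManager:
    def get_command(self, name):
        return (0.4, 0.0, 0.3)

class DummyEnv:
    command_manager = DummyCommandManager()
    object_positions = {"cube": (0.4, 0.0, 0.3)}

env = DummyEnv()
print(lift_success(env, "cube", rl_training=True))           # False
print(lift_success(env, "cube", use_command_goal=True))      # True
print(lift_success(env, "cube", goal_pose=(0.0, 0.0, 0.0)))  # False
```

The same callable can then be wired into the termination cfg by make_termination_cfg with the appropriate flags for each mode.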
Collaborator

Is this behavior general to all RLTasks? That is, is the RL task termination always (or at least usually) the IL task termination minus the success condition?

Can we push this into a function or a base class? Maybe the simplest is a function remove_success_condition(super().get_termination_cfg()). Actually, I think that @peterd-NV did this for sequential tasks - he had to remove the success conditions from subtasks when forming the composite.
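The suggested helper could be sketched as follows (the flat TerminationCfg shape with a success attribute is an assumption about Arena's config, and the convention of disabling a term by setting it to None mirrors Isaac Lab-style manager configs):

```python
# Sketch of the suggested remove_success_condition helper.
# TerminationCfg here is an illustrative stand-in for Arena's
# termination config; disabling a term by assigning None follows
# the Isaac Lab manager-config convention.
import copy

def remove_success_condition(termination_cfg):
    """Return a copy of the termination config with the success term disabled."""
    cfg = copy.copy(termination_cfg)
    if hasattr(cfg, "success"):
        cfg.success = None  # None disables the term for the manager
    return cfg

class TerminationCfg:
    def __init__(self):
        self.time_out = "time_out_term"
        self.success = "success_term"

il_cfg = TerminationCfg()
rl_cfg = remove_success_condition(il_cfg)
print(rl_cfg.success)   # None
print(il_cfg.success)   # success_term (the IL config is untouched)
```

Keeping the helper pure (copy, don't mutate) lets the IL config remain the single source of truth while the RL variant is derived from it.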

Comment thread isaaclab_arena/tasks/lift_object_task.py Outdated
Comment thread isaaclab_arena/tasks/lift_object_task.py
target_x_delta: Target range deltas for x [min_delta, max_delta] relative to initial pose (m).
target_y_delta: Target range deltas for y [min_delta, max_delta] relative to initial pose (m).
target_z_delta: Target range deltas for z [min_delta, max_delta] relative to initial pose (m).
rl_training_mode: If True, disables success termination. Set to False for evaluation.
Collaborator

Can we use the underlying (IL) task (LiftObjectTask) for evaluation, rather than passing this flag?

Comment thread isaaclab_arena/tasks/lift_object_task.py
Comment thread isaaclab_arena/tasks/lift_object_task.py Outdated
@viiik-inside viiik-inside merged commit 490b5d9 into main Feb 2, 2026
1 check passed