Evaluation of an RSL RL policy #333
Conversation
Looks great! Thanks for doing this!
I have some questions about the success term in the LiftObjectTask which I think that we need to sort out.
Suggestion also to add a test. It could just run the policy_runner with the default configs that we have and check that it doesn't crash.
    return [
        LiftSuccessMetric(
            minimum_height=self.minimum_height_to_lift,
            object_name=self.lift_object.name,
        )
    ]
Suggestion: also return SuccessRateMetric(). Actually, all tasks should return that; I'm not sure of a good way of enforcing it.
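One way to enforce this would be a template-method hook in the task base class, so individual tasks can't forget the metric. This is only a sketch; `BaseTask`, `Metric`, and the `get_extra_metrics` hook are assumed names for illustration, not the actual Arena API:

```python
# Sketch: force every task to include SuccessRateMetric by making
# get_metrics() final in spirit and letting tasks override only
# get_extra_metrics(). All class names here are illustrative.

class Metric:
    """Placeholder metric interface."""

class SuccessRateMetric(Metric):
    """Fraction of episodes that reach the success condition."""

class LiftSuccessMetric(Metric):
    """Task-specific metric, e.g. height/goal-reaching details."""

class BaseTask:
    def get_metrics(self) -> list[Metric]:
        # SuccessRateMetric is always present, by construction.
        return [SuccessRateMetric(), *self.get_extra_metrics()]

    def get_extra_metrics(self) -> list[Metric]:
        # Tasks override this instead of get_metrics().
        return []

class LiftObjectTask(BaseTask):
    def get_extra_metrics(self) -> list[Metric]:
        # The real task would construct LiftSuccessMetric with its
        # minimum height and object name.
        return [LiftSuccessMetric()]
```

With this pattern, every task gets SuccessRateMetric for free, and forgetting it is impossible rather than merely discouraged.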
Edit: looking at the code closely, this task doesn't define a success term. This is an (apparently unenforced) assumption in Arena.
What's the reason for that? Is there anything preventing us from defining a success term? Right now the LiftObjectTask won't really work in an IL setting.
The second question is: if we define success, do we need LiftSuccessMetric? I.e., is SuccessRateMetric the same thing? Potentially not, because here success may also entail reaching an (x, y, z) goal, rather than just z > min_height. If that's the case, LiftSuccessMetric does indeed provide more information.
It's a bit tricky in this case.
- The LiftObjectTask actually uses the command term to get its goal location. It's not a certain height but a certain position in space. In RL training, when it reaches the position it waits until timeout to reset.
- As the goal location comes from the command term, it lives in the RL problem and can't be included in the termination config, which is in the IL problem.
- This particular problem is therefore ill-defined in the IL setting. We would need an observation term and to convert the problem into setting a random goal pose as well.
I'm going to write a note here. When using a command manager, the terminations don't have a success term. That makes sense, but how we combine them in our setting is going to be a different question. I would say we need to override the termination cfg by removing the success term in the RL setting. I will implement that.
I have made some changes here, and I think it's the neatest way forward to solve this issue. I'll try to explain below.
- The IL base class needs success.
- The RL class should not have success during training, but needs success during policy evaluation (by our script).
- The RL env sets the goal pose from the command manager, which the IL base class does not have access to.
- Solution: a smarter success condition. It returns False during RL training (enforced with a flag), gets the goal position from the command manager during RL evaluation, and takes the goal pose directly for IL training if the command manager does not exist.
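A rough sketch of that "smarter" success condition, under stated assumptions: `env.command_manager`, `get_command("object_pose")`, and `env.get_object_position` are illustrative names, not the actual Arena/Isaac Lab API, and the tolerance is made up:

```python
# Sketch of the three-branch success term described above.
# All env attribute names here are assumptions for illustration.
import numpy as np

def lift_success(env, rl_training_mode: bool = False,
                 goal_pose=None, pos_tolerance: float = 0.02) -> bool:
    """Success term usable in both RL and IL settings."""
    # 1. During RL training, never terminate on success: let the
    #    episode run to timeout so the policy keeps holding the goal.
    if rl_training_mode:
        return False

    # 2. During RL evaluation, read the goal from the command manager.
    command_manager = getattr(env, "command_manager", None)
    if command_manager is not None:
        goal = np.asarray(command_manager.get_command("object_pose"))[:3]
    # 3. For IL training, fall back to an explicitly provided goal pose.
    elif goal_pose is not None:
        goal = np.asarray(goal_pose)[:3]
    else:
        raise ValueError("No command manager and no goal_pose given.")

    object_pos = env.get_object_position("lift_object")  # assumed helper
    return bool(np.linalg.norm(object_pos - goal) < pos_tolerance)
```

The flag keeps the RL training behavior (wait until timeout) intact, while evaluation and IL reuse the same term with different goal sources.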
    def get_events_cfg(self):
        return self.events_cfg


    def make_termination_cfg(self, rl_training: bool = False, use_command_goal: bool = False):
Is this behavior general to all RLTasks? That is, is the RL task termination always (or at least usually) the IL task termination minus the success condition?
Can we push this to a function or a base class? Maybe the simplest is a function remove_success_condition(super().get_termination_cfg()). Actually, I think that @peterd-NV did this for sequential tasks: he had to remove the success conditions from subtasks when forming the composite.
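A minimal sketch of the suggested helper. The termination-cfg layout is an assumption: a config object whose terms are attributes, where setting a term to None disables it (the usual manager-based config pattern):

```python
# Sketch: derive the RL termination cfg from the IL one by
# stripping the success term, instead of per-task flags.
# The `success` attribute name is an assumption.
import copy

def remove_success_condition(termination_cfg):
    """Return a copy of the termination cfg without its success term."""
    cfg = copy.deepcopy(termination_cfg)  # don't mutate the IL cfg
    if hasattr(cfg, "success"):
        # Setting to None follows the common pattern of disabling a
        # term rather than deleting the attribute.
        cfg.success = None
    return cfg
```

An RL task could then build its termination config as `remove_success_condition(super().get_termination_cfg())` without duplicating the IL definition.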
        target_x_delta: Target range deltas for x [min_delta, max_delta] relative to initial pose (m).
        target_y_delta: Target range deltas for y [min_delta, max_delta] relative to initial pose (m).
        target_z_delta: Target range deltas for z [min_delta, max_delta] relative to initial pose (m).
        rl_training_mode: If True, disables success termination. Set to False for evaluation.
Can we use the underlying (IL) task (LiftObjectTask) for evaluation, rather than passing this flag?
Summary
RSL RL policy evaluation class