Thank you, authors, for the incredible work.
I've been reading your paper and the OpenTrack code, and I have a simple question.
In the paper, you mention that the PD targets are the next frame's reference motion plus a scaled, squashed action, i.e. `pd_targets = reference_motion_next_frame + action_scale * tanh(policy_output)`.
But in the code, I see that the current implementation differs in two ways:
- `action_scale` is a uniform constant rather than a set of empirically designed hyperparameters;
- `tanh` is not applied to the policy output.
In the MLP `forward` function in `brax2torch.py`:
```python
def forward(self, x: torch.Tensor) -> torch.Tensor:
    # Pass through the hidden layers, activating after every layer
    # except the last (unless activate_final is set).
    for i, layer in enumerate(self.hidden):
        x = layer(x)
        if i != len(self.hidden) - 1 or self.activate_final:
            x = self.act(x)
    if self.split:
        # Split into (loc, scale) and squash only the mean with tanh.
        loc, _ = torch.chunk(x, 2, dim=-1)
        return torch.tanh(loc)
    # split is False: the raw output is returned unsquashed.
    return x
```
And `self.split` is always `False` in the current code base, so the `tanh` branch is never taken.
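For reference, here is a minimal sketch of the two variants as I understand them. All function and variable names (`pd_target_paper`, `ref_qpos_next`, `raw_action`, etc.) are mine, not from the repo, and the per-joint scales are only my reading of the paper's "empirically designed hyperparameters":

```python
import torch

# Paper's description (as I read it): next-frame reference plus a bounded
# residual, with per-joint, empirically tuned scales.
def pd_target_paper(ref_qpos_next: torch.Tensor,  # (num_joints,) next reference frame
                    action_scale: torch.Tensor,   # (num_joints,) per-joint scales
                    raw_action: torch.Tensor) -> torch.Tensor:
    return ref_qpos_next + action_scale * torch.tanh(raw_action)

# What the current code appears to do: a single uniform scale,
# with the policy output used linearly (no tanh).
def pd_target_code(ref_qpos_next: torch.Tensor,
                   action_scale: float,
                   raw_action: torch.Tensor) -> torch.Tensor:
    return ref_qpos_next + action_scale * raw_action
```

The practical difference is that with `tanh` the residual is bounded by `action_scale`, while the linear version lets the targets drift arbitrarily far from the reference.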
Could you please clarify this inconsistency? Or, if simply using a uniform α and a linear action still worked fine in your tests, that would be good to know too.