feat(grpo): add pluggable action format (DSL vs JSON) #36

@abrichr

Description

The GRPO trainer hardcodes the DSL action format (CLICK/TYPE/WAIT/DONE) in prompt construction, output parsing, and action formatting. External RL training use cases need a JSON action format instead, e.g. {"type": "click", "x": 0.461, "y": 0.021}.

Introduce an ActionCodec protocol with encode/decode/build_prompt methods and DSL/JSON implementations.

Affected functions:

  • _build_agent_messages
  • _parse_vlm_output_to_action
  • _format_action_as_text
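A minimal sketch of what such a protocol could look like, not a definitive design. Only the `ActionCodec` name, the three method names, and the JSON shape come from this issue. Everything else is an assumption for illustration: the `Action` dict type, the method signatures (in particular, `build_prompt` taking a task string and returning a string), the class names `DslActionCodec` and `JsonActionCodec`, and the exact DSL syntax `NAME(key=value, ...)`, which the issue does not specify.

```python
import json
import re
from typing import Any, Protocol

# Hypothetical in-memory action form, e.g. {"type": "click", "x": 0.461, "y": 0.021}.
Action = dict[str, Any]


class ActionCodec(Protocol):
    """Assumed signatures; the issue only names the three methods."""

    def encode(self, action: Action) -> str: ...
    def decode(self, text: str) -> Action: ...
    def build_prompt(self, task: str) -> str: ...


class JsonActionCodec:
    def encode(self, action: Action) -> str:
        return json.dumps(action)

    def decode(self, text: str) -> Action:
        # Take the span from the first "{" to the last "}" in the model output.
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match is None:
            raise ValueError(f"no JSON object in output: {text!r}")
        action = json.loads(match.group(0))
        if not isinstance(action, dict) or "type" not in action:
            raise ValueError(f"JSON action needs a 'type' field: {text!r}")
        return action

    def build_prompt(self, task: str) -> str:
        example = json.dumps({"type": "click", "x": 0.5, "y": 0.5})
        return f"Task: {task}\nReply with one JSON action, e.g. {example}"


class DslActionCodec:
    # Assumed DSL syntax: NAME(key=value, ...), with values written as JSON literals.
    _CALL = re.compile(r"\b(CLICK|TYPE|WAIT|DONE)\((.*)\)")
    _ARG = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,\s)]+)')

    def encode(self, action: Action) -> str:
        args = ", ".join(
            f"{key}={json.dumps(value)}"
            for key, value in action.items()
            if key != "type"
        )
        return f"{action['type'].upper()}({args})"

    def decode(self, text: str) -> Action:
        match = self._CALL.search(text)
        if match is None:
            raise ValueError(f"no DSL action in output: {text!r}")
        name, body = match.groups()
        action: Action = {"type": name.lower()}
        for key, value in self._ARG.findall(body):
            action[key] = json.loads(value)
        return action

    def build_prompt(self, task: str) -> str:
        return f"Task: {task}\nReply with one action: CLICK, TYPE, WAIT, or DONE."
```

Under these assumptions, `_build_agent_messages` would call `build_prompt`, `_parse_vlm_output_to_action` would call `decode`, and `_format_action_as_text` would call `encode`, so the trainer could take a codec instance as a parameter and stay format-agnostic.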

Labels: enhancement (New feature or request)