Robot Manager from 5/5 Hackathon #114

Open. bkearns wants to merge 2 commits into temporal-community:main from ferrosadb:main

bkearns commented May 5, 2026

temporal-hack — Temporal as the control plane for a robotics fleet

What it is. A self-contained lab that uses Temporal to drive two real robotics control loops on top of ROS 2 +
Gazebo, with an MQTT-bridged Go agent on the robot side. Everything runs locally via make sim-up / make agent-up /
make workers-up. https://github.com/ferrosadb/temporal-hack

Why Temporal. Robot fleets need the same things any distributed orchestrator needs — durable retries, timeouts,
cancellation, audit trail — but the actuators are physical. We wanted to see how far Temporal's "workflows are just
code" model could go when the side effects are a moving rover, not a row in a database.

Two workflows ship today:

  1. OTA rollout (ota-worker): POST /v1/ota/rollouts kicks off a workflow that pushes a new robot-app container image.
    The on-robot Go agent pulls and swaps it under docker/podman, and the rover's behaviour flips live (e.g.
    drive-circle → drive-figure-eight). The rollout workflow shows up at localhost:14080, completes in 1–2 s, and is
    fully replayable.
  2. Collision response (collision-worker): a ROS 2 collision topic is bridged to MQTT, and MQTT is bridged to Temporal.
    When a collision signal lands, a CollisionResponse workflow runs back-up → 90° turn-right → forward as discrete
    activities. Each manoeuvre is its own activity with its own retry/timeout policy; the workflow is the recovery
    policy expressed as code.

Architecture in one diagram (full version in the repo README):

Gazebo ─DDS─▶ ROS 2 bridge ─gRPC─▶ Go agent ─MQTT─▶ Temporal workers (ota, collision)

HTTP API ◀──────┘

What we think is interesting for the community:

  • Temporal sits outside the real-time loop (ROS 2 still owns the 50 Hz cmd_vel) but inside every loop where humans
    care about durability — rollouts, recovery, audit. That split feels right and we'd love feedback on it.
  • MQTT-as-signal-source: collision events arrive at Temporal via an MQTT → signal bridge, which kept the robot-side
    stack dumb and the cloud side declarative.
  • The OTA executor is a Temporal activity that shells out to podman/docker, with the rollout workflow owning
    idempotency and rollback. runRollback always emits PHASE_ROLLED_BACK so the workflow history is the source of truth
    for what the fleet did.
  • Whole thing runs on a laptop. No real robots required to reproduce.
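
The rollback point above can be sketched as follows. This is a plain-Go illustration under stated assumptions, not the repo's implementation: the `Executor` interface, the `Rollout` function, the image tags, and every phase name except PHASE_ROLLED_BACK are hypothetical, and the body of `runRollback` is one reading of "always emits" (the phase is recorded even if the restore itself fails).

```go
package main

import (
	"errors"
	"fmt"
)

// Phase names are hypothetical, except PHASE_ROLLED_BACK from the description.
type Phase string

const (
	PhasePulling    Phase = "PHASE_PULLING"
	PhaseSwapping   Phase = "PHASE_SWAPPING"
	PhaseDone       Phase = "PHASE_DONE"
	PhaseRolledBack Phase = "PHASE_ROLLED_BACK"
)

// Executor abstracts the container runtime (podman/docker in the project).
type Executor interface {
	Pull(image string) error
	Swap(image string) error
	Restore(image string) error
}

// Rollout records every phase it enters, so the returned history alone
// tells you what happened to the robot.
func Rollout(ex Executor, current, next string) ([]Phase, error) {
	var history []Phase
	emit := func(p Phase) { history = append(history, p) }

	emit(PhasePulling)
	err := ex.Pull(next)
	if err == nil {
		emit(PhaseSwapping)
		err = ex.Swap(next)
	}
	if err != nil {
		// Call runRollback first so its emitted phase is in history
		// before history is read for the return.
		rerr := runRollback(ex, current, emit, err)
		return history, rerr
	}
	emit(PhaseDone)
	return history, nil
}

// runRollback restores the previous image and always emits PHASE_ROLLED_BACK,
// even when the restore fails, so the history shows a rollback was attempted.
func runRollback(ex Executor, previous string, emit func(Phase), cause error) error {
	defer emit(PhaseRolledBack)
	if rerr := ex.Restore(previous); rerr != nil {
		return fmt.Errorf("rollback failed: %w", errors.Join(cause, rerr))
	}
	return fmt.Errorf("rolled back: %w", cause)
}

// fakeExecutor simulates the runtime so the sketch runs without containers.
type fakeExecutor struct{ failSwap bool }

func (f fakeExecutor) Pull(string) error { return nil }
func (f fakeExecutor) Swap(string) error {
	if f.failSwap {
		return errors.New("container failed health check")
	}
	return nil
}
func (f fakeExecutor) Restore(string) error { return nil }

func main() {
	history, err := Rollout(fakeExecutor{failSwap: true}, "robot-app:circle", "robot-app:figure-eight")
	fmt.Println(history, err)
}
```

Using `defer` for the emit keeps the guarantee in one place: no return path out of `runRollback` can skip the phase record.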

What we'd like from the community:

  • Patterns for modelling actuator workflows where the activity outcome is "the world changed" rather than "an RPC
    returned 200".
  • Opinions on whether collision-response belongs in a workflow at all, or whether it should be a long-running entity
    workflow per robot.
  • Anyone running Temporal against fleets of more than one robot: how are you sharding workers, per robot or per fleet?
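
To ground the entity-per-robot question, here is a rough plain-Go analogy of that shape: one long-lived loop per robot that receives collision events and handles them strictly in arrival order. It is not Temporal code. The `Router` type, the `CollisionEvent` shape, the robot IDs, and the inbox size are all hypothetical; in Temporal the per-robot loop would presumably be a long-running workflow keyed by robot ID that receives the events as signals.

```go
package main

import (
	"fmt"
	"sync"
)

// CollisionEvent is a hypothetical event shape: which robot, and a sequence number.
type CollisionEvent struct {
	RobotID string
	Seq     int
}

// Router keeps one long-lived goroutine (the "entity") per robot and
// routes each event to the entity that owns that robot.
type Router struct {
	mu      sync.Mutex
	wg      sync.WaitGroup
	inboxes map[string]chan CollisionEvent
	handled map[string][]int
}

func NewRouter() *Router {
	return &Router{
		inboxes: map[string]chan CollisionEvent{},
		handled: map[string][]int{},
	}
}

// Dispatch delivers an event to the robot's entity, starting it on first use.
// It must not be called after Close.
func (r *Router) Dispatch(ev CollisionEvent) {
	r.mu.Lock()
	inbox, ok := r.inboxes[ev.RobotID]
	if !ok {
		inbox = make(chan CollisionEvent, 16)
		r.inboxes[ev.RobotID] = inbox
		r.wg.Add(1)
		go r.entity(ev.RobotID, inbox)
	}
	r.mu.Unlock()
	inbox <- ev
}

// entity handles one robot's collisions one at a time, in arrival order.
func (r *Router) entity(id string, inbox <-chan CollisionEvent) {
	defer r.wg.Done()
	for ev := range inbox {
		// A real handler would run the recovery manoeuvres here.
		r.mu.Lock()
		r.handled[id] = append(r.handled[id], ev.Seq)
		r.mu.Unlock()
	}
}

// Close stops all entities, waits for them to drain their inboxes,
// and returns the per-robot handling order.
func (r *Router) Close() map[string][]int {
	r.mu.Lock()
	for _, inbox := range r.inboxes {
		close(inbox)
	}
	r.mu.Unlock()
	r.wg.Wait()
	return r.handled
}

func main() {
	r := NewRouter()
	for i := 1; i <= 3; i++ {
		r.Dispatch(CollisionEvent{RobotID: "rover-1", Seq: i})
		r.Dispatch(CollisionEvent{RobotID: "rover-2", Seq: i})
	}
	fmt.Println(r.Close())
}
```

The trade-off this makes visible: a per-robot entity serialises overlapping collisions for the same rover, whereas one workflow per collision would need some other guard against two recoveries driving the same robot at once.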
