Skip to content

Migrate to aiomqtt with fixes#4

Open
nanomad wants to merge 15 commits intodevelopfrom
chore/aiomqtt-fixes
Open

Migrate to aiomqtt with fixes#4
nanomad wants to merge 15 commits intodevelopfrom
chore/aiomqtt-fixes

Conversation

@nanomad
Copy link
Copy Markdown
Owner

@nanomad nanomad commented Mar 10, 2026

Summary

Migrates from gmqtt to aiomqtt (async wrapper around paho-mqtt), based on @bj00rn's work in SAIC-iSmart-API#371 with significant bug fixes and improvements on top.

Original work (by @bj00rn)

  • Replace gmqtt with aiomqtt
  • Set retain=True, qos=1 for birth/LWT messages
  • Fix default port mapping for TLS (8883)
  • Fix client reconnection loop

Bug fixes on top

  • Replace run_coroutine_threadsafe with create_task — was silently swallowing all publish/subscribe failures
  • Restore on_mqtt_reconnected callback — HA discovery was never re-pushed after broker reconnect
  • Distinguish fatal vs transient MQTT connection errors — bad credentials now exit instead of retrying forever (supports both MQTT v3.1.1 and v5 reason codes)
  • Prevent connect() from hanging forever — if the run loop exits before connecting, the error now propagates
  • Restore retain=True default — the migration accidentally made all vehicle state publishes non-retained
  • Remove debug "a" suffix from MQTT client identifier
  • Fix string formatting bug in __enable_commands error log ("{e}"%s)
  • Fix connected event ordering — set after __on_connect completes, not before

Improvements

  • Add QoS = Literal[0, 1, 2] type alias for MQTT quality of service level
  • Bump dev dependencies: ruff ^0.15.0, pytest ^9.0.0, pylint ^4.0.0
  • Fix new ruff lint warnings (unnecessary lambdas, PEP 695 type params)
  • Add Python 3.14 to CI test matrix
  • Bump Dockerfile to Python 3.14 and Poetry 2.3.2
  • Merge latest upstream/develop (CI improvements)

Test plan

  • All 102 tests pass
  • mypy: no issues in 88 source files
  • ruff: all checks passed
  • Manual test with MQTT broker (TLS and non-TLS)
  • Verify HA discovery works after broker reconnect
  • Verify retained messages visible to newly connecting clients

🤖 Generated with Claude Code

bj00rn and others added 15 commits March 11, 2026 07:59
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tions

run_coroutine_threadsafe is meant for cross-thread scheduling but was
being used within the same event loop. The returned Future was discarded,
silently swallowing any publish or subscribe failures. Use create_task
instead with a done callback to surface errors in the logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The migration to aiomqtt dropped the on_mqtt_reconnected() call, which
resets HA discovery state so it gets re-published on the next update
cycle. Without this, HA entities are never re-discovered after a broker
reconnect. The explicit re-subscribe call is not needed as paho-mqtt
handles resubscription automatically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The reconnection loop previously caught all MqttError exceptions and
retried every 5 seconds, even for fatal errors like bad credentials or
protocol mismatch. Now MqttConnectError is caught separately and only
"Server unavailable" (the sole transient CONNACK refusal) triggers a
retry. All other connection refusals raise SystemExit, matching the old
gmqtt behavior. Also logs the actual exception in all error paths and
fixes asyncio.CancelledError import path. Supports both MQTT v3.1.1 and
v5 reason codes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
connect() awaited __connected.wait() which would block forever if the
run loop exited before establishing a connection (e.g. missing host,
unexpected exception). Now races the connection event against the task
itself using asyncio.wait. Also upgrades missing MQTT host from a
silent INFO log to a fatal SystemExit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The aiomqtt migration changed the default retain from True to False
across all publish methods. This meant vehicle state data was no longer
retained on the broker, so MQTT clients connecting after a publish
would not see any data until the next refresh cycle. Restore the old
behavior where all publishes are retained by default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The error log used a plain string "{e}" instead of a format specifier,
printing the literal text {e} instead of the exception. Also changed
raise e to bare raise to preserve the original traceback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
__connected.set() was called before __on_connect() finished, so callers
of connect() could proceed before subscriptions and keepalive were
established. Move the event set to after __on_connect() completes, and
remove the now-incorrect __connected.is_set() guard from
__enable_commands since it runs before the event is set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace bare int with Literal[0, 1, 2] across the Publisher hierarchy,
matching the Transport type alias pattern already in the PR. This
provides static type checking to prevent invalid QoS values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nanomad nanomad force-pushed the chore/aiomqtt-fixes branch from 6c618f1 to 49d8afd Compare March 11, 2026 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants