Skip to content

Add clearSessionRoom function and tests for stale session room handling#4188

Open
habdelra wants to merge 3 commits intomainfrom
handle-stale-room-errors
Open

Add clearSessionRoom function and tests for stale session room handling#4188
habdelra wants to merge 3 commits intomainfrom
handle-stale-room-errors

Conversation

@habdelra
Copy link
Contributor

Summary

  • Stop repeated realm event broadcast failures caused by stale Matrix session room ids.
  • Detect the specific M_FORBIDDEN "realm server not in room" send error during broadcastRealmEvent.
  • Clear the stored users.session_room_id for that user/room pair so future index broadcasts stop targeting an invalid room.
  • Add regression tests for both the DB helper and the realm-server broadcast cleanup path.
  • Register the new realm-server test in the CI shard list.

Problem

After from-scratch indexing, realm-server could try to broadcast incremental index events into per-user session DM rooms that no longer included @realm_server:localhost. Those stale session_room_id values remained in the database, so every later broadcast retried the same dead room and logged repeated 403 M_FORBIDDEN errors.

Fix

When Matrix rejects a send with the specific "user not in room" error, realm-server now treats that room mapping as stale and removes it from the user record. This keeps the existing design intact, where realm events are delivered through per-user session DMs, while preventing endless retries against rooms the service account has already left or lost access to.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes repeated 403 M_FORBIDDEN errors when realm-server tries to broadcast realm events to stale Matrix session rooms. When the send fails because the realm server is no longer in the room, the stale session_room_id is cleared from the database.

Changes:

  • Added clearSessionRoom DB helper that conditionally nulls out a user's session_room_id when it matches a given room ID.
  • Added error detection in broadcastRealmEvent to catch "not in room" Matrix errors and clear the stale session room.
  • Added tests for both the DB helper and the broadcast cleanup path, registered in CI.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
packages/runtime-common/db-queries/session-room-queries.ts New clearSessionRoom function
packages/realm-server/node-realm.ts Error parsing + stale room cleanup in broadcastRealmEvent
packages/realm-server/tests/session-room-queries-test.ts Tests for clearSessionRoom
packages/realm-server/tests/node-realm-test.ts Tests for broadcast stale room handling
packages/realm-server/tests/index.ts Register new test file
.github/workflows/ci.yaml Add new test to CI shard list

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9bed476a54

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@habdelra habdelra requested a review from a team March 13, 2026 18:39
@github-actions
Copy link

github-actions bot commented Mar 13, 2026

Host Test Results

    1 files      1 suites   2h 29m 19s ⏱️
2 022 tests 2 007 ✅ 15 💤 0 ❌
2 037 runs  2 022 ✅ 15 💤 0 ❌

Results for commit 3caf9b7.

♻️ This comment has been updated with latest results.

Copy link
Contributor

@backspace backspace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same host test failed on a PR of mine so it seems spurious or flaky 😞

@habdelra
Copy link
Contributor Author

that test failure should be fixed in main now. if you merge upstream it will pass

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants