Skip to content

fix(docker): keep synchronous I/O off the Docker tab render path#2641

Open
tspader wants to merge 1 commit into
unraid:masterfrom
tspader:fix/docker-tab-perf
Open

fix(docker): keep synchronous I/O off the Docker tab render path#2641
tspader wants to merge 1 commit into
unraid:masterfrom
tspader:fix/docker-tab-perf

Conversation

@tspader
Copy link
Copy Markdown

@tspader tspader commented May 16, 2026

Overview

I got tired of the Docker tab being painfully slow to load, so I dug into what was wrong. I found that an incredible amount of blocking work was being done on the main request thread:

  • An icon lookup that parsed every XML template in /boot for every container (i.e. N * N)
  • Two network calls to tailscale.com to get the DERP map and latest version
  • docker exec calls into every container with Tailscale enabled

This is an astounding amount of work to be doing at the interval the web UI does it; it is an embarrassing amount of work to be done in such a way that it blocks the UI. I ran some sanity checks, and it takes, on my box with 30 containers:

  • ~2 seconds when no containers have Tailscale enabled
  • 7 seconds(!) when even a single one does

In addition to this, I fixed an issue which would append a new <script> tag to the page every three seconds, indefinitely.

Changes

Here's a slightly fuller change set:

  • DockerClient: stop clobbering the cached icon path; anchor unresolved containers to the default icon; memoize driver/port/host per request.
  • DockerContainers: read Tailscale status, DERP map, and latest version from a JSON cache instead of fetching inline.
  • New nchan/tailscale_status worker refreshes that cache on the docker_load pattern (1s timeout per container, 6h for DERP/version, self-exits when no subscribers).
  • DockerContainers.page: eval poll script via new Function() instead of appending a <script> to every poll; defer next poll while the tab is hidden; ping /sub/tailscalestatus to keep the worker alive.

I understand that this is all legacy code, and I understand that a few PRs have been politely declined in favor of an impending rewrite. But this PR is a very small patch which just copies a few existing patterns, more or less verbatim, to fix what I consider unacceptable in both my user experience and level of quality in something I paid for. Again, this is no shade on any of the developers, I understand the nature of code like this, which is why I spent an afternoon trying to make Unraid better rather than complain.

I'm happy to follow up with anything you need from me. Thanks for your time,
Spader

Summary by CodeRabbit

  • New Features

    • Added continuous Tailscale status monitoring and caching for Docker containers.
  • Performance

    • Optimized Docker container list refreshes based on browser tab visibility, reducing unnecessary updates.
    • Improved Tailscale metadata retrieval efficiency through intelligent caching mechanisms.

Review Change Stack

Each poll of the Docker tab was doing several seconds of synchronous
work on the request thread
- re-parsing every template XML on /boot for icon lookups
- Two HTTPS calls to tailscale.com
- `docker exec` into each Tailscale-enabled container.

On a 30-container box, this took ~2 seconds without Tailscale-enabled
containers and ~7 seconds with.

Fixes:
- DockerClient: stop clobbering the cached icon path; anchor unresolved
containers to the default icon; memoize driver/port/host per request.
- DockerContainers: read Tailscale status, DERP map, and latest version
from a JSON cache instead of fetching inline.
- New nchan/tailscale_status worker refreshes that cache on the
docker_load pattern (1s timeout per container, 6h for DERP/version,
self-exits when no subscribers).
- DockerContainers.page: eval poll script via new Function() instead of
appending a <script> to <head> every poll; defer next poll while the tab
is hidden; ping /sub/tailscalestatus to keep the worker alive.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 16, 2026

Walkthrough

This PR refactors Tailscale status polling from synchronous docker exec calls into asynchronous background polling with disk-based caching. A new nchan service continuously fetches Tailscale metadata and per-container status, the backend switches to reading cached results, and the frontend defers list refreshes when the browser tab is hidden to reduce unnecessary polling.

Changes

Tailscale Polling and Caching

Layer / File(s) Summary
Tailscale Cache Helper Methods
emhttp/plugins/dynamix.docker.manager/include/DockerClient.php
DockerUtil adds TAILSCALE_CACHE_FILE constant and four static helper methods to read Tailscale metadata (status, DERP, version) from a persistent JSON cache file.
Background Tailscale Polling Service
emhttp/plugins/dynamix.docker.manager/nchan/tailscale_status
New daemon continuously polls Tailscale DERP and version endpoints, discovers containers with tailscale labels, executes docker exec with timeout to fetch per-container status (with stale cache fallback), and writes aggregated state atomically while publishing lightweight cache-update notifications to Nchan subscribers.
Backend Refactoring to Use Cache
emhttp/plugins/dynamix.docker.manager/include/DockerClient.php, emhttp/plugins/dynamix.docker.manager/include/DockerContainers.php
DockerClient::getTailscaleJson() delegates to cached DockerUtil::tailscaleStatus(), getAllInfo() validates icon paths against both cache and docroot with guaranteed fallback, DockerContainers::tailscale_stats() returns cached status instead of live docker exec, and DockerUtil::driver(), port(), host() memoize results in static variables to reduce repeated docker calls.
Frontend Tab-Aware Refresh and Polling
emhttp/plugins/dynamix.docker.manager/DockerContainers.page
Introduces scheduleNextLoad() which pauses refresh scheduling when the browser tab is hidden and resumes on visibility change; server payloads are executed directly via new Function() instead of DOM injection; refresh triggers use the new deferral function; a tailscaleStatus Nchan subscriber is initialized to keep background polling responsive.

Sequence Diagram(s)

sequenceDiagram
  participant Background as Polling Service
  participant Tailscale as Tailscale API
  participant Docker as Docker
  participant Cache as Cache File
  participant Browser as Frontend
  
  Background->>Tailscale: Fetch DERP and version metadata
  Tailscale-->>Background: JSON response
  Background->>Docker: docker ps (find labeled containers)
  Docker-->>Background: Container list
  Background->>Docker: docker exec tailscale status
  Docker-->>Background: JSON status (or timeout → use cache)
  Background->>Cache: Atomic write aggregated state
  Background->>Browser: Publish {ts} to Nchan
  Browser->>Browser: Check document.hidden before refresh
  Browser->>Cache: Read latest Tailscale data
  Cache-->>Browser: Return cached status
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Beneath the tabs where browsers hide,

A quiet daemon runs with pride—

Polling Tailscale, caching tight,

Then waking up when tabs see light! ✨

No more docker calls in the fray,

Just cache reads lighting up the day.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: deferring I/O operations off the render path by implementing caching and background workers.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
emhttp/plugins/dynamix.docker.manager/nchan/tailscale_status (2)

79-84: 💤 Low value

Silent failure if atomic rename fails.

If the filesystem is full or permissions change, @rename() will fail silently and the cache won't update. Consider logging or returning a status:

 function ts_save_state(array $state): void {
   $tmp = TS_CACHE_FILE . '.tmp';
   if (`@file_put_contents`($tmp, json_encode($state, JSON_UNESCAPED_SLASHES)) !== false) {
-    `@rename`($tmp, TS_CACHE_FILE);
+    if (!`@rename`($tmp, TS_CACHE_FILE)) {
+      `@unlink`($tmp); // Clean up orphan temp file
+    }
   }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@emhttp/plugins/dynamix.docker.manager/nchan/tailscale_status` around lines 79
- 84, ts_save_state currently silences errors for rename and thus can silently
fail to update TS_CACHE_FILE; modify ts_save_state to check the return value of
rename($tmp, TS_CACHE_FILE) and handle failures by logging the error (include
context like TS_CACHE_FILE and $tmp and last PHP error via error_get_last())
and/or returning a boolean status; also remove the error suppression operator on
rename (and optionally on file_put_contents) so failures surface to your logger,
and ensure callers of ts_save_state handle the returned success/failure.

104-111: 💤 Low value

Hardcoded 60-second staleness threshold should be a named constant.

The value 60 on line 108 is a staleness threshold for cached container status but is hardcoded while other timing values use constants. For consistency and maintainability:

 const TS_EXEC_TIMEOUT_S      = 1;
+const TS_STALE_THRESHOLD_S   = 60;
 
 ...
 
-    } elseif (isset($state['containers'][$name]) && ($now - $state['containers'][$name]['ts']) < 60) {
+    } elseif (isset($state['containers'][$name]) && ($now - $state['containers'][$name]['ts']) < TS_STALE_THRESHOLD_S) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@emhttp/plugins/dynamix.docker.manager/nchan/tailscale_status` around lines
104 - 111, Replace the hardcoded 60-second staleness check with a named constant
(e.g., CONTAINER_STATUS_STALE_SECONDS) and use that constant in the conditional
inside the foreach where ts_container_status($name) is checked; define the
constant near the other timing constants (or at the top of this file) so the
conditional becomes ($now - $state['containers'][$name]['ts']) <
CONTAINER_STATUS_STALE_SECONDS, and update any related comments to reference the
new constant for consistency with existing timing values.
emhttp/plugins/dynamix.docker.manager/DockerContainers.page (1)

157-157: ⚡ Quick win

Consider wrapping new Function() execution in try-catch.

If the server returns malformed JavaScript in data[1], the unhandled exception will break subsequent UI updates. While the server is trusted, defensive coding prevents silent failures:

-    (new Function(data[1]))();
+    try { (new Function(data[1]))(); } catch(e) { console.error('Docker script error:', e); }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@emhttp/plugins/dynamix.docker.manager/DockerContainers.page` at line 157, The
call to new Function(data[1])() can throw if the server returns malformed JS;
wrap that invocation in a try-catch around new Function(data[1])() (in the same
scope where data is used) to catch any exception, log or display the error (so
it’s visible for debugging), and ensure the code continues (optionally skip
executing the returned function or use a safe fallback) so subsequent UI updates
are not interrupted.
emhttp/plugins/dynamix.docker.manager/include/DockerClient.php (1)

340-347: 💤 Low value

Icon existence check may miss web-root-relative paths stored without leading slash.

The check is_file($tmp['icon']) works for absolute paths, and is_file($docroot . $tmp['icon']) works when $tmp['icon'] starts with /. However, if a cached icon path is stored as a relative path without a leading slash (e.g., plugins/...), neither check will match and the icon will be unnecessarily re-fetched.

Consider normalizing the path before checking:

-$iconExists = !empty($tmp['icon'])
-  && (is_file($tmp['icon']) || is_file($docroot . $tmp['icon']));
+$iconPath = $tmp['icon'] ?? '';
+$iconExists = $iconPath !== ''
+  && (is_file($iconPath) || is_file($docroot . '/' . ltrim($iconPath, '/')));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@emhttp/plugins/dynamix.docker.manager/include/DockerClient.php` around lines
340 - 347, The existing icon existence check can miss cached paths like
"plugins/..." because is_file($docroot . $tmp['icon']) assumes a leading slash;
normalize before testing by trimming any leading slashes and testing both the
raw path and the docroot-joined path with a slash: use something like $iconPath
= $tmp['icon']; $iconPathNoSlash = ltrim($iconPath, '/'); then check
is_file($iconPath) || is_file($docroot . '/' . $iconPathNoSlash); keep the rest
of the logic (including calling $this->getIcon($image, $name, $labelIconUrl ?:
($tmp['icon'] ?? '')) and the fallback assignment) unchanged so relative cached
paths are recognized and not re-fetched unnecessarily.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@emhttp/plugins/dynamix.docker.manager/DockerContainers.page`:
- Line 157: The call to new Function(data[1])() can throw if the server returns
malformed JS; wrap that invocation in a try-catch around new Function(data[1])()
(in the same scope where data is used) to catch any exception, log or display
the error (so it’s visible for debugging), and ensure the code continues
(optionally skip executing the returned function or use a safe fallback) so
subsequent UI updates are not interrupted.

In `@emhttp/plugins/dynamix.docker.manager/include/DockerClient.php`:
- Around line 340-347: The existing icon existence check can miss cached paths
like "plugins/..." because is_file($docroot . $tmp['icon']) assumes a leading
slash; normalize before testing by trimming any leading slashes and testing both
the raw path and the docroot-joined path with a slash: use something like
$iconPath = $tmp['icon']; $iconPathNoSlash = ltrim($iconPath, '/'); then check
is_file($iconPath) || is_file($docroot . '/' . $iconPathNoSlash); keep the rest
of the logic (including calling $this->getIcon($image, $name, $labelIconUrl ?:
($tmp['icon'] ?? '')) and the fallback assignment) unchanged so relative cached
paths are recognized and not re-fetched unnecessarily.

In `@emhttp/plugins/dynamix.docker.manager/nchan/tailscale_status`:
- Around line 79-84: ts_save_state currently silences errors for rename and thus
can silently fail to update TS_CACHE_FILE; modify ts_save_state to check the
return value of rename($tmp, TS_CACHE_FILE) and handle failures by logging the
error (include context like TS_CACHE_FILE and $tmp and last PHP error via
error_get_last()) and/or returning a boolean status; also remove the error
suppression operator on rename (and optionally on file_put_contents) so failures
surface to your logger, and ensure callers of ts_save_state handle the returned
success/failure.
- Around line 104-111: Replace the hardcoded 60-second staleness check with a
named constant (e.g., CONTAINER_STATUS_STALE_SECONDS) and use that constant in
the conditional inside the foreach where ts_container_status($name) is checked;
define the constant near the other timing constants (or at the top of this file)
so the conditional becomes ($now - $state['containers'][$name]['ts']) <
CONTAINER_STATUS_STALE_SECONDS, and update any related comments to reference the
new constant for consistency with existing timing values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 75f63953-20ac-45a8-920c-17b4e7aa20d9

📥 Commits

Reviewing files that changed from the base of the PR and between b78d577 and 3fb72a2.

📒 Files selected for processing (4)
  • emhttp/plugins/dynamix.docker.manager/DockerContainers.page
  • emhttp/plugins/dynamix.docker.manager/include/DockerClient.php
  • emhttp/plugins/dynamix.docker.manager/include/DockerContainers.php
  • emhttp/plugins/dynamix.docker.manager/nchan/tailscale_status

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3fb72a2dfb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}
});
// We don't need the event from the subscription. It's just to keep the worker running.
var tailscaleStatus = new NchanSubscriber('/sub/tailscalestatus',{subscriber:'websocket'});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Add tailscale_status to Docker page Nchan scripts

This new subscription assumes a publisher is already running, but DockerContainers.page still declares only Nchan="docker_load" in its header, so DefaultPageLayout never starts nchan/tailscale_status. On a normal page load there is therefore no process publishing tailscalestatus, the cache file is never refreshed, and Tailscale status/tooltips regress to stale or empty data for running containers.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

@elibosley elibosley May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tspader - this seems like a legitimate finding to me, can you verify and look into adding this to the dockercontainers.page

-Nchan="docker_load"
+Nchan="docker_load,tailscale_status"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required for dynamix to be able to stop the scripts on array stop etc.

@SpencerGraffunder
Copy link
Copy Markdown

If this really fixes the super slow web UI I've been experiencing, it might get me to pay for another year of updates.

@elibosley elibosley self-requested a review May 18, 2026 21:11
@elibosley elibosley added the 7.3 label May 18, 2026
@github-actions
Copy link
Copy Markdown

🔧 PR Test Plugin Available

A test plugin has been generated for this PR that includes the modified files.

Version: 2026.05.18.2112
Build: View Workflow Run

📥 Installation Instructions:

Install via Unraid Web UI:

  1. Go to Plugins → Install Plugin
  2. Copy and paste this URL:
https://preview.dl.unraid.net/pr-plugins/pr-2641/webgui-pr-2641.plg
  1. Click Install

Alternative: Direct Download

⚠️ Important Notes:

  • Testing only: This plugin is for testing PR changes
  • Backup included: Original files are automatically backed up
  • Easy removal: Files are restored when plugin is removed
  • Conflicts: Remove this plugin before installing production updates
  • Post-merge behavior: This preview stays available after merge until preview storage expires or it is manually cleaned up

📝 Modified Files:

Click to expand file list
emhttp/plugins/dynamix.docker.manager/DockerContainers.page
emhttp/plugins/dynamix.docker.manager/include/DockerClient.php
emhttp/plugins/dynamix.docker.manager/include/DockerContainers.php
emhttp/plugins/dynamix.docker.manager/nchan/tailscale_status

🔄 To Remove:

Navigate to Plugins → Installed Plugins and remove webgui-pr-2641, or run:

plugin remove webgui-pr-2641

🤖 This comment is automatically generated and will be updated with each new push to this PR.

@elibosley
Copy link
Copy Markdown
Member

elibosley commented May 18, 2026

Thank you for this PR! We really appreciate you taking the time to work through these performance issues, and I totally understand your sentiment about the Docker pages being very painful to work on (as well as slow 😢 )

We're making efforts to fix this with internal development as well, and I'm happy to give your PR a review and test and we can likely get this merged into 7.3.1 if all goes well.

@ljm42
Copy link
Copy Markdown
Member

ljm42 commented May 18, 2026

I tested this PR on my production Unraid 7.3.0 server, including the Nchan="docker_load,tailscale_status" change in DockerContainers.page mentioned above.

The results are quite good. On my system, with multiple containers using Unraid’s Tailscale integration, the Docker page render endpoint went from about 1.7s down to about 0.3s. Wow (!!)

With the additional Nchan entry in place, the tailscale_status worker starts, /var/lib/docker/unraid-tailscale-status.json is created, and the Tailscale tooltips work. I also checked /var/log/phplog and /var/log/syslog and did not see any PR-related PHP/WebGUI errors.

One note: the current client-side subscriber keeps/starts the worker, but it does not update the tooltip DOM directly, so to see the latest tooltip data you have to refresh the page. That is similar to how it worked before, so it is not a regression.

I think this is an excellent change for 7.3

@Squidly271
Copy link
Copy Markdown
Contributor

One thing I personally don't like (but it's not a show stopper for me) is that the cache lives within the docker image. Myself, I'd prefer that a reboot of the system would clear out the cache (along with an array stop / start or docker service stop/start), instead of the next 6 hour interval from when the cache was created.

Other than that (and adding it to nchan= ), it's a great improvement in performance.

One note: It actually will up to 6 hours for the script to be terminated if there's no listeners. Not an issue since its sleeping, but termination happens only on a failed publish, so there's no wasted cycles.

// to get live tooltip updates, but that's overkill. A notification is all
// that we really want.
publish('tailscalestatus', json_encode(['ts' => $now]), 1, true);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For save and publish maybe worth only doing if it has changed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarification - we typically save an MD5 of the payload and then compare before publishing again, to preserve publish cycles. This is a simple optimization that may help out here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Publishing has to be done @SimonFair @elibosley @tspader

See my full comment #2641 (comment)

Copy link
Copy Markdown
Member

@elibosley elibosley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! Please check out the comments on this, once addressed, we can get this into the codebase.

$state['containers'] = $fresh;
$state['ts'] = $now;

ts_save_state($state);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this state hasn't changed, probably not worth saving, you can do the MD5 comparison and share the logic with the publish call as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of saving it at all, just store it in a static var. Also has the advantage of not having anything cached that survives an array stop / start or reboot

@Squidly271
Copy link
Copy Markdown
Contributor

publish_noDupe() handles not publishing if the payload hasn't changed, but will publish every time $abortTimeout is exceeded between publishing to allow exiting if there are no listeners. (IE: with the 6 hour window and the 30 second abort timeout, the publish will always wind up happening.

publish has to be done one way or another - due to the extremely long time between publishing, and the rarity of changes in the payload, if the publish doesn't happen then the script won't exit if there are no listeners.

And additionally Nchan="docker_load,tailscale_status" not only starts up the script if its not running, but registers it with dynamix so that on array stop (or nginx reload etc) it will automatically kill the script off (the killing off of the script if there are no listeners also unregisters it with dynamix)

@elibosley elibosley removed the 7.3 label May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants