fix(docker): keep synchronous I/O off the Docker tab render path#2641
fix(docker): keep synchronous I/O off the Docker tab render path#2641tspader wants to merge 1 commit into
Conversation
Each poll of the Docker tab was doing several seconds of synchronous work on the request thread - re-parsing every template XML on /boot for icon lookups - Two HTTPS calls to tailscale.com - `docker exec` into each Tailscale-enabled container. On a 30-container box, this took ~2 seconds without Tailscale-enabled containers and ~7 seconds with. Fixes: - DockerClient: stop clobbering the cached icon path; anchor unresolved containers to the default icon; memoize driver/port/host per request. - DockerContainers: read Tailscale status, DERP map, and latest version from a JSON cache instead of fetching inline. - New nchan/tailscale_status worker refreshes that cache on the docker_load pattern (1s timeout per container, 6h for DERP/version, self-exits when no subscribers). - DockerContainers.page: eval poll script via new Function() instead of appending a <script> to <head> every poll; defer next poll while the tab is hidden; ping /sub/tailscalestatus to keep the worker alive.
WalkthroughThis PR refactors Tailscale status polling from synchronous docker exec calls into asynchronous background polling with disk-based caching. A new nchan service continuously fetches Tailscale metadata and per-container status, the backend switches to reading cached results, and the frontend defers list refreshes when the browser tab is hidden to reduce unnecessary polling. ChangesTailscale Polling and Caching
Sequence Diagram(s)sequenceDiagram
participant Background as Polling Service
participant Tailscale as Tailscale API
participant Docker as Docker
participant Cache as Cache File
participant Browser as Frontend
Background->>Tailscale: Fetch DERP and version metadata
Tailscale-->>Background: JSON response
Background->>Docker: docker ps (find labeled containers)
Docker-->>Background: Container list
Background->>Docker: docker exec tailscale status
Docker-->>Background: JSON status (or timeout → use cache)
Background->>Cache: Atomic write aggregated state
Background->>Browser: Publish {ts} to Nchan
Browser->>Browser: Check document.hidden before refresh
Browser->>Cache: Read latest Tailscale data
Cache-->>Browser: Return cached status
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (4)
emhttp/plugins/dynamix.docker.manager/nchan/tailscale_status (2)
79-84: 💤 Low valueSilent failure if atomic rename fails.
If the filesystem is full or permissions change,
@rename()will fail silently and the cache won't update. Consider logging or returning a status:function ts_save_state(array $state): void { $tmp = TS_CACHE_FILE . '.tmp'; if (`@file_put_contents`($tmp, json_encode($state, JSON_UNESCAPED_SLASHES)) !== false) { - `@rename`($tmp, TS_CACHE_FILE); + if (!`@rename`($tmp, TS_CACHE_FILE)) { + `@unlink`($tmp); // Clean up orphan temp file + } } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@emhttp/plugins/dynamix.docker.manager/nchan/tailscale_status` around lines 79 - 84, ts_save_state currently silences errors for rename and thus can silently fail to update TS_CACHE_FILE; modify ts_save_state to check the return value of rename($tmp, TS_CACHE_FILE) and handle failures by logging the error (include context like TS_CACHE_FILE and $tmp and last PHP error via error_get_last()) and/or returning a boolean status; also remove the error suppression operator on rename (and optionally on file_put_contents) so failures surface to your logger, and ensure callers of ts_save_state handle the returned success/failure.
104-111: 💤 Low valueHardcoded 60-second staleness threshold should be a named constant.
The value
60on line 108 is a staleness threshold for cached container status but is hardcoded while other timing values use constants. For consistency and maintainability:const TS_EXEC_TIMEOUT_S = 1; +const TS_STALE_THRESHOLD_S = 60; ... - } elseif (isset($state['containers'][$name]) && ($now - $state['containers'][$name]['ts']) < 60) { + } elseif (isset($state['containers'][$name]) && ($now - $state['containers'][$name]['ts']) < TS_STALE_THRESHOLD_S) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@emhttp/plugins/dynamix.docker.manager/nchan/tailscale_status` around lines 104 - 111, Replace the hardcoded 60-second staleness check with a named constant (e.g., CONTAINER_STATUS_STALE_SECONDS) and use that constant in the conditional inside the foreach where ts_container_status($name) is checked; define the constant near the other timing constants (or at the top of this file) so the conditional becomes ($now - $state['containers'][$name]['ts']) < CONTAINER_STATUS_STALE_SECONDS, and update any related comments to reference the new constant for consistency with existing timing values.emhttp/plugins/dynamix.docker.manager/DockerContainers.page (1)
157-157: ⚡ Quick winConsider wrapping
new Function()execution in try-catch.If the server returns malformed JavaScript in
data[1], the unhandled exception will break subsequent UI updates. While the server is trusted, defensive coding prevents silent failures:- (new Function(data[1]))(); + try { (new Function(data[1]))(); } catch(e) { console.error('Docker script error:', e); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@emhttp/plugins/dynamix.docker.manager/DockerContainers.page` at line 157, The call to new Function(data[1])() can throw if the server returns malformed JS; wrap that invocation in a try-catch around new Function(data[1])() (in the same scope where data is used) to catch any exception, log or display the error (so it’s visible for debugging), and ensure the code continues (optionally skip executing the returned function or use a safe fallback) so subsequent UI updates are not interrupted.emhttp/plugins/dynamix.docker.manager/include/DockerClient.php (1)
340-347: 💤 Low valueIcon existence check may miss web-root-relative paths stored without leading slash.
The check
is_file($tmp['icon'])works for absolute paths, andis_file($docroot . $tmp['icon'])works when$tmp['icon']starts with/. However, if a cached icon path is stored as a relative path without a leading slash (e.g.,plugins/...), neither check will match and the icon will be unnecessarily re-fetched.Consider normalizing the path before checking:
-$iconExists = !empty($tmp['icon']) - && (is_file($tmp['icon']) || is_file($docroot . $tmp['icon'])); +$iconPath = $tmp['icon'] ?? ''; +$iconExists = $iconPath !== '' + && (is_file($iconPath) || is_file($docroot . '/' . ltrim($iconPath, '/')));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@emhttp/plugins/dynamix.docker.manager/include/DockerClient.php` around lines 340 - 347, The existing icon existence check can miss cached paths like "plugins/..." because is_file($docroot . $tmp['icon']) assumes a leading slash; normalize before testing by trimming any leading slashes and testing both the raw path and the docroot-joined path with a slash: use something like $iconPath = $tmp['icon']; $iconPathNoSlash = ltrim($iconPath, '/'); then check is_file($iconPath) || is_file($docroot . '/' . $iconPathNoSlash); keep the rest of the logic (including calling $this->getIcon($image, $name, $labelIconUrl ?: ($tmp['icon'] ?? '')) and the fallback assignment) unchanged so relative cached paths are recognized and not re-fetched unnecessarily.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@emhttp/plugins/dynamix.docker.manager/DockerContainers.page`:
- Line 157: The call to new Function(data[1])() can throw if the server returns
malformed JS; wrap that invocation in a try-catch around new Function(data[1])()
(in the same scope where data is used) to catch any exception, log or display
the error (so it’s visible for debugging), and ensure the code continues
(optionally skip executing the returned function or use a safe fallback) so
subsequent UI updates are not interrupted.
In `@emhttp/plugins/dynamix.docker.manager/include/DockerClient.php`:
- Around line 340-347: The existing icon existence check can miss cached paths
like "plugins/..." because is_file($docroot . $tmp['icon']) assumes a leading
slash; normalize before testing by trimming any leading slashes and testing both
the raw path and the docroot-joined path with a slash: use something like
$iconPath = $tmp['icon']; $iconPathNoSlash = ltrim($iconPath, '/'); then check
is_file($iconPath) || is_file($docroot . '/' . $iconPathNoSlash); keep the rest
of the logic (including calling $this->getIcon($image, $name, $labelIconUrl ?:
($tmp['icon'] ?? '')) and the fallback assignment) unchanged so relative cached
paths are recognized and not re-fetched unnecessarily.
In `@emhttp/plugins/dynamix.docker.manager/nchan/tailscale_status`:
- Around line 79-84: ts_save_state currently silences errors for rename and thus
can silently fail to update TS_CACHE_FILE; modify ts_save_state to check the
return value of rename($tmp, TS_CACHE_FILE) and handle failures by logging the
error (include context like TS_CACHE_FILE and $tmp and last PHP error via
error_get_last()) and/or returning a boolean status; also remove the error
suppression operator on rename (and optionally on file_put_contents) so failures
surface to your logger, and ensure callers of ts_save_state handle the returned
success/failure.
- Around line 104-111: Replace the hardcoded 60-second staleness check with a
named constant (e.g., CONTAINER_STATUS_STALE_SECONDS) and use that constant in
the conditional inside the foreach where ts_container_status($name) is checked;
define the constant near the other timing constants (or at the top of this file)
so the conditional becomes ($now - $state['containers'][$name]['ts']) <
CONTAINER_STATUS_STALE_SECONDS, and update any related comments to reference the
new constant for consistency with existing timing values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 75f63953-20ac-45a8-920c-17b4e7aa20d9
📒 Files selected for processing (4)
emhttp/plugins/dynamix.docker.manager/DockerContainers.pageemhttp/plugins/dynamix.docker.manager/include/DockerClient.phpemhttp/plugins/dynamix.docker.manager/include/DockerContainers.phpemhttp/plugins/dynamix.docker.manager/nchan/tailscale_status
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3fb72a2dfb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
| }); | ||
| // We don't need the event from the subscription. It's just to keep the worker running. | ||
| var tailscaleStatus = new NchanSubscriber('/sub/tailscalestatus',{subscriber:'websocket'}); |
There was a problem hiding this comment.
Add tailscale_status to Docker page Nchan scripts
This new subscription assumes a publisher is already running, but DockerContainers.page still declares only Nchan="docker_load" in its header, so DefaultPageLayout never starts nchan/tailscale_status. On a normal page load there is therefore no process publishing tailscalestatus, the cache file is never refreshed, and Tailscale status/tooltips regress to stale or empty data for running containers.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
@tspader - this seems like a legitimate finding to me, can you verify and look into adding this to the dockercontainers.page
-Nchan="docker_load"
+Nchan="docker_load,tailscale_status"
There was a problem hiding this comment.
This is required for dynamix to be able to stop the scripts on array stop etc.
|
If this really fixes the super slow web UI I've been experiencing, it might get me to pay for another year of updates. |
🔧 PR Test Plugin AvailableA test plugin has been generated for this PR that includes the modified files. Version: 📥 Installation Instructions:Install via Unraid Web UI:
Alternative: Direct Download
|
|
Thank you for this PR! We really appreciate you taking the time to work through these performance issues, and I totally understand your sentiment about the Docker pages being very painful to work on (as well as slow 😢 ) We're making efforts to fix this with internal development as well, and I'm happy to give your PR a review and test and we can likely get this merged into 7.3.1 if all goes well. |
|
I tested this PR on my production Unraid 7.3.0 server, including the The results are quite good. On my system, with multiple containers using Unraid’s Tailscale integration, the Docker page render endpoint went from about 1.7s down to about 0.3s. Wow (!!) With the additional Nchan entry in place, the One note: the current client-side subscriber keeps/starts the worker, but it does not update the tooltip DOM directly, so to see the latest tooltip data you have to refresh the page. That is similar to how it worked before, so it is not a regression. I think this is an excellent change for 7.3 |
|
One thing I personally don't like (but it's not a show stopper for me) is that the cache lives within the docker image. Myself, I'd prefer that a reboot of the system would clear out the cache (along with an array stop / start or docker service stop/start), instead of the next 6 hour interval from when the cache was created. Other than that (and adding it to nchan= ), it's a great improvement in performance. One note: It actually will up to 6 hours for the script to be terminated if there's no listeners. Not an issue since its sleeping, but termination happens only on a failed publish, so there's no wasted cycles. |
| // to get live tooltip updates, but that's overkill. A notification is all | ||
| // that we really want. | ||
| publish('tailscalestatus', json_encode(['ts' => $now]), 1, true); | ||
|
|
There was a problem hiding this comment.
For save and publish maybe worth only doing if it has changed.
There was a problem hiding this comment.
For clarification - we typically save an MD5 of the payload and then compare before publishing again, to preserve publish cycles. This is a simple optimization that may help out here.
There was a problem hiding this comment.
Publishing has to be done @SimonFair @elibosley @tspader
See my full comment #2641 (comment)
elibosley
left a comment
There was a problem hiding this comment.
Hey! Please check out the comments on this, once addressed, we can get this into the codebase.
| $state['containers'] = $fresh; | ||
| $state['ts'] = $now; | ||
|
|
||
| ts_save_state($state); |
There was a problem hiding this comment.
If this state hasn't changed, probably not worth saving, you can do the MD5 comparison and share the logic with the publish call as well.
There was a problem hiding this comment.
Instead of saving it at all, just store it in a static var. Also has the advantage of not having anything cached that survives an array stop / start or reboot
|
publish has to be done one way or another - due to the extremely long time between publishing, and the rarity of changes in the payload, if the publish doesn't happen then the script won't exit if there are no listeners. And additionally |
Overview
I got tired of the Docker tab being painfully slow to load, so I dug into what was wrong. I found that an incredible amount of blocking work was being done on the main request thread:
tailscale.comto get the DERP map and latest versiondocker execcalls into every container with Tailscale enabledThis is an astounding amount of work to be doing at the interval the web UI does it; it is an embarrassing amount of work to be done in such a way that it blocks the UI. I ran some sanity checks, and it takes, on my box with 30 containers:
In addition to this, I fixed an issue which would append a new
<script>tag to the page every three seconds, indefinitely.Changes
Here's a slightly fuller change set:
I understand that this is all legacy code, and I understand that a few PRs have been politely declined in favor of an impending rewrite. But this PR is a very small patch which just copies a few existing patterns, more or less verbatim, to fix what I consider unacceptable in both my user experience and level of quality in something I paid for. Again, this is no shade on any of the developers, I understand the nature of code like this, which is why I spent an afternoon trying to make Unraid better rather than complain.
I'm happy to follow up with anything you need from me. Thanks for your time,
Spader
Summary by CodeRabbit
New Features
Performance