OCTRL-1008 Attempt reconnecting to ecs-dcs gateway at core startup #734
Co-authored-by: knopers8 <14327588+knopers8@users.noreply.github.com>
- Move subscription logic into goroutine to enable continuous retry
- Always start reconnection goroutine regardless of initial connection state
- Plugin initialization now succeeds even if DCS gateway is unavailable
- Add comprehensive test for unavailable gateway scenario
- Resolves issue where plugin never attempted reconnection after init failure

Co-authored-by: knopers8 <14327588+knopers8@users.noreply.github.com>
@copilot remove the proto-generated Go files from the PR. They have nothing to do with the aim of the PR.
Co-authored-by: knopers8 <14327588+knopers8@users.noreply.github.com>
Removed all proto-generated files from the PR. The changes now only include the DCS plugin fix (core/integration/dcs/plugin.go and core/integration/dcs/plugin_reconnect_test.go). Commit: 22d0313
knopers8
left a comment
Well, that actually looks correct and works well (tested on staging)
The commits should be squashed when merging.
Problem
The DCS plugin only attempts to reconnect to the Subscribe stream if the stream has already been established at some point during the plugin's lifetime. If the ecs-dcs-gateway is unavailable during plugin initialization, the plugin never attempts to reconnect and stays permanently disconnected.
This issue was observed in production on 14/04/2025 at 13:31 when the DCS gateway was unavailable during AliECS core startup.
Root Cause
In core/integration/dcs/plugin.go, the Init() method performs an initial subscription attempt (lines 290-293). If this fails, the method returns an error and the reconnection goroutine (lines 294-348) is never started.
Solution
This PR restructures the initialization logic to ensure the reconnection goroutine always starts, regardless of initial connection status:
- Init() succeeds even when the DCS gateway is unavailable
Key Changes
evStream = nil on connection failures)
Testing
TestPluginInitWithUnavailableGateway validates the fix
Impact
This fix ensures the DCS plugin will reliably reconnect to the ecs-dcs-gateway after it becomes available, preventing the production issue experienced when the gateway is unavailable during startup. The plugin now gracefully handles network outages and service restarts without requiring AliECS core restarts.