Skip to content

Conversation

@DanTup
Copy link
Contributor

@DanTup DanTup commented Dec 15, 2025

@elliette interested in your thoughts here.

The bug at Dart-Code/Dart-Code#5808 is that after sleeping the machine, the DTD connection drops and the sidebar stops working. There's no handling of errors or connection drops, so the sidebar continues to look fine, but any click results in an unhandled error written to the console.

This change handles the done event in the DTD connection manager and automatically tries to reconnect (up to 5 times, with exponential backoff) as long as disconnect/dispose wasn't called explicitly. Without the retries, we might try before the network connection is back up (ERR_NETWORK_IO_SUSPENDED) - I was going to look to see if the network presence APIs are available in Dart, but actually they're not what we need - we don't need to know if you have an internet connection, because we're just connecting to localhost. So a retry seemed like the best option.

With this change, I can resume from sleep and the sidebar automatically reconnects (and re-initializes its state).

Screenshot 2025-12-15 184952

This also seems to fire for the property editor and I'm unsure whether it could replace your previous fix at #9028. You mentioned in that chat that done didn't fire - but I don't know if you just meant it was delayed until the machine woke up or not (in my testing, it is firing, because my code is running).

Unfortunately I don't know of a way we could easily automate a test for this, but if you have ideas, let me know.

Fixes Dart-Code/Dart-Code#5808
Fixes #9531

@DanTup DanTup requested a review from elliette December 15, 2025 19:00
@jakemac53
Copy link
Contributor

Would it be possible to have this attempt to reconnect whenever devtools attempts to use it? That way we can keep trying to reconnect later on even if we failed to reconnect once?

@DanTup DanTup requested a review from a team as a code owner December 15, 2025 19:04
@DanTup
Copy link
Contributor Author

DanTup commented Dec 15, 2025

Would it be possible to have this attempt to reconnect whenever devtools attempts to use it? That way we can keep trying to reconnect later on even if we failed to reconnect once?

With this change, when the connection drops we will emit a null for the current connection, so any UI will revert to showing a disconnected state or spinner (depending on what each page does with a null connection), so I'm not sure there is much opportunity to perform an action to trigger it.

It was already the case that if the DTD connection fails to connect, the UI would just remain in the unconnected state. It would perhaps be nice to show more information if we fail to connect (not just after a dropped connection, but even on the first attempt) and a button to force a retry - although maybe as a separate PR as it might need a bit of refactoring of each page/widget that reads the connection from here.

(ofc, we could also increase the number of retries and cap the delay if we think there's a reason we wouldn't connect in the whole current duration)

@jakemac53
Copy link
Contributor

Yeah if the whole UI already doesn't appear when in this state then I think this change makes sense, but it would be nice to get a reconnect button in the disconnected UI eventually.

@DanTup
Copy link
Contributor Author

DanTup commented Dec 15, 2025

Yeah if the whole UI already doesn't appear when in this state then I think this change makes sense, but it would be nice to get a reconnect button in the disconnected UI eventually.

To be clear, before this change, the UI does stay visible (at least for things like the Sidebar)... but it's broken when you try to click things. While we could try to handle the failure when sending a request to DTD by just reconnecting, the state may not be consistent (because we may have missed events from DTD that would've otherwise updated the state).

So I don't think it's safe to generally just reconnect and try to continue. I think the safest default is to just throw the UI away, go back to the "connecting..." state (which setting connection.value = null does during the done handler), and then connect and initialize (which setting connection.value back to a new connection does). We could add some way for pages that can gracefully handle a reconnection to do so (eg. because they don't hold important state, or can re-sync the data and keep other UI state), but I don't know enough about all the other pages to know which ones can do this.

Probably there are still some improvements we could make in the short-term (in this PR or another):

  1. Show more information for the connecting state (eg. show text saying it's connecting, if it's making multiple attempts, etc.) instead of only the spinner that gets shown today
  2. If the last connection attempt fails, change to show a message, and a button to "try again" (instead of probably just an unhandled exception, leaving the spinner showing forever as it does today)

This will still result in the UI "disappearing" on a connection failure, but it would be clearer to the user what's happening, and in the case that all of the reconnects fail, there would be a button to try again.

@elliette
Copy link
Member

This LGTM from a code perspective but I'm curious about the done event. IIRC while investigating #9028 I was seeing the done event when my computer went to sleep for just a few minutes, but after ~5 minutes the done event would never fire.

Regarding connecting state - I wonder if instead of the spinner we could reuse the reconnection overlay that was added in #9043 (this can be a follow up)

@DanTup
Copy link
Contributor Author

DanTup commented Dec 16, 2025

Regarding connecting state - I wonder if instead of the spinner we could reuse the reconnection overlay that was added in #9043 (this can be a follow up)

Oh, I didn't realise we had this. It's possible that in the case where done fires, this PR will prevent that from being shown (because we'll go back to the spinner).

Let me take another look at this and see if I can just pull that further up and handle it generally then.

@DanTup DanTup marked this pull request as draft December 16, 2025 11:20
@DanTup DanTup force-pushed the auto-reconnect-to-dtd branch from 8daf30d to d1a662e Compare December 16, 2025 15:52
- Repurposes the property editor "ReconnectingOverlay" as a more general "Connecting Overlay" also for the sidebar.
- If a DTD connection drops, it will be retried up to 5 times (and the new connection sent to the ValueNotifier)
- If the connection fails after all retries, a "Retry" button is shown to allow the user to restart connection attempts again
- The periodic check for the connection being alive is now part of DTDManager instead of only the property editor
- DTDManager now also emits state values so consumers can handle each state differently (rather than just connected/not connected)
- Mock editor for Stager apps now has a "Drop Connection" button to similate DTD connection drops
- Stager apps that use the mock editor now use a real DTDManager (since they connect to real DTD) and can simulate slow and failed connections
@DanTup DanTup force-pushed the auto-reconnect-to-dtd branch from d1a662e to 7043255 Compare December 16, 2025 16:06
@DanTup
Copy link
Contributor Author

DanTup commented Dec 16, 2025

Ok, I redid a lot of this so it should be reviewed again as a whole.

I extracted some smaller unrelated parts from this PR into #9589 to try and reduce the size of this, but the DTDManager changes are not trivial.

I made the reconnecting overlay more general and use it as an overlay for any type of connection. It will now show progress like "Connecting...", "Reconnecting in Xs", and "Connection Failed". When the connection has failed all retries, there is a "Retry" button for the user to restart another set of retries.

Changes include:

  • Repurposes the property editor "ReconnectingOverlay" as a more general "Connecting Overlay" also for the sidebar.
  • If a DTD connection drops, it will be retried up to 5 times (and the new connection sent to the ValueNotifier)
  • If the connection fails after all retries, a "Retry" button is shown to allow the user to restart connection attempts again
  • The periodic check for the connection being alive is now part of DTDManager instead of only the property editor
  • DTDManager now also emits state values so consumers can handle each state differently (rather than just connected/not connected)
  • Mock editor for Stager apps now has a "Drop Connection" button to similate DTD connection drops
  • Stager apps that use the mock editor now use a real DTDManager (since they connect to real DTD) and can simulate slow and failed connections
  • Removed the force refresh for reconnect, and instead just made the _DtdConnectedScreen use a key so that when the connection changes, the widget gets new state

@elliette once you're happy with the code, I think it would be useful for you to re-test the connection issue you'd triggered with the property editor because I don't think we've seen exactly the same behaviour (I couldn't reproduce done not firing, for ex). I have tested the PE in both VS Code and the stager app and believe it's working the same as it was before (but the handling is more generic).

@DanTup DanTup marked this pull request as ready for review December 16, 2025 16:12
Copy link
Member

@elliette elliette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments but LGTM! Thanks for re-working this. I pulled down the changes and let my computer go to sleep while I went to grab breakfast and when I came back the Property Editor and Flutter sidebar both reconnected as expected.

@DanTup
Copy link
Contributor Author

DanTup commented Dec 16, 2025

@elliette

I pulled down the changes and let my computer go to sleep while I went to grab breakfast and when I came back the Property Editor and Flutter sidebar both reconnected as expected.

Great, thank you!

I've pushed tweaks for all of your feedback, with just one outstanding question about safeUnawaited above.

auto-submit bot pushed a commit that referenced this pull request Dec 16, 2025
Some minor changes extracted from #9587, since that review got quite big and I want to simplify it a bit.

- Exclude nested SDK from being analyzed in VS Code
- Fix path in DTD task (it was correct for inside the `packages` folder but was moved to the root without updating)
- Update release notes + template to not say "general updates" in all sections (when only one of them was general)
@DanTup
Copy link
Contributor Author

DanTup commented Dec 16, 2025

@elliette some odd failures on the bots like this:

FlutterDriveProcess - Unhandled exception:
FlutterDriveProcess - TimeoutException (500): timeout: Timed out receiving message from renderer: 299.773
FlutterDriveProcess -   (Session info: chrome=143.0.7499.40)
FlutterDriveProcess - #0      parseW3cResponse (package:webdriver/src/handler/w3c/utils.dart:87:9)
FlutterDriveProcess - #1      W3cNavigationHandler.parseNavigateToResponse (package:webdriver/src/handler/w3c/navigation.dart:26:5)
FlutterDriveProcess - #2      AsyncRequestClient.send (package:webdriver/src/common/request_client.dart:96:32)

Have you seen anything like this before? I don't think they're related to my change, but I'll do some digging if they haven't been seen elsewhere.

@elliette
Copy link
Member

@elliette some odd failures on the bots like this:

FlutterDriveProcess - Unhandled exception:
FlutterDriveProcess - TimeoutException (500): timeout: Timed out receiving message from renderer: 299.773
FlutterDriveProcess -   (Session info: chrome=143.0.7499.40)
FlutterDriveProcess - #0      parseW3cResponse (package:webdriver/src/handler/w3c/utils.dart:87:9)
FlutterDriveProcess - #1      W3cNavigationHandler.parseNavigateToResponse (package:webdriver/src/handler/w3c/navigation.dart:26:5)
FlutterDriveProcess - #2      AsyncRequestClient.send (package:webdriver/src/common/request_client.dart:96:32)

Have you seen anything like this before? I don't think they're related to my change, but I'll do some digging if they haven't been seen elsewhere.

Hmm I don't recall seeing that before so it's possible it is related to this PR and how we connect to DTD in our integration tests. We can see if they continue to fail on CI or if it was just a flake. The integration tests should all have comments saying how to run them, e.g.:

// dart run integration_test/run_tests.dart --target=integration_test/test/live_connection/app_test.dart

@DanTup
Copy link
Contributor Author

DanTup commented Dec 16, 2025

Got it - they're re-running from my last change so I'll add autosubmit and it they pass must just be a flake. Otherwise, I will do some digging in the morning. Thanks!

Edit: Actually, I'll submit manually because I presume it'll use the top comment as the commit message and it's a bit out of date but I don't want to edit that context out.

@DanTup DanTup added autosubmit Merge PR when tree becomes green via auto submit App and removed autosubmit Merge PR when tree becomes green via auto submit App labels Dec 16, 2025
@DanTup
Copy link
Contributor Author

DanTup commented Dec 17, 2025

I can't reproduce this failure here (search results suggest the error is a failure for Chromedriver to communicate with Chrome), the test runs for me and fails for another reason (although that reason is the same both on stable and on this PR).

I'll create a separate branch and try to strip things down to just this one test and see what happens... I might have to debug it via GH actions.

(one thing that seemed odd to me is that the tests are running on GH with the Web Server device which keeps complaining about using the Dart debug session... however that's also the case before this PR, so I guess it's not related)

@DanTup
Copy link
Contributor Author

DanTup commented Dec 17, 2025

Ok, I also couldn't repro on a branch, so I re-ran the failed bots here and they went green.

So I think they are flakes (or otherwise influenced by something else, like Chrome version or something) and the two runs yesterday were unlucky. From the tests that are failing, I don't think they're related to this change (since this change is partly in the VS Code embedded windows, and partly only affecting if a DTD connection drops, which presumably it is not during the tests), so I'm going to land this - but if we see them again (and think they've increased/started as a result of this), we can revert to investigate further.

@DanTup DanTup merged commit 1af3032 into flutter:master Dec 17, 2025
67 of 70 checks passed
@DanTup DanTup deleted the auto-reconnect-to-dtd branch December 17, 2025 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Clicking actions in Flutter side bar does not do anything after machine has slept Flutter Sidebar stops working after PC wakes from sleep

3 participants