We're running long-lived SSM port-forwarding sessions (up to 12 hours) that are experiencing seemingly random disconnects.
The SSM port-forwarding sessions originate from corporate workstations running an internal WinUI3 application that uses the session-manager-plugin.exe utility to let users RDP to specific EC2 instances in our AWS environment. Sometimes the disconnects happen after several hours of being connected, and sometimes they occur within a few minutes of the session being established. We haven't found any pattern or any way to reproduce a disconnect on demand, and unfortunately the session-manager-plugin.log file doesn't offer much insight into what might be causing them.
The only signs of trouble recorded in session-manager-plugin.log are batches of "Unexpected sequence message received" messages. At some seemingly random point the plugin triggers a ResumeSession event, which we can see logged in CloudTrail. After resuming, it starts logging more "Unexpected sequence message received." entries until it abruptly terminates the session without logging any output or reason.
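To line these batches up against the ResumeSession events in CloudTrail, we could count the messages per minute. This is only a minimal sketch: the message text is what we see in the log, but the timestamp pattern and the function name `count_marker_per_minute` are assumptions for illustration and would need adjusting to the actual log line format.

```python
import re
from collections import Counter

# Message text as it appears in our session-manager-plugin.log.
MARKER = "Unexpected sequence message received"

# ASSUMPTION: each log line starts with a timestamp like "2024-01-31 14:05:09".
# Adjust this pattern to whatever the real log lines look like.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")


def count_marker_per_minute(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH:MM' to the number of marker lines."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        match = TIMESTAMP.match(line)
        # Lines that do not match the assumed format are grouped as "unknown".
        key = match.group(1) if match else "unknown"
        counts[key] += 1
    return counts
```

Passing an open file handle for the log to `count_marker_per_minute` and printing the sorted items gives a per-minute histogram that can be compared with the CloudTrail event times.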
When we look at the network traffic in Wireshark we can see the following sequence in the moments leading up to the final session termination:
- the client sends a TCP keepalive and gets a normal ACK
- client and server exchange fresh TLS application data
- the server is still actively sending valid data
- then the client host sends RST, ACK
- the SSM session is terminated
Since the RST comes from the client side, we suspect there might be a bug in session-manager-plugin.exe that's causing this behavior. We're not sure whether it's related to all of the "Unexpected sequence message received" messages the plugin seemingly tolerates for a while, but whatever the cause is, it isn't being flagged in Windows Event Viewer or anywhere else we can find.
Any guidance or help you can provide would be greatly appreciated, thanks!