Skip to content

Conversation

@ReimarBauer
Copy link
Member

@ReimarBauer ReimarBauer commented Jan 6, 2026

Purpose of PR?:

Fixes #2967

That change enables that I can run all tests locally on my macOS.
576 passed, 16 skipped, 92 warnings in 101.63s (0:01:41)

The hint to use socket for reserving the port stem from AI-Assistant supporting GPT-5.1-codex-max.
Now that the test code runs fully on my MacBook, I’ve been able to do some more testing to look for problems we see on GitHub. We should postpone investigating other issues like transitioning to spawn or refining timeout settings.

@ReimarBauer ReimarBauer marked this pull request as draft January 6, 2026 10:07
@ReimarBauer ReimarBauer requested review from joernu76 and matrss January 6, 2026 12:52
@ReimarBauer ReimarBauer marked this pull request as ready for review January 6, 2026 13:06
Copy link
Collaborator

@matrss matrss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the macOS tests still aren't fully reliable with these changes and they definitely introduce a race condition I am sceptical.

Comment on lines 190 to 196
def _reserve_port(host):
"""Only reserve the port for the server"""
s = socket.socket()
s.bind((host, 0))
port = s.getsockname()[1]
s.close()
return port
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did your LLM mention that this port selection mechanism contains a race condition? ;)

Once the socket is .close()'d the OS is free to assign the port to any other process, so it could theoretically happen that a different process starts listening on the port in between the call to _reserve_port and the eventlet.listen later on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I know this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has been removed. I discovered that the port can be exchanged using a port queue, allowing us to retain eventlet.listen and use the port directly within the WSGI server. This also resolves the errors I was encountering.

@matrss
Copy link
Collaborator

matrss commented Jan 6, 2026

Have you tried running the test suite in a loop on your mac? How high is the failure rate there? Or does it really run without errors for hours? Something like a while true; do ... over night was my benchmark when I worked on making it more reliable on Linux without all the builtin retries...

@ReimarBauer
Copy link
Member Author

ReimarBauer commented Jan 6, 2026

Have you tried running the test suite in a loop on your mac? How high is the failure rate there? Or does it really run without errors for hours? Something like a while true; do ... over night was my benchmark when I worked on making it more reliable on Linux without all the builtin retries...

will do, this makes the mac a heating system...

@ReimarBauer
Copy link
Member Author

ReimarBauer commented Jan 6, 2026

Given that the macOS tests still aren't fully reliable with these changes and they definitely introduce a race condition I am sceptical.

I think we have to look on that tests afterwards. There are already some PR open where we don't wait until qt is finished.

We likly have also some timeout problems for wms related tests.

@ReimarBauer
Copy link
Member Author

We’re still sometimes experiencing hanging tests that pytest can’t kill, even with this change. But it does enable me to run all tests on my Mac. I also observe these hangs locally; aborting them reveals which tests are affected. This PR doesn't resolve all issues, and I haven’t encountered the errors reported in the issue.

Tests running for over 2 minutes were mostly not automatically finished by pytest; I had to manually kill them with Ctrl+C.

=========== 576 passed, 16 skipped, 92 warnings in 105.15s (0:01:45) ===========
=========== 576 passed, 16 skipped, 92 warnings in 100.00s (0:01:39) ===========
=========== 576 passed, 16 skipped, 92 warnings in 102.45s (0:01:42) ===========
====== 1 failed, 575 passed, 16 skipped, 92 warnings in 198.85s (0:03:18) ======
====== 2 failed, 574 passed, 16 skipped, 92 warnings in 169.84s (0:02:49) ======
====== 1 failed, 575 passed, 16 skipped, 92 warnings in 973.18s (0:16:13) ======
=========== 576 passed, 16 skipped, 92 warnings in 102.40s (0:01:42) ===========
=========== 576 passed, 16 skipped, 92 warnings in 102.92s (0:01:42) ===========
=========== 576 passed, 16 skipped, 92 warnings in 106.08s (0:01:46) ===========
====== 2 failed, 574 passed, 16 skipped, 92 warnings in 389.04s (0:06:29) ======
=========== 576 passed, 16 skipped, 92 warnings in 102.96s (0:01:42) ===========
=========== 576 passed, 16 skipped, 92 warnings in 102.52s (0:01:42) ===========
=========== 576 passed, 16 skipped, 92 warnings in 107.22s (0:01:47) ===========
=========== 576 passed, 16 skipped, 92 warnings in 105.72s (0:01:45) ===========
====== 1 failed, 575 passed, 16 skipped, 92 warnings in 102.76s (0:01:42) ======
=========== 576 passed, 16 skipped, 92 warnings in 101.19s (0:01:41) ===========
=========== 576 passed, 16 skipped, 92 warnings in 103.57s (0:01:43) ===========
=========== 576 passed, 16 skipped, 92 warnings in 105.69s (0:01:45) ===========
=========== 576 passed, 16 skipped, 92 warnings in 100.17s (0:01:40) ===========
=========== 576 passed, 16 skipped, 92 warnings in 107.49s (0:01:47) ===========
=========== 576 passed, 16 skipped, 92 warnings in 133.30s (0:02:13) ===========
====== 2 failed, 574 passed, 16 skipped, 92 warnings in 414.41s (0:06:54) ======

If there is no [gw*] prefix, it indicates a test that has hung; all others have failed normally. This information wasn’t easily accessible before.


[gw11] [ 75%] FAILED tests/_test_msui/test_topview.py::Test_TopViewWMS::test_server_getmap
FAILED tests/_test_msui/test_topview.py::Test_TopViewWMS::test_server_getmap
[gw2] [ 82%] FAILED tests/_test_msui/test_wms_control.py::Test_HSecWMSControlWidget::test_server_service_cache
[gw2] [ 90%] FAILED tests/_test_msui/test_wms_control.py::Test_HSecWMSControlWidget::test_server_getmap
FAILED tests/_test_msui/test_wms_control.py::Test_HSecWMSControlWidget::test_server_service_cache
FAILED tests/_test_msui/test_wms_control.py::Test_HSecWMSControlWidget::test_server_getmap
[gw15] [ 82%] FAILED tests/_test_msui/test_wms_control.py::Test_HSecWMSControlWidget::test_multilayer_syncing
FAILED tests/_test_msui/test_wms_control.py::Test_HSecWMSControlWidget::test_multilayer_syncing
[gw13] [ 44%] FAILED tests/_test_msui/test_sideview.py::Test_SideViewWMS::test_server_getmap
[gw7] [ 88%] FAILED tests/_test_msui/test_wms_control.py::Test_VSecWMSControlWidget::test_server_getmap
FAILED tests/_test_msui/test_sideview.py::Test_SideViewWMS::test_server_getmap
FAILED tests/_test_msui/test_wms_control.py::Test_VSecWMSControlWidget::test_server_getmap
[gw7] [ 38%] FAILED tests/_test_msui/test_wms_control.py::Test_HSecWMSControlWidget::test_no_server
FAILED tests/_test_msui/test_wms_control.py::Test_HSecWMSControlWidget::test_no_server
[gw14] [ 93%] FAILED tests/_test_msui/test_wms_control.py::Test_HSecWMSControlWidget::test_no_server
[gw15] [ 95%] FAILED tests/_test_msui/test_wms_control.py::Test_VSecWMSControlWidget::test_server_getmap
FAILED tests/_test_msui/test_wms_control.py::Test_HSecWMSControlWidget::test_no_server
FAILED tests/_test_msui/test_wms_control.py::Test_VSecWMSControlWidget::test_server_getmap

For my Mac-based development, this is a big step forward, and I don't believe it will regress the Ubuntu test results. Based on that, I think we should merge.

@ReimarBauer ReimarBauer changed the title separated port reservation port reservation in a multiprocessing queue Jan 7, 2026
@ReimarBauer ReimarBauer changed the title port reservation in a multiprocessing queue port communication over a multiprocessing queue Jan 7, 2026
@ReimarBauer ReimarBauer changed the title port communication over a multiprocessing queue port communication over a multiprocessing queue in fixtures Jan 7, 2026
@ReimarBauer
Copy link
Member Author

Given that the macOS tests still aren't fully reliable with these changes and they definitely introduce a race condition I am sceptical.

removed the race condition

Copy link
Collaborator

@matrss matrss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like CI passes more or less 50% of the time now, which is a major improvement for macOS runners. I still don't understand why, but whatever, the changes look fine. Thank you!

@ReimarBauer ReimarBauer merged commit efc5697 into Open-MSS:stable Jan 7, 2026
35 of 38 checks passed
@ReimarBauer ReimarBauer deleted the i2967 branch January 7, 2026 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants