Fix multiplayer standalone crash: client stats race + host accept code#7319
Fix multiplayer standalone crash: client stats race + host accept code#7319chief1983 wants to merge 3 commits intoscp-fs2open:masterfrom
Conversation
Fixes a crash in multiplayer PXO/tracker games hosted on a standalone server where clients crash at debrief close after scoring a kill: pilotfile.cpp:295: Ship kills of '<ship class>' not found, should have been added by pilotfile::update_stats Two independent root causes were found, both of which must be fixed for correct behavior: --- Root cause 1: race condition in multi_sexp_end_mission() (sexp.cpp) --- The previous code called send_debrief_event() before multi_handle_end_mission_request() as a "hack" to skip the warp-out sequence when an end-mission SEXP fires. On standalone servers this is a race condition: scoring_eval_kill only runs on the master, and the master hasn't broadcast m_okKills to clients yet at the point multi_sexp_end_mission() is called. Entering debrief this early causes scoring_level_close() to run with zeroed m_okKills, so Pilot.update_stats() creates no kill entries for that session. When debrief_close() later calls scoring_backout_accept() it cannot find those entries and hits the UNREACHABLE. Fix: Remove the send_debrief_event() call entirely. The normal flow via multi_handle_end_mission_request() has the server broadcast stats first and then send MISSION_END, which causes the client to enter debrief with correct kill data already populated. The master enters debrief through sexp_end_mission() -> send_debrief_event() as before. The behavioral side-effect of removing this hack: clients that go through the normal warp-out sequence will no longer skip it on end-mission. Clients in a state where they cannot warp still receive immediate debrief via multi_handle_sudden_mission_end(). --- Root cause 2: host accept code never set on PXO standalone (multiui.cpp) --- In multi_debrief_accept_hit() and multi_debrief_esc_hit(), when the game is a PXO/tracker game and the local player is the game host (NETINFO_FLAG_GAME_HOST), the code calls send_store_stats_packet() to tell clients to record their stats. It then sets Multi_debrief_stats_accept_code = 1 via the per-client response path inside the NETINFO_FLAG_AM_MASTER block. On a standalone server the game host is NOT the master (NETINFO_FLAG_AM_MASTER is false for the client-side host). The NETINFO_FLAG_AM_MASTER block is skipped entirely, and send_store_stats_packet() uses multi_io_send_to_all_reliable() which skips the local player — so the host never receives its own packet. Multi_debrief_stats_accept_code stays at -1 (unset). When debrief_close() checks this value and sees it is not 1, it assumes stats were tossed or the window was closed without accepting, and runs scoring_backout_accept() unconditionally. With fix 1 in place this no longer crashes (entries exist), but silently strips all session kill credit from the host's pilot record. Fix: After the send_store_stats_packet() block, call multi_debrief_stats_accept() unconditionally for the game host so the accept code is set locally regardless of whether the host is also the standalone master. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous fix removed send_debrief_event() from multi_sexp_end_mission() entirely, which was correct for the stats race condition but had a side effect: clients would no longer skip the warp-out sequence when an end-mission SEXP fires. The warp-out skip was an intentional feature — process_endgame_packet() checks whether the client is already in debrief state and, if so, bypasses multi_warpout_all_players(). Restore the original behavior by deferring the send_debrief_event() call to process_endgame_packet() rather than eliminating it. multi_sexp_end_mission() now sets Multi_sexp_end_mission_pending=true before calling multi_handle_end_mission_request(). When the client receives the MISSION_END packet, process_endgame_packet() checks the flag, calls send_debrief_event() there, then falls through to the debrief-state check which skips the warp-out exactly as before. This is safe because send_endgame_packet() calls multi_broadcast_stats() before sending MISSION_END. Both use reliable delivery, so stats are guaranteed to be received and applied before MISSION_END is processed, meaning scoring_level_close() now sees correct kill data when send_debrief_event() fires. Alt-J-initiated mission ends are unaffected: the flag is never set, so the warp-out proceeds normally for those paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Root cause 1 definitely breaks the end-mission fix, which it even confirms as a side-effect of the change. I did test that original fix with a standalone, but as you point out, it only happens after you've scored a kill, which was not a part of any of my test missions. Based on those findings I'll run some new tests and see if I can figure out a solution that fixes the problem without breaking anything. For root cause 2 it's interpretation of the code is incorrect and all is working as expected. Making that change actually breaks proper stats handling on tracker games. The master should have full control of stats, and the host and clients must never be allowed to save stats on their own. |
|
Ok, I guess the second commit I just pushed up isn't really going to be the right fix either, but it does attempt to solve the problem it introduced of the warpout being re-enabled for clients when it shouldn't be. |
The previous change added multi_debrief_stats_accept() calls for the game host in multi_debrief_accept_hit() and multi_debrief_esc_hit() to work around Multi_debrief_stats_accept_code staying -1 on a standalone server. This was based on incorrect analysis. In PXO tracker games the standalone master is intentionally the sole authority for stat storage (via multi_fs_std_tracker_store_stats() in multi_standalone_postgame_close()). The host and clients must not save stats to their local pilot files independently. Multi_debrief_stats_ accept_code remaining -1 and triggering scoring_backout_accept() on the host is correct behavior: kills are backed out of the local pilot file because the tracker holds the authoritative record, not the local file. The race condition fix (deferred send_debrief_event()) is sufficient to prevent the crash: Pilot.update_stats() now creates kill entries with correct m_okKills data, so scoring_backout_accept() can find and remove them cleanly rather than hitting the UNREACHABLE. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
This was its thoughts leading up to the change for Root Cause 1 to preserve the intent of the code:
|
|
Just reverted the multiui.cpp changes, so now the only changes are the second attempt at fixing the crash while maintaining the jump out bypass. Still not sure if that's of much use but I'll go ahead and give it a spin just to see if it actually works like it says on the can. |
I tried something similar originally for the end-mission fix but it didn't work properly. One side or another still did the full warp-out sequence. I don't remember the exact change I used though, but it was definitely to the same spot. However if this works then great, but it will definitely need to be tested to confirm all of end-mission tests cases still pass. |
|
Well, I can confirm that I did not get the crash I was getting. Beyond that, not sure what else to test but I imagine there's quite a bit. |
Fixes a crash in multiplayer PXO/tracker games hosted on a standalone server where clients crash at debrief close after scoring a kill:
pilotfile.cpp:295: Ship kills of '' not found,
should have been added by pilotfile::update_stats
Two independent root causes were found, both of which must be fixed for correct behavior:
--- Root cause 1: race condition in multi_sexp_end_mission() (sexp.cpp) ---
The previous code called send_debrief_event() before multi_handle_end_mission_request() as a "hack" to skip the warp-out sequence when an end-mission SEXP fires. On standalone servers this is a race condition: scoring_eval_kill only runs on the master, and the master hasn't broadcast m_okKills to clients yet at the point multi_sexp_end_mission() is called. Entering debrief this early causes scoring_level_close() to run with zeroed m_okKills, so Pilot.update_stats() creates no kill entries for that session. When debrief_close() later calls scoring_backout_accept() it cannot find those entries and hits the UNREACHABLE.
Fix: Remove the send_debrief_event() call entirely. The normal flow via multi_handle_end_mission_request() has the server broadcast stats first and then send MISSION_END, which causes the client to enter debrief with correct kill data already populated. The master enters debrief through sexp_end_mission() -> send_debrief_event() as before.
The behavioral side-effect of removing this hack: clients that go through the normal warp-out sequence will no longer skip it on end-mission. Clients in a state where they cannot warp still receive immediate debrief via multi_handle_sudden_mission_end().
--- Root cause 2: host accept code never set on PXO standalone (multiui.cpp) ---
In multi_debrief_accept_hit() and multi_debrief_esc_hit(), when the game is a PXO/tracker game and the local player is the game host (NETINFO_FLAG_GAME_HOST), the code calls send_store_stats_packet() to tell clients to record their stats. It then sets
Multi_debrief_stats_accept_code = 1 via the per-client response path inside the NETINFO_FLAG_AM_MASTER block.
On a standalone server the game host is NOT the master (NETINFO_FLAG_AM_MASTER is false for the client-side host). The NETINFO_FLAG_AM_MASTER block is skipped entirely, and send_store_stats_packet() uses multi_io_send_to_all_reliable() which skips the local player — so the host never receives its own packet. Multi_debrief_stats_accept_code stays at -1 (unset).
When debrief_close() checks this value and sees it is not 1, it assumes stats were tossed or the window was closed without accepting, and runs scoring_backout_accept() unconditionally. With fix 1 in place this no longer crashes (entries exist), but silently strips all session kill credit from the host's pilot record.
Fix: After the send_store_stats_packet() block, call multi_debrief_stats_accept() unconditionally for the game host so the accept code is set locally regardless of whether the host is also the standalone master.
Human notes: There were several criteria necessary to reproduce this. As mentioned, it only happened when using a standalone server, and not when self-hosting my own local multi match. A co-op mission that spawned a few enemies and ended in 20 seconds was used to reproduce after the initial case. The mission was terminated at that point with the end-mission sexp, which seems critical to reproducing it, as opposed to ending by player-initiated jump-out, exiting manually, etc. I also had to actually score kills during the mission, or I assume there were no stats for it to attempt to write. The last bit was, after using a pilot to self-host, it asked if I wanted to Accept stats, which I did. That particular pilot could no longer reproduce the crash when used to host a game via a standalone server. But a new pilot went right back to crashing every time I ran that mission on a standalone and registered kills. So, it seems something was created in the pilot file when I self-hosted that also prevented the crash case itself.
I don't really like the idea that this proposed fix from Claude would restore warp-out when end-mission is used, so I'm opening this merely as a draft to provide the research and explanation for what it thinks is a problem and potential fix, so a proper fix could hopefully be implemented from there. One change ties back directly to #6943 so I'm tagging @notimaginative as a requested reviewer here.