Skip to content

Fix multiplayer standalone crash: client stats race + host accept code#7319

Draft
chief1983 wants to merge 3 commits intoscp-fs2open:masterfrom
chief1983:fix/standalone-client-stats-crash
Draft

Fix multiplayer standalone crash: client stats race + host accept code#7319
chief1983 wants to merge 3 commits intoscp-fs2open:masterfrom
chief1983:fix/standalone-client-stats-crash

Conversation

@chief1983
Copy link
Member

Fixes a crash in multiplayer PXO/tracker games hosted on a standalone server where clients crash at debrief close after scoring a kill:

pilotfile.cpp:295: Ship kills of '' not found,
should have been added by pilotfile::update_stats

Two independent root causes were found, both of which must be fixed for correct behavior:

--- Root cause 1: race condition in multi_sexp_end_mission() (sexp.cpp) ---

The previous code called send_debrief_event() before multi_handle_end_mission_request() as a "hack" to skip the warp-out sequence when an end-mission SEXP fires. On standalone servers this is a race condition: scoring_eval_kill only runs on the master, and the master hasn't broadcast m_okKills to clients yet at the point multi_sexp_end_mission() is called. Entering debrief this early causes scoring_level_close() to run with zeroed m_okKills, so Pilot.update_stats() creates no kill entries for that session. When debrief_close() later calls scoring_backout_accept() it cannot find those entries and hits the UNREACHABLE.

Fix: Remove the send_debrief_event() call entirely. The normal flow via multi_handle_end_mission_request() has the server broadcast stats first and then send MISSION_END, which causes the client to enter debrief with correct kill data already populated. The master enters debrief through sexp_end_mission() -> send_debrief_event() as before.

The behavioral side-effect of removing this hack: clients that go through the normal warp-out sequence will no longer skip it on end-mission. Clients in a state where they cannot warp still receive immediate debrief via multi_handle_sudden_mission_end().

--- Root cause 2: host accept code never set on PXO standalone (multiui.cpp) ---

In multi_debrief_accept_hit() and multi_debrief_esc_hit(), when the game is a PXO/tracker game and the local player is the game host (NETINFO_FLAG_GAME_HOST), the code calls send_store_stats_packet() to tell clients to record their stats. It then sets
Multi_debrief_stats_accept_code = 1 via the per-client response path inside the NETINFO_FLAG_AM_MASTER block.

On a standalone server the game host is NOT the master (NETINFO_FLAG_AM_MASTER is false for the client-side host). The NETINFO_FLAG_AM_MASTER block is skipped entirely, and send_store_stats_packet() uses multi_io_send_to_all_reliable() which skips the local player — so the host never receives its own packet. Multi_debrief_stats_accept_code stays at -1 (unset).

When debrief_close() checks this value and sees it is not 1, it assumes stats were tossed or the window was closed without accepting, and runs scoring_backout_accept() unconditionally. With fix 1 in place this no longer crashes (entries exist), but silently strips all session kill credit from the host's pilot record.

Fix: After the send_store_stats_packet() block, call multi_debrief_stats_accept() unconditionally for the game host so the accept code is set locally regardless of whether the host is also the standalone master.

Human notes: There were several criteria necessary to reproduce this. As mentioned, it only happened when using a standalone server, and not when self-hosting my own local multi match. A co-op mission that spawned a few enemies and ended in 20 seconds was used to reproduce after the initial case. The mission was terminated at that point with the end-mission sexp, which seems critical to reproducing it, as opposed to ending by player-initiated jump-out, exiting manually, etc. I also had to actually score kills during the mission, or I assume there were no stats for it to attempt to write. The last bit was, after using a pilot to self-host, it asked if I wanted to Accept stats, which I did. That particular pilot could no longer reproduce the crash when used to host a game via a standalone server. But a new pilot went right back to crashing every time I ran that mission on a standalone and registered kills. So, it seems something was created in the pilot file when I self-hosted that also prevented the crash case itself.

I don't really like the idea that this proposed fix from Claude would restore warp-out when end-mission is used, so I'm opening this merely as a draft to provide the research and explanation for what it thinks is a problem and potential fix, so a proper fix could hopefully be implemented from there. One change ties back directly to #6943 so I'm tagging @notimaginative as a requested reviewer here.

Fixes a crash in multiplayer PXO/tracker games hosted on a standalone
server where clients crash at debrief close after scoring a kill:

  pilotfile.cpp:295: Ship kills of '<ship class>' not found,
  should have been added by pilotfile::update_stats

Two independent root causes were found, both of which must be fixed for
correct behavior:

--- Root cause 1: race condition in multi_sexp_end_mission() (sexp.cpp) ---

The previous code called send_debrief_event() before
multi_handle_end_mission_request() as a "hack" to skip the warp-out
sequence when an end-mission SEXP fires. On standalone servers this is
a race condition: scoring_eval_kill only runs on the master, and the
master hasn't broadcast m_okKills to clients yet at the point
multi_sexp_end_mission() is called. Entering debrief this early causes
scoring_level_close() to run with zeroed m_okKills, so
Pilot.update_stats() creates no kill entries for that session.
When debrief_close() later calls scoring_backout_accept() it cannot
find those entries and hits the UNREACHABLE.

Fix: Remove the send_debrief_event() call entirely. The normal flow
via multi_handle_end_mission_request() has the server broadcast stats
first and then send MISSION_END, which causes the client to enter
debrief with correct kill data already populated. The master enters
debrief through sexp_end_mission() -> send_debrief_event() as before.

The behavioral side-effect of removing this hack: clients that go
through the normal warp-out sequence will no longer skip it on
end-mission. Clients in a state where they cannot warp still receive
immediate debrief via multi_handle_sudden_mission_end().

--- Root cause 2: host accept code never set on PXO standalone (multiui.cpp) ---

In multi_debrief_accept_hit() and multi_debrief_esc_hit(), when the
game is a PXO/tracker game and the local player is the game host
(NETINFO_FLAG_GAME_HOST), the code calls send_store_stats_packet() to
tell clients to record their stats. It then sets
Multi_debrief_stats_accept_code = 1 via the per-client response path
inside the NETINFO_FLAG_AM_MASTER block.

On a standalone server the game host is NOT the master
(NETINFO_FLAG_AM_MASTER is false for the client-side host). The
NETINFO_FLAG_AM_MASTER block is skipped entirely, and
send_store_stats_packet() uses multi_io_send_to_all_reliable() which
skips the local player — so the host never receives its own packet.
Multi_debrief_stats_accept_code stays at -1 (unset).

When debrief_close() checks this value and sees it is not 1, it
assumes stats were tossed or the window was closed without accepting,
and runs scoring_backout_accept() unconditionally. With fix 1 in place
this no longer crashes (entries exist), but silently strips all session
kill credit from the host's pilot record.

Fix: After the send_store_stats_packet() block, call
multi_debrief_stats_accept() unconditionally for the game host so the
accept code is set locally regardless of whether the host is also the
standalone master.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous fix removed send_debrief_event() from
multi_sexp_end_mission() entirely, which was correct for the stats
race condition but had a side effect: clients would no longer skip the
warp-out sequence when an end-mission SEXP fires.  The warp-out skip
was an intentional feature — process_endgame_packet() checks whether
the client is already in debrief state and, if so, bypasses
multi_warpout_all_players().

Restore the original behavior by deferring the send_debrief_event()
call to process_endgame_packet() rather than eliminating it.
multi_sexp_end_mission() now sets Multi_sexp_end_mission_pending=true
before calling multi_handle_end_mission_request().  When the client
receives the MISSION_END packet, process_endgame_packet() checks the
flag, calls send_debrief_event() there, then falls through to the
debrief-state check which skips the warp-out exactly as before.

This is safe because send_endgame_packet() calls multi_broadcast_stats()
before sending MISSION_END.  Both use reliable delivery, so stats are
guaranteed to be received and applied before MISSION_END is processed,
meaning scoring_level_close() now sees correct kill data when
send_debrief_event() fires.

Alt-J-initiated mission ends are unaffected: the flag is never set,
so the warp-out proceeds normally for those paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@notimaginative
Copy link
Contributor

Root cause 1 definitely breaks the end-mission fix, which it even confirms as a side-effect of the change. I did test that original fix with a standalone, but as you point out, it only happens after you've scored a kill, which was not a part of any of my test missions. Based on those findings I'll run some new tests and see if I can figure out a solution that fixes the problem without breaking anything.

For root cause 2 it's interpretation of the code is incorrect and all is working as expected. Making that change actually breaks proper stats handling on tracker games. The master should have full control of stats, and the host and clients must never be allowed to save stats on their own.

@chief1983
Copy link
Member Author

Ok, I guess the second commit I just pushed up isn't really going to be the right fix either, but it does attempt to solve the problem it introduced of the warpout being re-enabled for clients when it shouldn't be.

The previous change added multi_debrief_stats_accept() calls for the
game host in multi_debrief_accept_hit() and multi_debrief_esc_hit() to
work around Multi_debrief_stats_accept_code staying -1 on a standalone
server.  This was based on incorrect analysis.

In PXO tracker games the standalone master is intentionally the sole
authority for stat storage (via multi_fs_std_tracker_store_stats() in
multi_standalone_postgame_close()).  The host and clients must not save
stats to their local pilot files independently.  Multi_debrief_stats_
accept_code remaining -1 and triggering scoring_backout_accept() on the
host is correct behavior: kills are backed out of the local pilot file
because the tracker holds the authoritative record, not the local file.

The race condition fix (deferred send_debrief_event()) is sufficient to
prevent the crash: Pilot.update_stats() now creates kill entries with
correct m_okKills data, so scoring_backout_accept() can find and remove
them cleanly rather than hitting the UNREACHABLE.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@chief1983
Copy link
Member Author

This was its thoughts leading up to the change for Root Cause 1 to preserve the intent of the code:

The warp-out is skipped when the client is already in debrief state when MISSION_END arrives. The hack worked by calling send_debrief_event() first to get into debrief, so by the time MISSION_END was processed, the check skipped the warp-out. The problem was doing that before stats arrived.

The fix is: set the debrief state inside process_endgame_packet() on the client side, right before that check — at which point multi_broadcast_stats() has already fired (it's called in send_endgame_packet() before the MISSION_END packet is sent, and reliable packets are ordered). But we only want to do it when the mission ended via end-mission SEXP, not on a player-initiated Alt-J.

The cleanest approach is a flag:

Both files include multiutil.h. The plan is clear now:

In multiutil.h: declare extern bool Multi_sexp_end_mission_pending;
In multiutil.cpp: define it, initialized to false
In multi_sexp_end_mission() (sexp.cpp): set it true instead of calling send_debrief_event()
In process_endgame_packet() (multimsgs.cpp) client path: if the flag is set, clear it and call send_debrief_event() — at which point stats have already been received
This preserves the warp-out skip for end-mission SEXP exactly as intended, while eliminating the race condition because send_debrief_event() (and therefore scoring_level_close()) now runs after multi_broadcast_stats() has delivered m_okKills.

Let me look at the current state of our sexp.cpp fix before making changes:

Good. Now I have everything I need. Here's the plan for the revised fix:

multiutil.h — add extern bool Multi_sexp_end_mission_pending; near send_debrief_event()

multiutil.cpp — define bool Multi_sexp_end_mission_pending = false; near Multi_ship_status_stamp

sexp.cpp — set the flag instead of calling send_debrief_event()

multimsgs.cpp — in process_endgame_packet() client path, check flag → call send_debrief_event() → clear flag

Done. The revised approach fully preserves the warp-out skip feature:

multi_sexp_end_mission() sets Multi_sexp_end_mission_pending = true and then calls multi_handle_end_mission_request() as before
The server's send_endgame_packet() broadcasts stats to clients first, then sends MISSION_END (reliable/ordered)
process_endgame_packet() on the client checks the flag, calls send_debrief_event() there — stats already populated — then the existing debrief-state check immediately skips multi_warpout_all_players() exactly as the original hack intended
Alt-J paths are unaffected (flag is never set, warp-out proceeds normally)

@chief1983
Copy link
Member Author

Just reverted the multiui.cpp changes, so now the only changes are the second attempt at fixing the crash while maintaining the jump out bypass. Still not sure if that's of much use but I'll go ahead and give it a spin just to see if it actually works like it says on the can.

@notimaginative
Copy link
Contributor

Ok, I guess the second commit I just pushed up isn't really going to be the right fix either, but it does attempt to solve the problem it introduced of the warpout being re-enabled for clients when it shouldn't be.

I tried something similar originally for the end-mission fix but it didn't work properly. One side or another still did the full warp-out sequence. I don't remember the exact change I used though, but it was definitely to the same spot.

However if this works then great, but it will definitely need to be tested to confirm all of end-mission tests cases still pass.

@chief1983
Copy link
Member Author

Well, I can confirm that I did not get the crash I was getting. Beyond that, not sure what else to test but I imagine there's quite a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants