Skip to content

Fix help thread refresh bug#1345

Open
digit01Wave wants to merge 1 commit into
PuzzleServer:mainfrom
digit01Wave:jessica/fixRaceConditionForHelpThreads
Open

Fix help thread refresh bug#1345
digit01Wave wants to merge 1 commit into
PuzzleServer:mainfrom
digit01Wave:jessica/fixRaceConditionForHelpThreads

Conversation

@digit01Wave
Copy link
Copy Markdown
Contributor

During the bugbash on June 1, a subset of players suddenly kept seeing a blank help thread page because the server was returning an error.

Client side error:
[2026-06-02T03:21:58.966Z] Error: Connection disconnected with error 'Error: Server returned an error on close: Connection closed with an error.'.
image

Error from azure:
A second operation was started on this context instance before a previous operation completed. This is usually caused by different threads concurrently using the same instance of DbContext. For more information on how to avoid threading issues with DbContext, see https://go.microsoft.com/fwlink/?linkid=2097913.
unhelpful, but:
System.InvalidOperationException:
at Microsoft.EntityFrameworkCore.Infrastructure.Internal.ConcurrencyDetector.EnterCriticalSection (Microsoft.EntityFrameworkCore, Version=10.0.5.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1+AsyncEnumerator+d__20.MoveNext (Microsoft.EntityFrameworkCore.Relational, Version=10.0.5.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)

Here is the explanation and fix from AI

The deployed failure is consistent with Blazor Server dbContext lifetime, not the SignalR payload shape anymore.
• ThreadMessageDTO is already the full payload.
• PuzzleThreadComponent already updates from the incoming DTO and no longer fetches the message from the database on receive.
• PuzzleThreadService was still holding an injected scoped PuzzleServerContext.
• In Blazor Server, scoped services live for the circuit, so overlapping callbacks can share the same dbContext.
• Under load, one circuit can start another component/service operation before the prior EF query finishes, causing the ConcurrencyDetector.EnterCriticalSection exception.
• An unhandled server-side exception can close the Blazor circuit, which matches the browser console disconnect.

What was changed:
• PuzzleThreadService now injects IDbContextFactory instead of a scoped PuzzleServerContext.
• Each public service operation creates its own short-lived context with CreateDbContextAsync.
• Helper methods now receive that context explicitly.
• The transaction creation is now async via BeginTransactionAsync.
• PuzzleThreadComponent now has a simple isMutationRunning guard to ignore overlapping send/edit/delete/claim clicks from the same component instance.
This is better than a lock as the long-term Blazor Server fix because:
• It follows the recommended Blazor EF pattern.
• It avoids serializing unrelated operations through a shared context.
• It keeps each operation as its own EF unit of work.
• It removes the underlying shared-dbContext concurrency hazard instead of masking it.
Validation:
• Build completed successfully.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a Blazor Server circuit stability issue that could disconnect clients and show a blank help thread page when overlapping EF Core operations reused the same scoped DbContext under load.

Changes:

  • Refactors PuzzleThreadService to use IDbContextFactory<PuzzleServerContext> and create a fresh DbContext per operation (including async transaction creation).
  • Updates helper methods to take an explicit PuzzleServerContext parameter (removing reliance on a shared injected context).
  • Adds a simple UI-level guard (isMutationRunning) and disables thread mutation controls while a mutation is in progress.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
ServerCore/Pages/Threads/PuzzleThreadService.cs Moves from injected scoped DbContext to IDbContextFactory + per-call contexts to avoid concurrent-use exceptions.
ServerCore/Pages/Threads/PuzzleThreadComponent.razor.cs Adds isMutationRunning guard around mutation operations to reduce overlapping user actions.
ServerCore/Pages/Threads/PuzzleThreadComponent.razor Disables inputs/actions while isMutationRunning is true to prevent repeated clicks during mutations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 261 to 265
try
{
isMutationRunning = true;
errorMessage = null;
await mutation();

async Task RunMutationAsync(Func<Task> mutation)
{
if (isMutationRunning)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this isMutationRunning code lacks a critical section or equivalent, it only works 100% of the time if it can be guaranteed to run in a single threaded environment. Do we have any assurance of that?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to just wait for the previous mutation to complete and then perform the action?

<p>The more details you add here, the more helpful the responses you'll get will be!</p>
}
<textarea @bind="NewMessageText" class="form-control message-subtitle" rows="4" style="width:75%" placeholder="Type message here" autocomplete="off"></textarea>
<textarea @bind="NewMessageText" class="form-control message-subtitle" rows="4" style="width:75%" placeholder="Type message here" autocomplete="off" disabled="@isMutationRunning"></textarea>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the C# code is running on the server since it's part of ServerCore, so updating based on these bools would require a message back to the client. Is a mutation such a long-running operation that it's worth sending a message back to the client saying to wait?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants