@torosent commented Dec 2, 2025

Summary

Fixes Azure/azure-functions-durable-extension#3264

This PR extends the fix from #1189 to also handle the entity-specific code path in DecompressLargeEntityProperties.

Problem

When retrieving history events for entities (instance IDs starting with @) whose large properties are stored in blob storage, a BlobNotFound error can occur under heavy load. The blob can go missing in several ways, some of them similar to the race condition that PR #1189 addressed for orchestrations:

  1. A late message from a previous execution arrives after ContinueAsNew deleted the blobs
  2. The blob was cleaned up due to retention policies
  3. A race condition between blob cleanup and history retrieval for entities

The stack trace shows the failure occurring in DecompressLargeEntityProperties:

at DurableTask.AzureStorage.Storage.Blob.DownloadStreamingAsync
at DurableTask.AzureStorage.MessageManager.DownloadAndDecompressAsBytesAsync
at DurableTask.AzureStorage.Tracking.AzureTableTrackingStore.DecompressLargeEntityProperties
at DurableTask.AzureStorage.Tracking.AzureTableTrackingStore.GetHistoryEventsAsync

While PR #1189 added execution ID validation at the history retrieval level, there's still a window where:

  1. The execution ID check passes (the history entries belong to the current execution)
  2. But the blob is deleted between the check and the DecompressLargeEntityProperties call
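The check-then-use window described above can be illustrated with a small deterministic simulation. This is a Python sketch, not the DurableTask.AzureStorage C# code. `InMemoryBlobStore`, `BlobNotFoundError`, `read_history_entry`, and the dict-shaped history entry are hypothetical stand-ins, and the callback simulates ContinueAsNew or retention cleanup running between the check and the download.

```python
class BlobNotFoundError(Exception):
    """Hypothetical stand-in for a storage 404 (BlobNotFound) error."""


class InMemoryBlobStore:
    """Toy blob store mapping blob name -> bytes (illustration only)."""

    def __init__(self):
        self.blobs = {}

    def download(self, name):
        if name not in self.blobs:
            raise BlobNotFoundError(name)
        return self.blobs[name]

    def delete(self, name):
        self.blobs.pop(name, None)


def read_history_entry(store, entry, current_execution_id, between_check_and_use=None):
    """Check the execution ID, then download the entry's large property."""
    # Step 1: execution ID validation (the kind of check PR #1189 added).
    if entry["execution_id"] != current_execution_id:
        return None  # stale entry from a previous execution: skipped
    # The race window: cleanup may run here (simulated by the callback).
    if between_check_and_use is not None:
        between_check_and_use()
    # Step 2: the download can still fail even though step 1 passed.
    return store.download(entry["blob_name"])


# Usage: the check passes, then the blob is deleted before the download.
store = InMemoryBlobStore()
store.blobs["history-blob-1"] = b"payload"
entry = {"execution_id": "exec-1", "blob_name": "history-blob-1"}
try:
    read_history_entry(store, entry, "exec-1", lambda: store.delete("history-blob-1"))
except BlobNotFoundError as exc:
    print(f"BlobNotFound for {exc} even though the execution ID check passed")
```

The execution ID check alone cannot close this window, because nothing prevents the blob from being deleted after the check has succeeded.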

Solution

  1. Changed DecompressLargeEntityProperties to return a bool indicating success
  2. Added a try-catch for RequestFailedException with status 404 (BlobNotFound)
  3. Also handles the case where the 404 is wrapped in a DurableTaskStorageException
  4. When the blob is not found, logs a warning and returns false
  5. The caller skips the history entry instead of failing the orchestration

This allows the orchestration to continue processing even when stale history entries reference blobs that no longer exist.
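The five steps of the solution can be sketched as follows. This is a Python illustration of the pattern, not the actual C# change. `RequestFailedError` and `DurableTaskStorageError` are hypothetical stand-ins for `RequestFailedException` and `DurableTaskStorageException`, and the `status` attribute mirrors `RequestFailedException.Status`. The functions `decompress_large_properties` and `get_history_events`, the `download` callable, and the dict-shaped entries are also assumptions made for illustration.

```python
import logging

logger = logging.getLogger("history_sketch")


class RequestFailedError(Exception):
    """Stand-in for RequestFailedException; `status` mirrors its Status property."""

    def __init__(self, status, message=""):
        super().__init__(message)
        self.status = status


class DurableTaskStorageError(Exception):
    """Stand-in for DurableTaskStorageException wrapping an inner storage error."""

    def __init__(self, inner):
        super().__init__(str(inner))
        self.inner = inner


def is_blob_not_found(exc):
    """True for a direct 404 or a 404 wrapped in DurableTaskStorageError."""
    if isinstance(exc, RequestFailedError):
        return exc.status == 404
    if isinstance(exc, DurableTaskStorageError):
        return isinstance(exc.inner, RequestFailedError) and exc.inner.status == 404
    return False


def decompress_large_properties(entry, download):
    """Hypothetical analogue of DecompressLargeEntityProperties returning bool."""
    blob_name = entry.get("blob_name")
    if blob_name is None:
        return True  # nothing stored out-of-line, nothing to fetch
    try:
        entry["payload"] = download(blob_name)  # download + decompress
        return True
    except (RequestFailedError, DurableTaskStorageError) as exc:
        if not is_blob_not_found(exc):
            raise  # other failures (e.g. status 500) still propagate
        logger.warning("Blob %s not found; skipping history entry", blob_name)
        return False


def get_history_events(entries, download):
    """Caller: keep entries whose blobs resolved and skip the rest."""
    events = []
    for entry in entries:
        if decompress_large_properties(entry, download):
            events.append(entry)
    return events
```

Re-raising anything other than a 404 plays the role of a C# `catch (...) when (...)` exception filter. Only BlobNotFound is swallowed, so other failures such as a 500 still surface instead of silently dropping history.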

Testing

  • Build passes
  • Unit tests added for the BlobNotFound (404) handling

Related Issues

Fixes Azure/azure-functions-durable-extension#3264

@torosent marked this pull request as draft December 2, 2025 17:19
…erties

These tests verify:
1. Method signature returns Task<bool> to indicate success/failure
2. RequestFailedException with status 404 can be caught by the when clause
3. DurableTaskStorageException wrapping 404 can also be caught
4. Other status codes (e.g., 500) are NOT caught by the 404 filter

See: Azure/azure-functions-durable-extension#3264
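The filter checks listed in this commit (items 2 to 4) can be expressed as a small test suite. This is a Python `unittest` sketch, not the PR's actual C# tests. The exception classes and `is_blob_not_found` are hypothetical stand-ins. Item 1 (the `Task<bool>` signature) is specific to C# and is not modeled here.

```python
import unittest


class RequestFailedError(Exception):
    """Stand-in for RequestFailedException (`status` mirrors .Status)."""

    def __init__(self, status):
        super().__init__(f"status {status}")
        self.status = status


class DurableTaskStorageError(Exception):
    """Stand-in for DurableTaskStorageException with an inner exception."""

    def __init__(self, inner):
        super().__init__(str(inner))
        self.inner = inner


def is_blob_not_found(exc):
    # Mirrors a C# `catch ... when (status == 404)` filter, unwrapping one level.
    if isinstance(exc, DurableTaskStorageError):
        exc = exc.inner
    return isinstance(exc, RequestFailedError) and exc.status == 404


class BlobNotFoundFilterTests(unittest.TestCase):
    def test_direct_404_is_caught(self):
        self.assertTrue(is_blob_not_found(RequestFailedError(404)))

    def test_wrapped_404_is_caught(self):
        wrapped = DurableTaskStorageError(RequestFailedError(404))
        self.assertTrue(is_blob_not_found(wrapped))

    def test_500_is_not_caught(self):
        self.assertFalse(is_blob_not_found(RequestFailedError(500)))
        self.assertFalse(is_blob_not_found(DurableTaskStorageError(RequestFailedError(500))))

    def test_unrelated_exception_is_not_caught(self):
        self.assertFalse(is_blob_not_found(ValueError("other")))
```

Run it with `python -m unittest <module>`. The negative cases matter as much as the positive ones, because a filter that is too broad would hide genuine storage failures.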
@torosent closed this Dec 2, 2025
Linked issue: Durable Functions orchestration fails with BlobNotFound error during history retrieval (Azure Storage provider)