Instances stuck due to DurableTask.AzureStorage.Storage.DurableTaskStorageException: An error occurred while communicating with Azure Storage ---> Azure.RequestFailedException: The specified blob does not exist. #1215

@firedigger

Description

This seems similar to #802, so I expected it to be fixed by #1189, but when I deployed my functions yesterday they still failed with these errors:

An unexpected failure occurred while processing instance '142b5c68ff814398b73df7881a1893d0': DurableTask.AzureStorage.Storage.DurableTaskStorageException: An error occurred while communicating with Azure Storage
 ---> Azure.RequestFailedException: The specified blob does not exist.
RequestId:ca18c208-f01e-0017-1b4c-c4c4c3000000
Time:2025-05-13T21:19:56.3181246Z
Status: 404 (The specified blob does not exist.)
ErrorCode: BlobNotFound

Content:
<?xml version="1.0" encoding="utf-8"?><Error><Code>BlobNotFound</Code><Message>The specified blob does not exist.
RequestId:ca18c208-f01e-0017-1b4c-c4c4c3000000
Time:2025-05-13T21:19:56.3181246Z</Message></Error>

Headers:
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: ca18c208-f01e-0017-1b4c-c4c4c3000000
x-ms-client-request-id: 7b601f94-099e-44cc-bc26-b238991eee82
x-ms-version: 2024-05-04
x-ms-error-code: BlobNotFound
Date: Tue, 13 May 2025 21:19:56 GMT
Content-Length: 215
Content-Type: application/xml

   at Azure.Storage.Blobs.BlobRestClient.DownloadAsync(String snapshot, String versionId, Nullable`1 timeout, String range, String leaseId, Nullable`1 rangeGetContentMD5, Nullable`1 rangeGetContentCRC64, String encryptionKey, String encryptionKeySha256, Nullable`1 encryptionAlgorithm, Nullable`1 ifModifiedSince, Nullable`1 ifUnmodifiedSince, String ifMatch, String ifNoneMatch, String ifTags, CancellationToken cancellationToken)
   at Azure.Storage.Blobs.Specialized.BlobBaseClient.StartDownloadAsync(HttpRange range, BlobRequestConditions conditions, DownloadTransferValidationOptions validationOptions, Int64 startOffset, Boolean async, CancellationToken cancellationToken)
   at Azure.Storage.Blobs.Specialized.BlobBaseClient.DownloadStreamingInternal(HttpRange range, BlobRequestConditions conditions, DownloadTransferValidationOptions transferValidationOverride, IProgress`1 progressHandler, String operationName, Boolean async, CancellationToken cancellationToken)
   at Azure.Storage.Blobs.Specialized.BlobBaseClient.DownloadStreamingDirect(HttpRange range, BlobRequestConditions conditions, DownloadTransferValidationOptions transferValidationOverride, IProgress`1 progressHandler, String operationName, Boolean async, CancellationToken cancellationToken)
   at Azure.Storage.Blobs.Specialized.BlobBaseClient.DownloadStreamingAsync(BlobDownloadOptions options, CancellationToken cancellationToken)
   at DurableTask.AzureStorage.Storage.ClientResponseExtensions.DecorateFailure[T](Task`1 responseTask)
   --- End of inner exception stack trace ---
   at DurableTask.AzureStorage.Storage.ClientResponseExtensions.DecorateFailure[T](Task`1 responseTask) in /_/src/DurableTask.AzureStorage/Storage/ClientResponseExtensions.cs:line 46
   at DurableTask.AzureStorage.Storage.Blob.DownloadStreamingAsync(CancellationToken cancellationToken) in /_/src/DurableTask.AzureStorage/Storage/Blob.cs:line 119
   at DurableTask.AzureStorage.MessageManager.DownloadAndDecompressAsBytesAsync(Blob blob, CancellationToken cancellationToken) in /_/src/DurableTask.AzureStorage/MessageManager.cs:line 237
   at DurableTask.AzureStorage.MessageManager.DownloadAndDecompressAsBytesAsync(String blobName, CancellationToken cancellationToken) in /_/src/DurableTask.AzureStorage/MessageManager.cs:line 220
   at DurableTask.AzureStorage.Tracking.AzureTableTrackingStore.DecompressLargeEntityProperties(TableEntity entity, List`1 listOfBlobs, CancellationToken cancellationToken) in /_/src/DurableTask.AzureStorage/Tracking/AzureTableTrackingStore.cs:line 1113
   at DurableTask.AzureStorage.Tracking.AzureTableTrackingStore.GetHistoryEventsAsync(String instanceId, String expectedExecutionId, CancellationToken cancellationToken) in /_/src/DurableTask.AzureStorage/Tracking/AzureTableTrackingStore.cs:line 187
   at DurableTask.AzureStorage.OrchestrationSessionManager.ScheduleOrchestrationStatePrefetch(LinkedListNode`1 node, Guid traceActivityId, CancellationToken cancellationToken) in /_/src/DurableTask.AzureStorage/OrchestrationSessionManager.cs:line 513

The difference is that in the linked issue the errors were more of a log-spamming warning, whereas in my case they affect job progress. I have also identified the specific code that triggers the issue: the cancellationToken in the CreateTimer call:

public async Task ScheduleUsers(TaskOrchestrationContext context, BackupPayload input, int organizationId, bool withTimeout)
{
    var users = await context.CallActivityAsync<IList<string>>("GetUsers", organizationId);
    var batchSize = configuration.GetValue("maximumConcurrentUserBackupTasks", 500);
    foreach (var batch in users.Batch(batchSize))
    {
        var tasks = new List<Task>();
        foreach (var user in batch)
        {
            tasks.Add(context.CallActivityAsync("BackupActivity", new BackupPayload
            {
                InstanceId = context.InstanceId,
                UserId = user,
                IsManual = input.IsManual,
                OrganizationId = input.OrganizationId
            }));
        }
        if (withTimeout)
        {
            // Race the whole batch against a 1-hour durable timer and cancel the timer
            // if the batch finishes first (the documented timeout pattern).
            using var cts = new CancellationTokenSource();
            var timeout = context.CreateTimer(context.CurrentUtcDateTime.AddHours(1), cts.Token);
            if (await Task.WhenAny(Task.WhenAll(tasks), timeout) != timeout)
            {
                cts.Cancel();
            }
        }
        else
        {
            await Task.WhenAll(tasks);
        }
    }
}

So with withTimeout=false this runs without issues, since I simply await the tasks, but the timeout path causes the orchestration to fail.
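
If it helps narrow this down, I believe the failure should reproduce with just the timer race, independent of my backup logic. Here is a minimal sketch of what I mean (the TimerRaceRepro function name and the "NoOpActivity" activity are placeholders, and I have not run this exact repro in isolation):

[Function(nameof(TimerRaceRepro))]
public async Task TimerRaceRepro([OrchestrationTrigger] TaskOrchestrationContext context)
{
    // Placeholder activity standing in for the per-user backup work.
    var work = context.CallActivityAsync("NoOpActivity");

    // Same pattern as the withTimeout branch above: race the work against a durable timer
    // and cancel the timer if the work finishes first.
    using var cts = new CancellationTokenSource();
    var timeout = context.CreateTimer(context.CurrentUtcDateTime.AddMinutes(5), cts.Token);
    if (await Task.WhenAny(work, timeout) != timeout)
    {
        cts.Cancel();
    }
}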
I would also like to add that it does not fail right away. I use an eternal orchestration (I believe just like the related issues do), and it is the wake-up from the timer there that appears to fail: when I schedule the processes, their first iteration goes as planned, but 12 hours later, when the next iteration is supposed to kick in after the timer, it fails. The eternal orchestration is driven like this:

await context.CreateTimer(input.FireAt, CancellationToken.None);
/* DO WORK */
input.FireAt = input.FireAt.AddHours(12);
context.ContinueAsNew(input);
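
For completeness, the surrounding eternal orchestrator looks roughly like this (a simplified sketch: the BackupOrchestrator name is illustrative and the real body has more going on, but the shape is the same):

using Microsoft.Azure.Functions.Worker;
using Microsoft.DurableTask;

[Function(nameof(BackupOrchestrator))]
public async Task BackupOrchestrator([OrchestrationTrigger] TaskOrchestrationContext context)
{
    var input = context.GetInput<BackupPayload>();

    // Sleep until the next scheduled run; it is the wake-up from this timer that seems to fail.
    await context.CreateTimer(input.FireAt, CancellationToken.None);

    // Fan out the per-user backups (the ScheduleUsers method shown above).
    await ScheduleUsers(context, input, input.OrganizationId, withTimeout: true);

    // Schedule the next run 12 hours later and restart as a fresh execution.
    input.FireAt = input.FireAt.AddHours(12);
    context.ContinueAsNew(input);
}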

I am running the latest versions of all Azure.Worker extensions, and I use .NET 9 isolated Azure Functions on an App Service plan.
Is there a problem with my code that I am missing? I can live without the timeout, but I would like to keep it to make my code more robust, so that prolonged user processes do not halt progress for the later users.
