fix: no retry when remote table metadata not found #1386

Merged
Slach merged 4 commits into Altinity:master from leno23:hermes-auto/clickhouse-backup-1379-skip-missing-metadata
May 30, 2026

Conversation

@leno23
Contributor

@leno23 leno23 commented May 17, 2026

Summary

  • detect permanent remote metadata-missing errors before the download metadata retry loop keeps backing off
  • warn and skip the affected table when its JSON metadata file is absent on remote storage
  • preserve the existing optional .sql handling for incremental/embedded metadata and return non-not-found errors directly
  • add a focused regression test for the not-found classifier patterns seen from common object stores
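A minimal sketch of what such a not-found classifier could look like, assuming it works by substring-matching the error message. The function name `isRemoteMetadataNotFound` is inferred from the test name below, `messageLooksNotFound` is a hypothetical helper, and the pattern list only mirrors the phrasings named in this thread; the merged code in `pkg/backup/download.go` may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// notFoundPatterns mirrors the backend phrasings named in this thread.
// All entries are lowercase because matching is case-insensitive.
// Matching on "404" is a coarse heuristic and could match unrelated text.
var notFoundPatterns = []string{
	"nosuchkey",                 // S3
	"doesn't exist",             // GCS
	"404",                       // GCS / generic HTTP status
	"blobnotfound",              // Azure Blob Storage
	"no such file or directory", // FTP
	"file does not exist",       // SFTP
}

// messageLooksNotFound (hypothetical helper) reports whether msg contains
// one of the known not-found phrasings.
func messageLooksNotFound(msg string) bool {
	lower := strings.ToLower(msg)
	for _, p := range notFoundPatterns {
		if strings.Contains(lower, p) {
			return true
		}
	}
	return false
}

// isRemoteMetadataNotFound (assumed name) classifies an error as a
// permanent "object is missing" condition. A nil error is never not-found.
func isRemoteMetadataNotFound(err error) bool {
	if err == nil {
		return false
	}
	return messageLooksNotFound(err.Error())
}

func main() {
	samples := []error{
		errors.New("NoSuchKey: The specified key does not exist."),
		errors.New("BlobNotFound"),
		errors.New("connection reset by peer"),
	}
	for _, err := range samples {
		fmt.Printf("%q notFound=%v\n", err.Error(), isRemoteMetadataNotFound(err))
	}
}
```

String matching is used here only because the backends report the condition in different words; where a storage client exposes a typed error, checking that type would be more robust.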

Tests

  • go test ./pkg/backup -run TestIsRemoteMetadataNotFound -count=1
  • go test ./pkg/backup -count=1

Closes #1379

Treat object-store metadata 404s as permanent so download skips the
affected table immediately instead of exhausting the retry backoff.
@Slach Slach added this to the 2.7.1 milestone May 22, 2026
Slach and others added 3 commits May 28, 2026 22:53
…ermes-auto/clickhouse-backup-1379-skip-missing-metadata
…ermes-auto/clickhouse-backup-1379-skip-missing-metadata

# Conflicts:
#	pkg/backup/download.go
#	pkg/backup/download_test.go
A missing table .json on remote storage is a permanent broken-backup
condition, not a transient one. Return a clear error instead of silently
skipping the table (a silent skip restores fewer tables than the backup
claims, with no signal to the operator).

Detection now covers all backends' not-found phrasings (S3 NoSuchKey,
GCS doesn't exist/404, Azure BlobNotFound, FTP "No such file or
directory", SFTP "file does not exist") so the 404 breaks out of the
retry loop immediately instead of burning the ~35s exponential backoff.

Add per-backend integration coverage (S3/SFTP/FTP/GCS/GCS-emulator/
AZBLOB/COS) asserting download fails fast with a not-found error and
without retrying.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Slach
Collaborator

Slach commented May 30, 2026

Changed the download behavior when a table's .json is missing: from silent skip to fail fast with an error.

Rationale: a missing table .json on remote is a permanent broken-backup condition. Silently skipping it restores fewer tables than the backup metadata claims, with no signal to the operator — a quiet data-loss footgun on restore. A 404 should surface, not be swallowed.

Kept the no-retry fix (the whole point of #1379): the not-found 404 still breaks out of the retry loop immediately instead of burning the ~35s exponential backoff. Detection now covers all backends' phrasings (S3 NoSuchKey, GCS doesn't exist/404, Azure BlobNotFound, FTP No such file or directory, SFTP file does not exist) — without this, FTP/SFTP/Azure were not classified and still retried.
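The no-retry behavior can be illustrated with a generic backoff loop that stops as soon as an error is classified as permanent. This is a sketch under stated assumptions, not the project's retry code: `retryWithBackoff`, `errNotFound`, `errTransient`, and `isNotFound` are hypothetical names, and the delays are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical sentinel errors used only for this illustration.
var (
	errNotFound  = errors.New("remote object not found")
	errTransient = errors.New("temporary network failure")
)

// isNotFound stands in for the real not-found classifier.
func isNotFound(err error) bool { return errors.Is(err, errNotFound) }

// retryWithBackoff calls op up to maxAttempts times (maxAttempts >= 1),
// doubling the delay after each transient failure. When isPermanent(err)
// is true it returns immediately, without sleeping or retrying.
// It returns the number of attempts made and the final error.
func retryWithBackoff(maxAttempts int, initial time.Duration,
	isPermanent func(error) bool, op func() error) (int, error) {
	delay := initial
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = op()
		if err == nil {
			return attempt, nil
		}
		if isPermanent(err) {
			return attempt, fmt.Errorf("permanent error, not retrying: %w", err)
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return maxAttempts, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	// A missing object fails on the first attempt instead of backing off.
	attempts, err := retryWithBackoff(5, 10*time.Millisecond, isNotFound,
		func() error { return errNotFound })
	fmt.Println(attempts, err)

	// A transient error still uses every attempt.
	attempts, err = retryWithBackoff(3, 10*time.Millisecond, isNotFound,
		func() error { return errTransient })
	fmt.Println(attempts, err)
}
```

The classifier is passed in as a function so that the loop itself stays backend-agnostic; a backend whose phrasing is not recognized simply falls through to the normal retry path, which is the failure mode described above for FTP/SFTP/Azure.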

Added per-backend integration tests (testMetadataNotFound_test.go, S3/SFTP/FTP/GCS/GCS-emulator/AZBLOB/COS) asserting download fails fast with a not-found error and no retry. All 7 green.

Note: the optional .sql (incremental/embedded) metadata is still skipped when absent, unchanged.
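The difference between the required .json and the optional .sql can be sketched as follows. Everything here is illustrative: `fetchTableMetadata`, `fetchFunc`, `mapFetcher`, `errNotFound`, and the `metadata/db/t1` path layout are hypothetical and are not taken from the merged code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for a classified
// not-found error from remote storage.
var errNotFound = errors.New("remote object not found")

// fetchFunc abstracts "download one remote object" for this sketch.
type fetchFunc func(remotePath string) ([]byte, error)

// mapFetcher builds an in-memory fetchFunc; missing keys yield errNotFound.
func mapFetcher(store map[string][]byte) fetchFunc {
	return func(p string) ([]byte, error) {
		if b, ok := store[p]; ok {
			return b, nil
		}
		return nil, errNotFound
	}
}

// fetchTableMetadata treats <base>.json as required and <base>.sql as
// optional: a missing .json is an error, a missing .sql is skipped, and
// any other error is returned directly in both cases.
func fetchTableMetadata(fetch fetchFunc, base string) (jsonBody, sqlBody []byte, err error) {
	jsonBody, err = fetch(base + ".json")
	if err != nil {
		if errors.Is(err, errNotFound) {
			return nil, nil, fmt.Errorf("table metadata %s.json not found on remote storage: %w", base, err)
		}
		return nil, nil, err
	}
	sqlBody, err = fetch(base + ".sql")
	if err != nil {
		if errors.Is(err, errNotFound) {
			return jsonBody, nil, nil // optional file absent: skip it
		}
		return nil, nil, err
	}
	return jsonBody, sqlBody, nil
}

func main() {
	fetch := mapFetcher(map[string][]byte{
		"metadata/db/t1.json": []byte(`{"table":"t1"}`),
	})

	// t1 has .json but no .sql: succeeds with a nil sqlBody.
	_, sqlBody, err := fetchTableMetadata(fetch, "metadata/db/t1")
	fmt.Println(sqlBody == nil, err)

	// t2 has no .json: fails with an explicit error.
	_, _, err = fetchTableMetadata(fetch, "metadata/db/t2")
	fmt.Println(err)
}
```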

@Slach Slach marked this pull request as ready for review May 30, 2026 13:49
@Slach Slach changed the title from "fix: skip missing remote table metadata" to "fix: no retry when remote table metadata not found" May 30, 2026
@Slach Slach merged commit 0c0386d into Altinity:master May 30, 2026
55 of 56 checks passed

Development

Successfully merging this pull request may close these issues.

No retry when table metadata files not exists instead of retrying forever

2 participants