Skip to content

fix: prefer underlying disk name over cache wrapper in getDisksFromSystemDisks#1397

Open
vizsmurali wants to merge 7 commits into
Altinity:masterfrom
vizsmurali:fix/cache-disk-name-resolution
Open

fix: prefer underlying disk name over cache wrapper in getDisksFromSystemDisks#1397
vizsmurali wants to merge 7 commits into
Altinity:masterfrom
vizsmurali:fix/cache-disk-name-resolution

Conversation

@vizsmurali
Copy link
Copy Markdown

fix: prefer underlying disk name over cache wrapper in getDisksFromSystemDisks

Problem

When a ClickHouse cache disk wraps an S3 object disk, both disks share the same
path (metadata path) in system.disks. The GROUP BY d.path clause in
getDisksFromSystemDisks() collapses them into one row, and any(d.name) may
arbitrarily pick the cache disk name (e.g., s3_cache) instead of the
underlying disk name (e.g., s3_disk).

This causes fetchHashOfAllFiles to query system.parts with
disk_name = 's3_cache', but system.parts always reports the underlying disk
name (s3_disk), resulting in:

part "..." not found in system.parts (database=..., table=..., disk_name=s3_cache) after FREEZE

The bug is non-deterministicany() may pick either name depending on
the query execution order.

Why d.type is insufficient

In ClickHouse 25.x+, both cache and underlying disks report
type = 'ObjectStorage' in system.disks. The older type = 'Cache'
classification is no longer used, so type-based filtering cannot distinguish
them.

Fix

Use the cache_path column from system.disks to reliably distinguish cache
disks from underlying disks:

  • Cache disks have a non-empty cache_path
    (e.g., /var/lib/clickhouse/filesystem_caches/s3_cache/)
  • Underlying disks have an empty cache_path

Replace any(d.name) with argMin(d.name, d.cache_path != ''):

  • For non-cache disks: cache_path != '' evaluates to 0 (false)
  • For cache disks: cache_path != '' evaluates to 1 (true)
  • argMin picks the name with the minimum sort value, i.e., the
    non-cache disk name (0 < 1)

The fix also detects whether the cache_path column exists in system.disks
(for backward compatibility with older ClickHouse versions) before using it.

Changes

  • pkg/clickhouse/clickhouse.go:
    • Added CachePathPresent field to DiskFields struct
    • Added detection of cache_path column in system.columns
    • Changed disk name SQL from any(d.name) to
      argMin(d.name, d.cache_path != '') when cache_path column exists
    • Parameterized the disk name SQL in the final query

Evidence

system.disks showing the shared path problem

┌─name─────┬─path───────────────────────────────┬─type──────────┬─cache_path──────────────────────────────────┐
│ default  │ /var/lib/clickhouse/               │ Local         │                                             │
│ s3_cache │ /var/lib/clickhouse/disks/s3_disk/ │ ObjectStorage │ /var/lib/clickhouse/filesystem_caches/s3_cache/ │
│ s3_disk  │ /var/lib/clickhouse/disks/s3_disk/ │ ObjectStorage │                                             │
└──────────┴────────────────────────────────────┴───────────────┴─────────────────────────────────────────────┘

Before fix (v2.7.0)

ERR part "all_1_42_7" not found in system.parts
    (database=mydb, table=my_table, disk_name=s3_cache) after FREEZE

After fix

Successfully backed up a database with multiple tables (TiB-scale S3 object disk

  • GiB-scale EBS local data) with zero errors on a ClickHouse 25.11 cluster.

Backward Compatibility

  • The cache_path column detection is gated behind a check in
    system.columns, so older ClickHouse versions without this column will
    continue to use any(d.name) as before.
  • When no cache disks are configured, argMin(d.name, d.cache_path != '')
    degrades to picking an arbitrary name (all have cache_path = ''), which is
    equivalent to the current any(d.name) behavior.

Testing

  • Manually tested on ClickHouse 25.11 with hybrid tiered storage
    (local disk + S3 object disk + S3 cache disk)
  • Full database backup of multiple tables with TiB-scale S3 object disk data,
    completed with zero errors
  • Full database restore: verified disk routing (hot data → local, cold data → S3)

Fixes #1396

…stemDisks

When a cache disk wraps an S3 object disk, both share the same path in
system.disks. GROUP BY d.path collapses them into one row, and any(d.name)
may arbitrarily pick the cache disk name. Since system.parts always reports
the underlying disk name, this mismatch causes fetchHashOfAllFiles to fail
with 'part not found after FREEZE'.

In ClickHouse 25.x+, both cache and underlying disks report
type='ObjectStorage', so d.type cannot distinguish them. Instead, use the
cache_path column: cache disks have non-empty cache_path, underlying disks
have empty cache_path.

Replace any(d.name) with argMin(d.name, d.cache_path != '') when the
cache_path column exists. argMin picks the name with the minimum sort value,
preferring the non-cache disk (0) over the cache wrapper (1).

The fix is gated behind a column existence check for backward compatibility
with older ClickHouse versions.
@Slach Slach self-requested a review May 26, 2026 17:25
Copy link
Copy Markdown
Collaborator

@Slach Slach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add e2e test in test/integration/ ?

Adds TestCacheDiskBackup which verifies that backup and restore work
correctly when a ClickHouse cache disk wraps an S3 object disk.

The test:
1. Configures ClickHouse with s3_disk + s3_cache (cache wrapper) + s3_tiered policy
2. Creates a table with TTL to move old data to S3 cold volume
3. Runs backup — this is where the bug manifested (part not found after FREEZE)
4. Uploads, downloads, and restores the backup
5. Verifies data integrity after restore

Covers the fix in getDisksFromSystemDisks that uses argMin(d.name, d.cache_path != '')
to prefer the underlying disk name over the cache wrapper.
@vizsmurali
Copy link
Copy Markdown
Author

vizsmurali commented May 26, 2026

Thanks for the review! I've added an e2e integration test in test/integration/cacheDiskBackup_test.go (TestCacheDiskBackup).

The test:

  1. Configures ClickHouse with s3_disk + s3_cache (cache wrapper) + s3_tiered storage policy
  2. Creates a table with TTL to move old data to the S3 cold volume
  3. Runs backup — this is where the bug manifested (part not found after FREEZE)
  4. Uploads, downloads, and restores the backup
  5. Verifies data integrity (row count) after restore

Requires ClickHouse >= 22.3 for cache disk support. Compiles and passes go vet.

@Slach please review
Thanks again

@vizsmurali vizsmurali requested a review from Slach May 26, 2026 18:56
@Slach Slach added this to the 2.7.1 milestone May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Backup fails with 'part not found after FREEZE' when cache disk wraps an S3 object disk

2 participants