fix: prefer underlying disk name over cache wrapper in getDisksFromSystemDisks#1397
Open
vizsmurali wants to merge 7 commits into
Open
fix: prefer underlying disk name over cache wrapper in getDisksFromSystemDisks#1397vizsmurali wants to merge 7 commits into
vizsmurali wants to merge 7 commits into
Conversation
…stemDisks When a cache disk wraps an S3 object disk, both share the same path in system.disks. GROUP BY d.path collapses them into one row, and any(d.name) may arbitrarily pick the cache disk name. Since system.parts always reports the underlying disk name, this mismatch causes fetchHashOfAllFiles to fail with 'part not found after FREEZE'. In ClickHouse 25.x+, both cache and underlying disks report type='ObjectStorage', so d.type cannot distinguish them. Instead, use the cache_path column: cache disks have non-empty cache_path, underlying disks have empty cache_path. Replace any(d.name) with argMin(d.name, d.cache_path != '') when the cache_path column exists. argMin picks the name with the minimum sort value, preferring the non-cache disk (0) over the cache wrapper (1). The fix is gated behind a column existence check for backward compatibility with older ClickHouse versions.
Slach
requested changes
May 26, 2026
Collaborator
Slach
left a comment
There was a problem hiding this comment.
could you add e2e test in test/integration/ ?
Adds TestCacheDiskBackup which verifies that backup and restore work correctly when a ClickHouse cache disk wraps an S3 object disk. The test: 1. Configures ClickHouse with s3_disk + s3_cache (cache wrapper) + s3_tiered policy 2. Creates a table with TTL to move old data to S3 cold volume 3. Runs backup — this is where the bug manifested (part not found after FREEZE) 4. Uploads, downloads, and restores the backup 5. Verifies data integrity after restore Covers the fix in getDisksFromSystemDisks that uses argMin(d.name, d.cache_path != '') to prefer the underlying disk name over the cache wrapper.
Author
|
Thanks for the review! I've added an e2e integration test in The test:
Requires ClickHouse >= 22.3 for cache disk support. Compiles and passes @Slach please review |
…ix/cache-disk-name-resolution
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
fix: prefer underlying disk name over cache wrapper in getDisksFromSystemDisks
Problem
When a ClickHouse cache disk wraps an S3 object disk, both disks share the same
path(metadata path) insystem.disks. TheGROUP BY d.pathclause ingetDisksFromSystemDisks()collapses them into one row, andany(d.name)mayarbitrarily pick the cache disk name (e.g.,
s3_cache) instead of theunderlying disk name (e.g.,
s3_disk).This causes
fetchHashOfAllFilesto querysystem.partswithdisk_name = 's3_cache', butsystem.partsalways reports the underlying diskname (
s3_disk), resulting in:The bug is non-deterministic —
any()may pick either name depending onthe query execution order.
Why
d.typeis insufficientIn ClickHouse 25.x+, both cache and underlying disks report
type = 'ObjectStorage'insystem.disks. The oldertype = 'Cache'classification is no longer used, so type-based filtering cannot distinguish
them.
Fix
Use the
cache_pathcolumn fromsystem.disksto reliably distinguish cachedisks from underlying disks:
cache_path(e.g.,
/var/lib/clickhouse/filesystem_caches/s3_cache/)cache_pathReplace
any(d.name)withargMin(d.name, d.cache_path != ''):cache_path != ''evaluates to0(false)cache_path != ''evaluates to1(true)argMinpicks the name with the minimum sort value, i.e., thenon-cache disk name (
0 < 1)The fix also detects whether the
cache_pathcolumn exists insystem.disks(for backward compatibility with older ClickHouse versions) before using it.
Changes
pkg/clickhouse/clickhouse.go:CachePathPresentfield toDiskFieldsstructcache_pathcolumn insystem.columnsany(d.name)toargMin(d.name, d.cache_path != '')whencache_pathcolumn existsEvidence
system.disksshowing the shared path problemBefore fix (v2.7.0)
After fix
Successfully backed up a database with multiple tables (TiB-scale S3 object disk
Backward Compatibility
cache_pathcolumn detection is gated behind a check insystem.columns, so older ClickHouse versions without this column willcontinue to use
any(d.name)as before.argMin(d.name, d.cache_path != '')degrades to picking an arbitrary name (all have
cache_path = ''), which isequivalent to the current
any(d.name)behavior.Testing
(local disk + S3 object disk + S3 cache disk)
completed with zero errors
Fixes #1396