Proposal: eliminate S3 Walk (ListObjectsV2) during restore via upload-time file manifest #1375

## Problem

When restoring (downloading) a backup from S3-compatible object storage, `clickhouse-backup` calls `Walk()` (which maps to `ListObjectsV2` with pagination) for **every single part** in the backup. For a production backup with tens of thousands of parts, this creates an enormous number of S3 API calls that dominate restore time.

### How Walk is used during download

The `DownloadPath()` and `DownloadPathParallel()` functions in `pkg/storage/general.go` both call `bd.Walk()` before downloading any files:

```go
// general.go line 506
walkErr := bd.Walk(ctx, remotePath, true, func(ctx context.Context, f RemoteFile) error {
    // ... download each file
})
```

Under the hood, `Walk()` calls `ListObjectsV2` with pagination (1000 keys per page) via `remotePager()`:

```go
// s3.go line 851
params := &s3.ListObjectsV2Input{
    Bucket:  aws.String(s.Config.Bucket),
    MaxKeys: aws.Int32(1000),
    Prefix:  aws.String(prefix),
}
pager := s3.NewListObjectsV2Paginator(s.client, params, ...)
```

### Impact at scale

For a typical production ClickHouse backup:
- **50,000 parts** across all tables
- Each part has 5-20 files (data columns, mark files, primary index, etc.)
- Each `Walk()` call requires at least 1 `ListObjectsV2` request (more if the part has >1000 files, though rare)
- That's **50,000+ ListObjectsV2 API calls** just to discover which files to download

At typical S3 latency of 50-100ms per ListObjects request:
- **~40-80 minutes** spent purely on Walk/ListObjects, before any actual data transfer begins
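- This estimate assumes the per-part Walk calls are issued one after another; running them concurrently shortens the wall-clock time but does not reduce the number of requests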

The Walk overhead is especially painful for:
1. **Resume operations**: even when all files are already downloaded, Walk is still called to check what exists remotely
2. **Partial restores** (`--table` flag): Walk is called only for the parts that pass the filter, but each of those parts still pays the same listing overhead
3. **Incremental backups**: recursive downloads multiply the Walk overhead

### The data is already known at upload time

The key insight is that **the uploader already knows every file it uploaded**. During `UploadPath()`/`UploadPathParallel()`, each file's path, size, and modification time are available immediately after upload. This information could be recorded during upload and stored alongside the backup, completely eliminating the need for Walk during download.

## Proposed Solution: Upload-time file manifest

### 1. Add `manifest.json` to each backup

During upload, record every file in a manifest:

```json
{
  "version": 1,
  "backup_name": "daily_backup_20260515",
  "created_at": "2026-05-15T23:01:40Z",
  "total_size": 1234567890,
  "total_files": 150000,
  "files": [
    {"path": "shadow/default/my_table/default/part1/data.bin", "size": 104857600, "last_modified": "2026-05-15T23:01:40Z"},
    {"path": "shadow/default/my_table/default/part1/data.cmrk3", "size": 8192, "last_modified": "2026-05-15T23:01:40Z"},
    ...
  ]
}
```

The manifest is built **incrementally during upload** — each successful `PutFile()` call appends an entry. No post-upload Walk needed.

### 2. New download path that uses the manifest

Add `DownloadPathWithManifest()` and `DownloadPathParallelWithManifest()` that iterate over manifest entries instead of calling Walk:

```go
func (bd *BackupDestination) DownloadPathWithManifest(ctx context.Context, remotePath string, localPath string, 
    manifestFiles []ManifestEntry, prefixInManifest string, ...) (int64, error) {
    for _, entry := range manifestFiles {
        // Download file directly — no Walk needed
        // entry.Path is the manifest's "path" value (field name assumed)
        r, err := bd.GetFileReader(ctx, path.Join(remotePath, entry.Path))
        // ...
    }
}
```
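
The `prefixInManifest` parameter suggests that each call selects only the manifest entries below one remote directory. A sketch of that selection step follows, assuming manifest paths use forward slashes; `entriesUnder` is a hypothetical helper and the trimmed-down `ManifestEntry` is for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// ManifestEntry is reduced to the fields this sketch needs.
type ManifestEntry struct {
	Path string
	Size int64
}

// entriesUnder returns the entries below prefix, with Path rewritten
// relative to it. The trailing slash keeps "part1" from matching "part10".
func entriesUnder(entries []ManifestEntry, prefix string) []ManifestEntry {
	dir := strings.TrimSuffix(prefix, "/") + "/"
	var out []ManifestEntry
	for _, e := range entries {
		if strings.HasPrefix(e.Path, dir) {
			out = append(out, ManifestEntry{Path: strings.TrimPrefix(e.Path, dir), Size: e.Size})
		}
	}
	return out
}

func main() {
	all := []ManifestEntry{
		{Path: "shadow/default/my_table/default/part1/data.bin", Size: 104857600},
		{Path: "shadow/default/my_table/default/part1/data.cmrk3", Size: 8192},
		{Path: "shadow/default/my_table/default/part10/data.bin", Size: 42},
	}
	for _, e := range entriesUnder(all, "shadow/default/my_table/default/part1") {
		fmt.Println(e.Path, e.Size)
	}
}
```

A linear scan per part is quadratic over the whole backup; grouping the entries by directory once after loading the manifest would avoid that.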

### 3. Graceful fallback for older backups

If `manifest.json` doesn't exist (backups created before this feature), fall back to the existing Walk-based download. This ensures full backward compatibility.
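
The fallback decision can be sketched as follows. The `objectGetter` interface, the `errNotFound` sentinel, the in-memory `mapStore`, and the `<backup>/manifest.json` key layout are all assumptions standing in for the real storage layer.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errNotFound stands in for a backend's "no such key" error.
var errNotFound = errors.New("object not found")

// objectGetter is a hypothetical minimal view of a storage backend.
type objectGetter interface {
	Get(key string) ([]byte, error)
}

type manifest struct {
	Version int `json:"version"`
	Files   []struct {
		Path string `json:"path"`
		Size int64  `json:"size"`
	} `json:"files"`
}

// loadManifest reports found=false (and no error) when the backup has no
// manifest, so the caller can fall back to the Walk-based download.
func loadManifest(store objectGetter, backupName string) (m *manifest, found bool, err error) {
	data, err := store.Get(backupName + "/manifest.json")
	if errors.Is(err, errNotFound) {
		return nil, false, nil
	}
	if err != nil {
		return nil, false, err
	}
	m = &manifest{}
	if err := json.Unmarshal(data, m); err != nil {
		return nil, false, fmt.Errorf("parse manifest: %w", err)
	}
	return m, true, nil
}

// mapStore is an in-memory backend used only for this example.
type mapStore map[string][]byte

func (s mapStore) Get(key string) ([]byte, error) {
	data, ok := s[key]
	if !ok {
		return nil, errNotFound
	}
	return data, nil
}

func main() {
	store := mapStore{"new_backup/manifest.json": []byte(`{"version":1,"files":[{"path":"a.bin","size":3}]}`)}
	for _, name := range []string{"new_backup", "old_backup"} {
		m, found, err := loadManifest(store, name)
		switch {
		case err != nil:
			fmt.Println(name, "error:", err)
		case !found:
			fmt.Println(name, "has no manifest, using Walk-based download")
		default:
			fmt.Println(name, "has a manifest with", len(m.Files), "files")
		}
	}
}
```

Only a missing manifest triggers the fallback here; a manifest that exists but fails to parse is surfaced as an error, since silently falling back could hide corruption.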

### Performance improvement

| Metric | Walk-based (current) | Manifest-based (proposed) |
|--------|---------------------|--------------------------|
| ListObjectsV2 calls (50k parts) | ~50,000 | **0** (one `GetObject` for the manifest) |
| API overhead latency | 40-80 minutes | **<1 second** |
| Resume validation | Requires Walk per part | Local file stat only |
| Upload overhead | None | ~1 small JSON upload |

### Additional benefits

1. **Resume validation without remote calls**: The manifest contains file sizes, enabling local-only validation of downloaded parts (check that all expected files exist with correct sizes) without any remote API calls.

2. **Accurate progress tracking**: Total file count and size are known upfront from the manifest, enabling accurate progress bars and ETAs.

3. **Reduced S3 costs**: ListObjectsV2 is billed per request. Eliminating 50k+ calls per restore saves money at scale.

4. **Works for all storage backends**: The manifest is storage-agnostic — it benefits S3, GCS, Azure Blob, and any backend where listing is expensive.

## Implementation notes

- The manifest should be uploaded as the **last step** after all data files, ensuring it only lists actually-uploaded files.
- For incremental manifest building during upload, use a mutex-protected slice that each upload goroutine appends to.
- Pre-allocate the manifest slice capacity based on the estimated file count (sum of parts × average files per part) to reduce GC pressure.
- Consider pipelining the manifest download — start downloading it concurrently with metadata.json downloads since they're independent.

We've implemented and tested this approach in a fork and confirmed it eliminates Walk overhead entirely for backups with manifests, while gracefully falling back to Walk for older backups without manifests.

