Problem
After uploading a backup, metadata.json lists all tables from the original local backup, including tables whose data may have failed to upload. If a single table's upload fails (network error, permission issue, disk full on remote), the backup's metadata.json still references it. On download, clickhouse-backup tries to download that table's data and fails.
Current behavior
Backup create writes local metadata.json listing all tables
Upload uploads metadata.json first, then data files
If table X's data upload fails but the overall upload is retried/resumed, metadata.json still lists table X
Download reads metadata.json, finds table X, tries to download its data — fails
Proposed Fix
After completing all data uploads, reconcile metadata.json with what was actually uploaded. Remove entries for tables whose data files don't exist on remote.
func (b *Backuper) uploadManifest(ctx context.Context, backupName string) error {
	var manifest *storage.BackupManifest
	if b.fileManifest != nil && b.fileManifest.TotalFiles > 0 && !b.resume {
		// Use the incrementally-built manifest (zero Walk / zero ListObjects)
		manifest = b.fileManifest
	} else {
		// Fallback: Walk the backup directory for resumed uploads
		manifest = storage.NewBackupManifest(backupName)
		err := b.dst.Walk(ctx, backupName+"/", true, func(ctx context.Context, f storage.RemoteFile) error {
			name := f.Name()
			if name == storage.ManifestFileName || f.Size() == 0 {
				return nil
			}
			manifest.AddFile(name, f.Size(), f.LastModified())
			return nil
		})
		if err != nil {
			return errors.WithMessage(err, "manifest Walk")
		}
	}
	return b.dst.UploadManifest(ctx, backupName, manifest)
}
The incremental approach is even better: build the manifest during upload by recording each file once its upload has succeeded.
This ensures the manifest (and by extension, the metadata) only references files that actually exist on remote. The recording call is guarded by a mutex, since multiple upload goroutines invoke it concurrently.
Benefits
Downloads never fail due to phantom table references in metadata
For fresh (non-resume) uploads, the manifest is built with zero ListObjects calls
For resumed uploads, the Walk fallback accurately captures what exists on remote
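To illustrate the reconciliation step from the proposed fix, here is a rough sketch of filtering table entries against the list of uploaded files. Everything in it is an assumption made for illustration: `TableRef`, `dataPrefix`, and `reconcileTables` are hypothetical names, and the `shadow/<database>/<table>/` key layout is assumed rather than taken from the actual remote layout or metadata.json schema.

```go
package main

import "strings"

// TableRef identifies a table listed in the backup metadata.
// Hypothetical type; the real metadata.json schema is not shown here.
type TableRef struct {
	Database string
	Table    string
}

// dataPrefix returns the remote key prefix under which a table's data files
// are assumed to live. The "shadow/<db>/<table>/" layout is an assumption.
func dataPrefix(t TableRef) string {
	return "shadow/" + t.Database + "/" + t.Table + "/"
}

// reconcileTables splits tables into those with at least one uploaded data
// file and those with none. uploaded holds remote keys relative to the
// backup root, for example the file names collected in the manifest.
func reconcileTables(tables []TableRef, uploaded []string) (kept, dropped []TableRef) {
	for _, t := range tables {
		prefix := dataPrefix(t)
		found := false
		for _, name := range uploaded {
			if strings.HasPrefix(name, prefix) {
				found = true
				break
			}
		}
		if found {
			kept = append(kept, t)
		} else {
			dropped = append(dropped, t)
		}
	}
	return kept, dropped
}
```

A table that legitimately has no data parts would also be dropped by this check, so a real implementation would need a way to tell empty tables apart from failed uploads.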