Block Prunning #2984
Conversation
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Claude finished @pthmas's task in 2m 5s

Comprehensive Code Review

I've completed a detailed review of the block pruning implementation. Overall, this is a well-structured implementation with good test coverage. Below are my findings organized by category, with critical issues highlighted first.

Critical Issues

1. Race Condition Risk in PruneBlocks (store.go:283-354)
Severity: High

The Issue:
Recommendation:

// PruneBlocks removes block data (header, data, signature, and hash index)
// up to and including the given height from the store.
//
// CONCURRENCY: This method assumes exclusive access to the store during pruning.
// Callers must ensure no concurrent reads/writes occur for heights being pruned.

High-Priority Issues

2. Silent Failure in Best-Effort Pruning (executor.go:443-446)
Severity: High

The PR description states: "If pruning fails, we log the error but do not fail the block." This is a double-edged sword.

Concerns:
Recommendation:

if err := e.store.PruneBlocks(e.ctx, targetHeight); err != nil {
	e.logger.Error().Err(err).Uint64("target_height", targetHeight).Msg("failed to prune old block data")
	e.metrics.PruningFailures.Inc() // Add this metric
	// Consider: disable pruning after N consecutive failures?
}

3. Missing Validation for Pruning Configuration (config.go:251-292)
Severity: Medium-High

The Missing Validations:
Recommendation:

// Validate pruning configuration
if c.Node.PruningEnabled {
	if c.Node.PruningInterval == 0 {
		return fmt.Errorf("pruning_interval must be >= 1 when pruning is enabled")
	}
	if c.Node.PruningKeepRecent == 0 {
		return fmt.Errorf("pruning_keep_recent must be > 0 when pruning is enabled (or disable pruning)")
	}
	if c.Node.PruningKeepRecent < 10 {
		return fmt.Errorf("pruning_keep_recent must be >= 10 to avoid premature data loss")
	}
}

4. Incomplete State Pruning Documentation (types.go:85-94)
Severity: Medium

The Issue:
Recommendation:
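For example, the retention behavior could be spelled out on the pruning interface itself. A sketch (illustrative only; the actual interface and doc text in this PR may differ):

```go
package types

import "context"

// Pruner prunes stored block data up to a target height.
//
// Pruning removes headers (/h/{height}), block data (/d/{height}),
// signatures (/c/{height}) and the hash index (/i/{hash}). State entries
// under /s/{height} and the current height under /t are never pruned.
type Pruner interface {
	PruneBlocks(ctx context.Context, height uint64) error
}
```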
Medium-Priority Issues

5. Error Handling Inconsistency in PruneBlocks (store.go:324-341)
Severity: Medium

Error handling for batch deletions is inconsistent:

if err := batch.Delete(ctx, ds.NewKey(getDataKey(h))); err != nil {
	if !errors.Is(err, ds.ErrNotFound) { // Ignores NotFound
		return fmt.Errorf("failed to delete data at height %d during pruning: %w", h, err)
	}
}

Issue:
Recommendation:

// Get header blob to compute the hash index key. If the header is already
// missing (e.g. due to previous partial pruning), just skip this height.
headerBlob, err := s.db.Get(ctx, ds.NewKey(getHeaderKey(h)))
if err != nil {
	if errors.Is(err, ds.ErrNotFound) {
		continue // Already pruned or never existed
	}
	return fmt.Errorf("failed to get header at height %d during pruning: %w", h, err)
}

6. Synchronous Pruning Adds Latency to Block Production (executor.go:433-448)
Severity: Medium

The PR description acknowledges this: "PruneBlocks is called synchronously in produceBlock. On pruning heights, block production will incur extra latency."

Quantification:
Recommendation:
// Limit pruning batch size to avoid blocking too long
const maxPrunePerInterval = 100

if newHeight > e.config.Node.PruningKeepRecent {
	targetHeight := newHeight - e.config.Node.PruningKeepRecent

	// Get last pruned height
	lastPruned := e.getLastPrunedHeight(ctx)

	// Limit pruning to avoid long blocking
	if targetHeight-lastPruned > maxPrunePerInterval {
		targetHeight = lastPruned + maxPrunePerInterval
	}

	if err := e.store.PruneBlocks(e.ctx, targetHeight); err != nil {
		e.logger.Error().Err(err).Uint64("target_height", targetHeight).Msg("failed to prune old block data")
	}
}

7. Missing Test for Partial Pruning Failures (store_test.go)
Severity: Medium

The test suite does not currently cover pruning runs that fail partway through.
Recommendation:

func TestPruneBlocks_PartialFailure(t *testing.T) {
	// Test that pruning is idempotent when partially completed
	// Simulate failure after some deletions but before commit
}

Low-Priority Issues

8. Magic Number for Height Length (store.go:356)
Severity: Low

const heightLength = 8

This is fine, but could benefit from a comment explaining why 8 (uint64 encoding).

Recommendation:

const heightLength = 8 // size of uint64 in bytes

9. Typo in PR Title
Severity: Low

"Block Prunning" should be "Block Pruning" (single 'n').

Code Quality & Architecture

Positive Aspects ✅
Architecture Concerns

Monotonic Pruning Assumption:

Performance & Resource Efficiency

Batch Operation Efficiency

The implementation batches all deletions for a height range, which is good. However:

Observation:
Recommendation:

const maxPruneBlocksPerBatch = 1000

for h := lastPruned + 1; h <= height; h++ {
	// ... deletion logic ...

	// Commit every N blocks to avoid huge batches
	if (h-lastPruned)%maxPruneBlocksPerBatch == 0 {
		if err := batch.Commit(ctx); err != nil {
			return fmt.Errorf("failed to commit pruning batch: %w", err)
		}
		// Create new batch for next chunk
		batch, err = s.db.Batch(ctx)
		if err != nil {
			return fmt.Errorf("failed to create new batch: %w", err)
		}
	}
}

Security

No Critical Security Issues Found ✅

The pruning implementation doesn't introduce security vulnerabilities:
Minor Security Note

The silent failure mode (best-effort pruning) could theoretically be exploited if an attacker can cause persistent pruning failures, leading to disk exhaustion DoS. However, this would require the attacker to already have compromised the datastore layer, which is a more fundamental breach.

Testing & Reliability

Test Coverage Assessment

Well-Covered:
Missing Coverage:
Recommendation:
Documentation & Developer Experience

Documentation Gaps
Recommendation:

## Block Pruning Configuration

### Choosing pruning_keep_recent

The `pruning_keep_recent` value determines how many recent blocks to keep:

- Minimum: 10 blocks (safety margin)
- Recommended: 1000 blocks for full nodes
- For archive nodes: Disable pruning

Consider:

- DA submission delays (blocks must exist until DA confirmed)
- Peer sync requests (peers may request recent blocks)
- RPC query requirements

### Enabling Pruning on Existing Nodes

1. Stop the node
2. Edit config to set:
   - pruning_enabled: true
   - pruning_keep_recent: 1000
   - pruning_interval: 100
3. Restart node
4. First pruning will happen at next interval

Summary & Recommendations

Must Fix Before Merge
Should Fix Before Merge
Nice to Have
Overall Assessment

This is a solid implementation of a much-needed feature. The code quality is good, the interface design is clean, and there's reasonable test coverage. The main concerns are:
With the recommended fixes, particularly around validation and metrics, this will be production-ready.

Recommendation: Request changes for critical issues (#1-3), then approve once addressed.
Given that we have a lot of different stores and keys, are we also planning to prune:
- metadata?
- go-header store? (there are settings to be looked at)
- execution/evm store (https://github.com/evstack/ev-node/blob/main/execution/evm/store.go)
(and eventually ev-abci store)
It will be inconsistent if we don't allow height pruning of all those stores (at once).
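A minimal sketch of what consistent pruning across stores could look like, assuming each store implemented the PR's Pruner contract (the helper and store registry here are hypothetical, not existing ev-node APIs):

```go
package pruning

import (
	"context"
	"fmt"
)

// Pruner mirrors the interface added in this PR (PruneBlocks). The block
// store implements it today; the go-header, execution/EVM, and ev-abci
// stores would each need an equivalent implementation.
type Pruner interface {
	PruneBlocks(ctx context.Context, height uint64) error
}

// PruneAll prunes every registered store to the same target height so the
// stores cannot drift apart. Hypothetical helper, not part of this PR.
func PruneAll(ctx context.Context, height uint64, stores map[string]Pruner) error {
	for name, s := range stores {
		if err := s.PruneBlocks(ctx, height); err != nil {
			return fmt.Errorf("prune %s store to height %d: %w", name, height, err)
		}
	}
	return nil
}
```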
Overview
- Prunes /h/{height}, /d/{height}, /c/{height}, /i/{hash} up to a target height.
- Leaves /s/{height} (state) and /t (current height) untouched.
- Tracks last-pruned-block-height so pruning is monotonic and idempotent.
- Block production calls store.PruneBlocks synchronously.
- Adds a Pruner interface (PruneBlocks).

Config
New node config fields / flags:
- --evnode.node.pruning_enabled
- --evnode.node.pruning_keep_recent
- --evnode.node.pruning_interval

Pruning actually runs only when all three are set to non-trivial values (enabled, keep_recent > 0, interval > 0).
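As a sketch, the gating described above boils down to a single predicate (the helper and parameter names are assumptions, not the PR's actual config code):

```go
// pruningActive mirrors the rule above: pruning runs only when it is enabled
// and both keep_recent and interval are non-zero. Names are illustrative.
func pruningActive(enabled bool, keepRecent, interval uint64) bool {
	return enabled && keepRecent > 0 && interval > 0
}
```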
Design trade-offs
Runs in the critical path:
PruneBlocks is called synchronously in produceBlock. On pruning heights, block production will incur extra latency proportional to the amount of data pruned and underlying storage performance.

Best-effort pruning:
If pruning fails, we log the error but do not fail the block. This avoids halting the node but means disk usage can grow silently if pruning keeps failing; operators will need to watch logs/metrics.
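Since operators are expected to watch logs/metrics for silent pruning failures, a failure counter is the natural hook. A sketch using prometheus/client_golang (the metric name and wiring are assumptions, not part of this PR):

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// PruningFailures counts PruneBlocks calls that returned an error. Alerting
// on a steadily increasing value catches the "disk usage grows silently"
// case described above. Name and registration are illustrative only.
var PruningFailures = prometheus.NewCounter(prometheus.CounterOpts{
	Namespace: "evnode",
	Subsystem: "pruning",
	Name:      "failures_total",
	Help:      "Number of failed block pruning attempts.",
})

func init() {
	prometheus.MustRegister(PruningFailures)
}
```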