
fix(universaldb): enforce fdb transaction limits #4995

Draft
NathanFlurry wants to merge 1 commit into sqlite-logs/downgrade-fresh-db-miss from
sqlite-limits/enforce-fdb-tx-limits

Conversation

NathanFlurry (Member) commented May 7, 2026

Stack Context

This stack fixes oversized SQLite transaction handling by making UniversalDB enforce the same core transaction payload limits that FoundationDB-backed commits are subject to.

What?

  • Enforces a 10,000,000 byte transaction limit in UDB commit paths.
  • Enforces a 10,000 byte limit on keys and a 100,000 byte limit on values and atomic op params.
  • Maps oversized UDB commits to Depot SQLite CommitTooLarge errors.
  • Updates focused UDB and SQLite VFS tests.
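A minimal sketch of what such a limit check might look like. The constants mirror FoundationDB's documented limits, but validate_commit_limits, LimitError, and the operation shape below are illustrative assumptions, not the PR's actual code:

```rust
// Hypothetical constants mirroring FoundationDB's documented limits;
// the real names in the PR may differ.
const MAX_TRANSACTION_SIZE: usize = 10_000_000; // 10 MB total payload
const MAX_KEY_SIZE: usize = 10_000;             // 10 KB per key
const MAX_VALUE_SIZE: usize = 100_000;          // 100 KB per value / atomic param

#[derive(Debug, PartialEq)]
enum LimitError {
    KeyTooLarge,
    ValueTooLarge,
    TransactionTooLarge,
}

// Sketch of per-operation validation over (key, value) pairs,
// accumulating total payload size as it goes.
fn validate_commit_limits(ops: &[(Vec<u8>, Vec<u8>)]) -> Result<(), LimitError> {
    let mut total = 0usize;
    for (key, value) in ops {
        if key.len() > MAX_KEY_SIZE {
            return Err(LimitError::KeyTooLarge);
        }
        if value.len() > MAX_VALUE_SIZE {
            return Err(LimitError::ValueTooLarge);
        }
        total = total.saturating_add(key.len() + value.len());
        if total > MAX_TRANSACTION_SIZE {
            return Err(LimitError::TransactionTooLarge);
        }
    }
    Ok(())
}
```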

Why?

Local RocksDB-backed UDB previously accepted SQLite transactions that production FDB would reject with transaction_too_large. Local tests could therefore pass while production failed later with an internal SQLite/FDB commit error.

railway-app (Bot) commented May 7, 2026

🚅 Deployed to the rivet-pr-4995 environment in rivet-frontend

Service            | Status          | Updated (UTC)
mcp-hub            | ❌ Build Failed | May 7, 2026 at 8:24 am
website            | 😴 Sleeping     | May 7, 2026 at 8:02 am
frontend-cloud     | ❌ Build Failed | May 7, 2026 at 7:41 am
ladle              | ❌ Build Failed | May 7, 2026 at 7:41 am
frontend-inspector | ❌ Build Failed | May 7, 2026 at 7:40 am
kitchen-sink       | ❌ Build Failed | May 7, 2026 at 7:40 am

NathanFlurry (Member, Author) commented

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude (Bot, Contributor) commented May 7, 2026

Code Review

Overview

This PR enforces FDB-equivalent transaction limits (10 MB total, 10 KB keys, 100 KB values) in the RocksDB and Postgres UDB drivers, which previously had no such enforcement. It also maps TransactionTooLarge through the depot SQLite commit path to a proper CommitTooLarge error so the VFS can surface it correctly. The motivation is solid — local tests were silently passing oversized transactions that would fail in production FDB.


Issues

committed.store(true) fires before validate_commit_limits in both drivers

In both rocksdb/transaction.rs and postgres/transaction.rs, the sequence is:

self.committed.store(true, Ordering::SeqCst);   // ← marks as committed
let (operations, conflict_ranges) = self.operations.consume();
validate_commit_limits(&operations, &conflict_ranges)?;  // ← may fail here

If validate_commit_limits returns an error, the transaction is permanently flagged as committed but nothing was written. The operations have also been consumed, so they can't be replayed. For TransactionTooLarge this is intentional (it's non-retryable), but a caller relying on the committed flag to know whether data landed will see true even for a failed-validation no-op. The pre-existing await? failures below the validation call had the same ordering, but adding a new failure point above them makes this more likely to matter in practice. At minimum this deserves a comment clarifying the intent.
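A minimal sketch of the safer ordering, under stated assumptions (the Txn struct, field names, and error type here are hypothetical stand-ins for the driver's transaction type): validate first, and only flip the committed flag once validation has passed.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

const MAX_TRANSACTION_SIZE: usize = 10_000_000;

// Hypothetical stand-in for the driver transaction; `size` abstracts
// the consumed operations' total payload size.
struct Txn {
    committed: AtomicBool,
    size: usize,
}

impl Txn {
    fn commit(&self) -> Result<(), &'static str> {
        // Validate *before* marking committed, so a failed validation
        // leaves the transaction unflagged.
        if self.size > MAX_TRANSACTION_SIZE {
            return Err("TransactionTooLarge");
        }
        self.committed.store(true, Ordering::SeqCst);
        Ok(())
    }
}
```

Whether the real drivers can reorder this way depends on when operations.consume() must run, so this is a sketch of the intent rather than a drop-in fix.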

map_udb_commit_error only maps TransactionTooLarge, not KeyTooLarge/ValueTooLarge

In depot/src/conveyer/commit/apply.rs, map_udb_commit_error only matches DatabaseError::TransactionTooLarge. The two new error variants KeyTooLarge and ValueTooLarge are not mapped. If a SQLite actor writes a key or value that exceeds the per-key/per-value limits, it will propagate as an opaque anyhow::Error rather than a CommitTooLarge, so the VFS won't classify it correctly and callers won't get a clean diagnostic. Either map all three variants or document why key/value errors can't reach this path.
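A sketch of the suggested mapping, assuming simplified stand-ins for the real DatabaseError and CommitTooLarge types (the actual enums in the codebase carry more variants and payloads):

```rust
// Hypothetical stand-in for universaldb's DatabaseError.
#[derive(Debug)]
enum DatabaseError {
    KeyTooLarge,
    ValueTooLarge,
    TransactionTooLarge,
    Other(String),
}

// Hypothetical stand-in for the depot commit error surface.
#[derive(Debug, PartialEq)]
enum CommitError {
    CommitTooLarge,
    Opaque(String),
}

// Map all three size-limit variants to CommitTooLarge so the VFS can
// classify any oversized write, not just whole-transaction overflow.
fn map_udb_commit_error(err: DatabaseError) -> CommitError {
    match err {
        DatabaseError::KeyTooLarge
        | DatabaseError::ValueTooLarge
        | DatabaseError::TransactionTooLarge => CommitError::CommitTooLarge,
        DatabaseError::Other(msg) => CommitError::Opaque(msg),
    }
}
```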


Observations and Minor Points

add_size uses saturating_add — functionally correct but semantically surprising

fn add_size(size: &mut usize, amount: usize) -> Result<()> {
    *size = size.saturating_add(amount);
    if *size > MAX_TRANSACTION_SIZE { ... }

saturating_add caps at usize::MAX on overflow rather than wrapping, so the > MAX_TRANSACTION_SIZE check will still fire. It is correct, but checked_add with an explicit overflow fallback (or just wrapping_add + the check) reads as clearer intent. The silent saturation could confuse a future reader.
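The checked_add alternative might look like the following sketch (the error type is a placeholder string rather than the crate's real error enum); an arithmetic overflow is treated the same as exceeding the limit, but the intent is stated explicitly:

```rust
const MAX_TRANSACTION_SIZE: usize = 10_000_000;

// Sketch: checked_add makes the overflow case an explicit branch
// instead of a silent saturation to usize::MAX.
fn add_size(size: &mut usize, amount: usize) -> Result<(), &'static str> {
    *size = size
        .checked_add(amount)
        .ok_or("TransactionTooLarge")?; // overflow => over any sane limit
    if *size > MAX_TRANSACTION_SIZE {
        return Err("TransactionTooLarge");
    }
    Ok(())
}
```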

Test function names still carry the bench_ prefix

The renamed tests (bench_large_tx_insert_10mb_rejects_transaction_too_large, etc.) assert rejection rather than benchmarking throughput. The bench_ prefix will make cargo test bench or any benchmark harness try to treat them as benchmarks. Consider dropping the prefix entirely: large_tx_insert_10mb_is_rejected or similar.

CommitTooLargeTransport::failed_dirty_pages uses store, not fetch_max

If multiple commits raced concurrently the stored count would be last-write-wins rather than the triggering value. This is fine for the current single-threaded test, but worth noting in case the transport is reused in a concurrent context.
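If the transport were ever reused concurrently, a fetch_max would keep the largest observed count instead of the last write. A sketch, with a hypothetical helper name:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sketch: fetch_max retains the maximum dirty-page count seen across
// racing commits, rather than last-write-wins with store().
fn record_failed_dirty_pages(counter: &AtomicUsize, pages: usize) {
    counter.fetch_max(pages, Ordering::SeqCst);
}
```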

error_is_transaction_too_large is now properly implemented

The previous stub (false) made the gasoline KV compaction shrink-on-error path dead for RocksDB/Postgres backends. The new implementation correctly walks the error chain. Make sure the gasoline pull_next_signals_tx retry path is exercised by the new integration tests or at least that someone has validated its behaviour with a RocksDB backend.
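The error-chain walk the review describes can be sketched with std's Error::source chain (the real code likely uses anyhow's chain; the TransactionTooLarge and Wrapper types here are illustrative):

```rust
use std::error::Error;
use std::fmt;

// Hypothetical leaf error standing in for the driver's limit error.
#[derive(Debug)]
struct TransactionTooLarge;

impl fmt::Display for TransactionTooLarge {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "transaction_too_large")
    }
}
impl Error for TransactionTooLarge {}

// Hypothetical wrapper simulating a higher-level commit error.
#[derive(Debug)]
struct Wrapper(TransactionTooLarge);

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "commit failed")
    }
}
impl Error for Wrapper {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

// Walk the source() chain, downcasting each link, the same idea as the
// new implementation the review describes.
fn error_is_transaction_too_large(err: &(dyn Error + 'static)) -> bool {
    let mut cur: Option<&(dyn Error + 'static)> = Some(err);
    while let Some(e) = cur {
        if e.downcast_ref::<TransactionTooLarge>().is_some() {
            return true;
        }
        cur = e.source();
    }
    false
}
```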


Test Coverage

The new depot_commit_too_large_response_reproduces_dead_vfs_chain test is well-structured and directly validates the VFS poisoning path. The large_tx_insert_rejects_transaction_too_large helper correctly asserts the I/O error surface and the VFS last-error message. Integration tests for all three limit types (KeyTooLarge, ValueTooLarge, TransactionTooLarge) are present in universaldb/tests/integration.rs. Good coverage overall — the only gap is that KeyTooLarge and ValueTooLarge aren't tested through the depot/VFS path (see the mapping issue above).


Summary

The core logic is correct and the motivation is important — production/local parity on transaction limits is a real gap. The main actionable items before landing:

  1. Map KeyTooLarge and ValueTooLarge in map_udb_commit_error (or document why they're excluded).
  2. Drop or rename the bench_ prefix on the rejection tests.
  3. Consider adding a short comment in the commit paths explaining why committed = true is set before the validation that can fail.

