paxos_variant: v2
paxos_state_purging: repaired
```

!!! note "`paxos_variant` and `paxos_state_purging` Are Independent"
    These two settings do not depend on each other. You can enable Paxos v2 without changing `paxos_state_purging`, or set `paxos_state_purging: repaired` with Paxos v1. However, the recommended production configuration for LWT-heavy clusters is `v2` + `repaired`, which together enable the [commit consistency optimization](#commit-consistency-optimization) below.

For detailed configuration options, see [Paxos-Related cassandra.yaml Configuration](../../operations/repair/strategies.md#paxos-related-cassandrayaml-configuration).

### Paxos State Purging

The `paxos_state_purging` setting controls how old entries in the `system.paxos` table are cleaned up:

| Value | Mechanism | Safe with Commit CL=ANY | Revert Path |
|-------|-----------|------------------------|-------------|
| `legacy` | TTL-based expiration | **No** — committed values may expire before propagation | N/A (default) |
| `gc_grace` | Compaction-time expiry based on `gc_grace_seconds`, no TTLs | **No** | Safe fallback from `repaired` |
| `repaired` | Purged only after Paxos repair low bound confirms quorum persistence | **Yes** | **MUST** revert to `gc_grace`, **NOT** `legacy` |

With `repaired`, Cassandra uses the low bound recorded in `system.paxos_repair_history` to determine which `system.paxos` entries can be safely purged during compaction. This low bound is only advanced by **coordinated Paxos repairs** (`nodetool repair --paxos-only` or regular `nodetool repair`), not by the automatic background Paxos repair. See [Understanding the Two Paxos Repair Mechanisms](../../operations/repair/strategies.md#understanding-the-two-paxos-repair-mechanisms) for the full distinction.
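To make the difference between the modes concrete, the following is a simplified, hypothetical model of the purge decision; it is not Cassandra source code. `PaxosPurgeModel`, `PaxosEntry`, and `purgeable` are illustrative names, the model folds `legacy` (TTL) and `gc_grace` (compaction-time expiry) into a single age check, and it represents the repair low bound as a plain ballot timestamp.

```java
// Hypothetical model of the paxos_state_purging modes; NOT Cassandra source code.
final class PaxosPurgeModel {

    enum PurgingMode { LEGACY, GC_GRACE, REPAIRED }

    // Simplified stand-in for a system.paxos row.
    static final class PaxosEntry {
        final long ballotMicros;     // ballot timestamp, microseconds
        final long writtenAtSeconds; // when the row was written, seconds

        PaxosEntry(long ballotMicros, long writtenAtSeconds) {
            this.ballotMicros = ballotMicros;
            this.writtenAtSeconds = writtenAtSeconds;
        }
    }

    /**
     * @param ageLimitSeconds      TTL (legacy) or gc_grace_seconds (gc_grace)
     * @param repairLowBoundMicros low bound advanced by coordinated Paxos repair
     */
    static boolean purgeable(PurgingMode mode, PaxosEntry entry, long nowSeconds,
                             long ageLimitSeconds, long repairLowBoundMicros) {
        switch (mode) {
            case LEGACY:
            case GC_GRACE:
                // Time-based: the row ages out whether or not a quorum of
                // replicas has durably applied the committed value.
                return nowSeconds - entry.writtenAtSeconds >= ageLimitSeconds;
            case REPAIRED:
                // Repair-based: only ballots below the low bound may be dropped.
                return entry.ballotMicros < repairLowBoundMicros;
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```

Under `REPAIRED`, an entry's age is irrelevant: if no coordinated Paxos repair advances the low bound, `purgeable` never returns `true`, which is this model's version of `system.paxos` growing without bound.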

### Commit Consistency Optimization

LWT operations in Cassandra use two consistency levels:

- **Serial consistency level** (`SERIAL` or `LOCAL_SERIAL`): Controls the Paxos consensus phase, that is, how many replicas must participate in the prepare and propose (accept) rounds.
- **Commit (non-serial) consistency level**: Controls the final commit phase, that is, how many replicas must acknowledge that the committed value has been written to the base table.

These are configured separately in application code. For example, a query might use `LOCAL_SERIAL` for consensus and `LOCAL_QUORUM` for the commit.

With Paxos v2 and `paxos_state_purging: repaired`, the commit consistency level can be safely set to `ANY`. This eliminates a WAN round-trip because the coordinator does not need to wait for a quorum acknowledgment of the commit — the Paxos repair mechanism guarantees that committed values will eventually be propagated.

**Prerequisites for commit CL=ANY:**

1. `paxos_variant: v2` set consistently across **all nodes**
2. `paxos_state_purging: repaired` set consistently across **all nodes**
3. Regular coordinated Paxos repairs running (`nodetool repair --paxos-only` or regular `nodetool repair`)
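Before switching application commits to `ANY`, an operator could verify the three prerequisites with a check like the one below. This is a minimal sketch under stated assumptions: the per-node settings and last coordinated repair times are assumed to be collected elsewhere (for example, from configuration management and repair logs), and `CommitAnyPreflight`, `NodeSettings`, and the `maxRepairAge` threshold are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical pre-flight check before switching LWT commits to CL=ANY.
final class CommitAnyPreflight {

    // One node's view of the two cassandra.yaml settings and its most recent
    // coordinated Paxos repair (null if none is recorded).
    static final class NodeSettings {
        final String host;
        final String paxosVariant;      // "v1" or "v2"
        final String paxosStatePurging; // "legacy", "gc_grace" or "repaired"
        final Instant lastCoordinatedRepair;

        NodeSettings(String host, String paxosVariant, String paxosStatePurging,
                     Instant lastCoordinatedRepair) {
            this.host = host;
            this.paxosVariant = paxosVariant;
            this.paxosStatePurging = paxosStatePurging;
            this.lastCoordinatedRepair = lastCoordinatedRepair;
        }
    }

    // True only if every node meets all three prerequisites.
    static boolean commitAnyIsSafe(List<NodeSettings> nodes, Instant now, Duration maxRepairAge) {
        if (nodes.isEmpty()) {
            return false; // nothing was verified, so do not claim safety
        }
        for (NodeSettings node : nodes) {
            boolean v2 = "v2".equals(node.paxosVariant);
            boolean repaired = "repaired".equals(node.paxosStatePurging);
            boolean recentRepair = node.lastCoordinatedRepair != null
                    && !node.lastCoordinatedRepair.plus(maxRepairAge).isBefore(now);
            if (!(v2 && repaired && recentRepair)) {
                return false;
            }
        }
        return true;
    }
}
```

A single node still on `v1`, still on another purging mode, or without a recent coordinated repair is enough to make the check fail, mirroring the "all nodes" wording of the prerequisites.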

**Example driver configuration (Java):**

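```java
// DataStax Java driver 3.x: Statement setters mutate the statement in place.
// (In driver 4.x, statements are immutable and the setters return a new instance.)
import com.datastax.driver.core.ConsistencyLevel;

// Serial consistency controls the Paxos consensus phase
statement.setSerialConsistencyLevel(ConsistencyLevel.LOCAL_SERIAL);

// Commit consistency controls the final write; ANY is safe only with v2 + repaired
statement.setConsistencyLevel(ConsistencyLevel.ANY);
```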

!!! warning "Reverting Commit CL"
    If `paxos_state_purging` must be changed from `repaired` to `gc_grace` (for example, because coordinated Paxos repairs must be disabled for an extended period), applications **MUST** change their commit consistency level back from `ANY` to `QUORUM` or `LOCAL_QUORUM` to maintain correctness.
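The revert rules from the purging table and the warning above can be encoded as a guard in deployment tooling. The sketch below is hypothetical: `PurgingChangeGuard` and its enums are illustrative names, and `CommitCl` lists only the levels discussed in this section. It rejects a `repaired` to `legacy` transition, and any move away from `repaired` while the application still commits at `ANY`.

```java
// Hypothetical guard for changing paxos_state_purging; illustrative only.
final class PurgingChangeGuard {

    enum Purging { LEGACY, GC_GRACE, REPAIRED }

    // Only the commit consistency levels discussed in this section.
    enum CommitCl { ANY, LOCAL_QUORUM, QUORUM }

    // Throws if the proposed change would violate the documented revert rules.
    static void checkTransition(Purging from, Purging to, CommitCl applicationCommitCl) {
        if (from == Purging.REPAIRED && to == Purging.LEGACY) {
            throw new IllegalStateException(
                    "revert from repaired must go to gc_grace, not legacy");
        }
        if (to != Purging.REPAIRED && applicationCommitCl == CommitCl.ANY) {
            throw new IllegalStateException(
                    "move the application commit CL from ANY to QUORUM or LOCAL_QUORUM first");
        }
    }
}
```

Because the guard only passes once the application has already moved off `ANY`, it implies changing the commit consistency level before changing the setting.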

### Upgrade Considerations

- Clusters with heavy LWT usage **SHOULD** upgrade to Paxos v2
| **Quorum** | Majority of replicas; with RF=3, quorum is 2 |
| **Ballot** | Unique proposal number combining timestamp and node ID |
| **Paxos state** | Entries in `system.paxos` table tracking proposals and accepted values |
| **Paxos repair** | Process of reconciling Paxos state across replicas |
| **Background Paxos repair** | Automatic process (every 5 min in 4.1+) that completes uncommitted Paxos transactions. Does not advance the repair low bound. |
| **Coordinated Paxos repair** | `nodetool repair --paxos-only` or the Paxos step in regular `nodetool repair`. Completes uncommitted transactions AND advances the low bound in `system.paxos_repair_history`, enabling `system.paxos` garbage collection. |
| **Paxos repair low bound** | Ballot recorded in `system.paxos_repair_history` indicating the point up to which Paxos state has been safely reconciled. Used by `paxos_state_purging: repaired` to determine what can be garbage collected. |
| **Serial consistency** | Consistency level (`SERIAL` or `LOCAL_SERIAL`) controlling the Paxos consensus phase |
| **Commit consistency** | Non-serial consistency level controlling the final commit write. Can be set to `ANY` with Paxos v2 + `repaired` purging. |
| **LWT** | Lightweight Transaction—Cassandra's conditional atomic operations using Paxos |
| **SERIAL** | Consistency level that uses Paxos for linearizable operations |
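Two glossary entries lend themselves to a short illustration: quorum size and ballot ordering. The sketch below computes a majority quorum as `RF / 2 + 1` (integer division) and models a ballot as a (timestamp, node ID) pair ordered by timestamp, with the node ID as a tie-breaker. `PaxosBasics` and `Ballot` are hypothetical names; Cassandra itself represents ballots as time-based UUIDs, so this models the idea rather than the real encoding.

```java
// Hypothetical illustration of two glossary terms; not Cassandra's encoding.
final class PaxosBasics {

    // A quorum is a strict majority of replicas: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    // Simplified ballot: ordered by timestamp, with the node ID breaking ties
    // so that two coordinators with distinct node IDs never produce equal ballots.
    static final class Ballot implements Comparable<Ballot> {
        final long timestampMicros;
        final long nodeId;

        Ballot(long timestampMicros, long nodeId) {
            this.timestampMicros = timestampMicros;
            this.nodeId = nodeId;
        }

        @Override
        public int compareTo(Ballot other) {
            int byTime = Long.compare(timestampMicros, other.timestampMicros);
            return byTime != 0 ? byTime : Long.compare(nodeId, other.nodeId);
        }
    }
}
```

With RF=3, `quorum(3)` is 2, matching the glossary entry; with RF=4 a quorum is 3, which is why even replication factors tolerate no more failures than the next lower odd one.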

---


### Operational Requirements

- **Paxos state accumulates**: Without regular **coordinated** Paxos repairs (`nodetool repair --paxos-only` or regular `nodetool repair`), `system.paxos` grows unboundedly when using `paxos_state_purging: repaired`. The automatic background Paxos repair does not advance the low bound needed for garbage collection.
- **Topology changes**: Paxos repairs **MUST** complete before topology changes (bootstrap, decommission)
- **Repair requirements**: Clusters using LWTs with `paxos_state_purging: repaired` **MUST** run regular coordinated Paxos repairs
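The topology-change requirement can be enforced as a pre-flight gate in a runbook or automation. The sketch below assumes that the completion time of each node's most recent coordinated Paxos repair is already known (how it is collected is out of scope); `TopologyChangeGate` and `blockingHosts` are hypothetical names. It returns the hosts that have not completed a coordinated Paxos repair since a chosen cutoff; an empty list means the change may proceed.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight gate for bootstrap, decommission, replace or move.
final class TopologyChangeGate {

    /**
     * @param allHosts              every node in the cluster
     * @param lastCoordinatedRepair host -> completion time of its latest coordinated
     *                              Paxos repair (missing if none is recorded)
     * @param notBefore             repairs finished before this instant do not count
     * @return hosts that still need a coordinated Paxos repair; empty means proceed
     */
    static List<String> blockingHosts(List<String> allHosts,
                                      Map<String, Instant> lastCoordinatedRepair,
                                      Instant notBefore) {
        List<String> blocking = new ArrayList<>();
        for (String host : allHosts) {
            Instant completed = lastCoordinatedRepair.get(host);
            if (completed == null || completed.isBefore(notBefore)) {
                blocking.add(host);
            }
        }
        return blocking;
    }
}
```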

For operational guidance, see [Paxos Repairs](../../operations/repair/strategies.md#paxos-repairs).


Paxos repairs are only relevant for **keyspaces that use LWTs**. For keyspaces that never use LWTs, Paxos state does not affect correctness, and operators **MAY** safely skip Paxos repairs for those keyspaces.

Cassandra 4.1+ provides two distinct Paxos repair mechanisms:

1. **Background Paxos repair** — runs automatically every 5 minutes (configurable). Completes uncommitted Paxos transactions but does **NOT** advance the Paxos repair low bound or enable garbage collection of `system.paxos` data.
2. **Coordinated Paxos repair** — runs via `nodetool repair --paxos-only` or as part of regular `nodetool repair`. Completes uncommitted transactions **AND** advances the low bound in `system.paxos_repair_history`, enabling garbage collection when using `paxos_state_purging: repaired`.

For clusters using `paxos_state_purging: repaired`, operators **MUST** run regular coordinated Paxos repairs. The automatic background repair alone is not sufficient. See [Understanding the Two Paxos Repair Mechanisms](strategies.md#understanding-the-two-paxos-repair-mechanisms) in the Repair Strategies guide for the full distinction.
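The difference between the two mechanisms can be captured in a toy model. The sketch below is illustrative only: `PaxosRepairModel` and its methods are hypothetical, and ballots are plain `long` values. Both repairs finish in-flight transactions, but only the coordinated one moves the low bound that `paxos_state_purging: repaired` relies on.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model contrasting the two Paxos repair mechanisms; NOT Cassandra code.
final class PaxosRepairModel {

    // Ballots of transactions that were accepted but not yet committed.
    private final NavigableSet<Long> uncommitted = new TreeSet<>();

    // Stand-in for the low bound recorded in system.paxos_repair_history.
    private long lowBound = Long.MIN_VALUE;

    void accept(long ballot) {
        uncommitted.add(ballot);
    }

    // Background repair: completes in-flight transactions, leaves the low bound alone.
    void backgroundRepair() {
        uncommitted.clear();
    }

    // Coordinated repair: completes transactions up to (and including) the given
    // ballot AND advances the low bound, which is what unlocks garbage collection.
    void coordinatedRepair(long upToBallot) {
        uncommitted.headSet(upToBallot, true).clear();
        lowBound = Math.max(lowBound, upToBallot);
    }

    long lowBound() {
        return lowBound;
    }

    int uncommittedCount() {
        return uncommitted.size();
    }
}
```

After any number of `backgroundRepair()` calls the low bound is unchanged, so in this model nothing would ever become eligible for purging under `repaired`.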

### Paxos Repairs and Topology Changes


Paxos v2 is selected via the `paxos_variant` setting in `cassandra.yaml` (values: `v1` or `v2`).

`paxos_variant` and `paxos_state_purging` are **independent settings** — neither requires the other. However, the recommended production configuration for LWT-heavy clusters is `paxos_variant: v2` combined with `paxos_state_purging: repaired`, which together enable the [commit consistency optimization](../../architecture/distributed-data/paxos.md#commit-consistency-optimization).

To safely take full advantage of Paxos v2 with `repaired` purging, operators **MUST** ensure:

1. **Regular coordinated Paxos repairs** are running (via scheduled `nodetool repair --paxos-only` runs or regular `nodetool repair`)
2. **Paxos state purging** is configured appropriately (see [Paxos-related cassandra.yaml configuration](strategies.md#paxos-related-cassandrayaml-configuration) in the Repair Strategies guide)

Detailed configuration options and upgrade guidance are covered in the [Repair Strategies](strategies.md) documentation.

**How it works:**

This command runs a **coordinated Paxos repair** that synchronizes Paxos state stored in `system.paxos` across replicas. Unlike the [automatic background Paxos repair](strategies.md#background-paxos-repair-automatic) (which only completes uncommitted transactions), `--paxos-only` also advances the **Paxos repair low bound** by writing to `system.paxos_repair_history`. This low bound is what enables garbage collection of old `system.paxos` data when using `paxos_state_purging: repaired`.

**When to use:**

- **Clusters using `paxos_state_purging: repaired`**: Operators **MUST** run `--paxos-only` repairs regularly (typically hourly) or ensure regular full repairs include the Paxos step. The automatic background repair does **NOT** advance the low bound, so without coordinated repairs, `system.paxos` grows unboundedly.
- **Pre-4.1 clusters**: Operators **MUST** schedule `--paxos-only` repairs manually since the automatic background repair is not available.
- **Before topology changes**: Run on all nodes before bootstrap, decommission, replace, or move operations to reduce the risk of Paxos cleanup timeouts.
- **After disabling automatic Paxos repairs**: If `paxos_repair_enabled` is set to `false`, coordinated Paxos repairs **SHOULD** be scheduled regularly for clusters using LWTs.
- **Troubleshooting LWT issues**: When LWTs are timing out or behaving unexpectedly.

**Relationship to automatic background Paxos repair (Cassandra 4.1+):**

Cassandra 4.1+ includes an automatic background Paxos repair that runs every 5 minutes (controlled by `paxos_repair_enabled`). This background repair completes uncommitted transactions but does **NOT** replace the need for coordinated `--paxos-only` repairs. See [Understanding the Two Paxos Repair Mechanisms](strategies.md#understanding-the-two-paxos-repair-mechanisms) for the full distinction.

**Operational guidance:**

- Running without a keyspace argument repairs Paxos state for **all keyspaces**. This is often **RECOMMENDED** because operators frequently do not know which keyspaces developers are using for LWTs.
- Paxos repairs are lightweight compared to full data repairs and complete quickly.
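For clusters that need a regular coordinated Paxos repair (for example, hourly with `paxos_state_purging: repaired`), a scheduler can invoke `nodetool repair --paxos-only`. The sketch below is a minimal, hypothetical wrapper (`PaxosRepairScheduler` is an illustrative name) that builds the command, omitting the keyspace argument to cover all keyspaces, and runs it with `ScheduledExecutorService.scheduleWithFixedDelay` so that a slow run never overlaps the next one. Cron or an existing repair orchestration tool may be a better fit in production.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper that schedules coordinated Paxos repairs via nodetool.
final class PaxosRepairScheduler {

    // Builds `nodetool repair --paxos-only [keyspace]`; a null or blank keyspace
    // omits the argument so Paxos state is repaired for all keyspaces.
    static List<String> command(String keyspace) {
        List<String> cmd = new ArrayList<>(List.of("nodetool", "repair", "--paxos-only"));
        if (keyspace != null && !keyspace.isBlank()) {
            cmd.add(keyspace);
        }
        return cmd;
    }

    // Runs the command once, streaming nodetool output, and returns its exit code.
    static int runOnce(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    // Fixed delay: the next run starts `intervalHours` after the previous one ends.
    static ScheduledExecutorService scheduleEvery(long intervalHours, String keyspace) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(() -> {
            try {
                int exit = runOnce(command(keyspace));
                if (exit != 0) {
                    System.err.println("paxos-only repair exited with code " + exit);
                }
            } catch (IOException e) {
                System.err.println("could not start nodetool: " + e.getMessage());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status on shutdown
            }
        }, 0, intervalHours, TimeUnit.HOURS);
        return executor;
    }
}
```

For example, `PaxosRepairScheduler.scheduleEvery(1, null)` would start an hourly all-keyspace Paxos repair; the returned executor should be shut down when the hosting process stops.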

For more details on Paxos repair strategy and configuration, see [Paxos Repairs](strategies.md#paxos-repairs) in the Repair Strategies guide.
