38 changes: 32 additions & 6 deletions configuration/source-db/postgres-maintenance.mdx
@@ -7,7 +7,7 @@ description: "Manage Postgres replication slots and WAL lag for reliable PowerSy

Postgres logical replication slots are used to keep track of [replication](/architecture/powersync-service#replication-from-the-source-database) progress (recorded as a [LSN](https://www.postgresql.org/docs/current/datatype-pg-lsn.html)).

Every time a new version of [Sync Streams or Sync Rules](/sync/overview) are deployed, PowerSync creates a new replication slot, then switches over and deletes the old replication slot when the reprocessing of the new Sync Streams/Rules version is done.
Every time a new version of [Sync Streams or Sync Rules](/sync/overview) is deployed, PowerSync creates a new replication slot. Once the new version is fully processed, PowerSync switches to use the new slot and deletes the old one.

The replication slots can be viewed using this query:

@@ -22,25 +22,51 @@ Example output:
| powersync\_1\_c3c8cf21 | 0/70D8240 | 1 | 56 bytes |
| powersync\_2\_e62d7e0f | 0/70D8240 | 1 | 56 bytes |

In some cases, a replication slot may remain without being used. In this case, the slot prevents Postgres from deleting older WAL entries. One such example is when a PowerSync instance has been deprovisioned.
In some cases a replication slot remains in place without being used, for example after a PowerSync instance has been deprovisioned. An unused slot prevents Postgres from deleting older WAL entries.

While this is desired behavior for slot replication downtime, it could result in excessive disk usage if the slot is not used anymore.
Keeping unused slots alive prevents WAL cleanup, which can lead to excessive disk usage. If a slot is no longer needed, it should be dropped.

Inactive slots can be dropped using:

```sql
select slot_name, pg_drop_replication_slot(slot_name) from pg_replication_slots where active = false;
```

Postgres prevents active slots from being dropped. If it does happen (e.g. while a PowerSync instance is disconnected), PowerSync would automatically re-create the slot, and restart replication.
Postgres prevents active slots from being dropped. If an active slot is somehow dropped while a PowerSync instance is disconnected, PowerSync will automatically recreate the slot when it reconnects and restart replication.

### Recovering from an invalidated slot

A replication slot becomes invalidated when its `wal_status` is `lost`. This happens when the WAL data needed by the slot has been removed, typically because the replication lag exceeded `max_slot_wal_keep_size`.
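
To check a slot's status directly (Postgres 13 and later expose `wal_status` and `safe_wal_size` in `pg_replication_slots`; the `powersync%` filter assumes the slot naming shown earlier), you can run:

```sql
SELECT slot_name, wal_status, safe_wal_size
FROM pg_replication_slots
WHERE slot_name LIKE 'powersync%';
```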

When this occurs, you will see an error such as:

> Replication slot powersync\_1\_xxxx was invalidated (reason: wal\_removed). Increase max\_slot\_wal\_keep\_size on the source database and delete the existing slot to recover.

To recover:

1. Increase `max_slot_wal_keep_size` on the source Postgres database to prevent a recurrence (see the example at the end of this section). See [Managing and Monitoring Replication Lag](/maintenance-ops/production-readiness-guide#managing-and-monitoring-replication-lag) for sizing guidance.

2. Drop the invalidated slot:

```sql
SELECT pg_drop_replication_slot('powersync_1_xxxx');
```

Replace `powersync_1_xxxx` with the actual slot name from the error message.

3. Restart the PowerSync Service. It will create a new replication slot and begin replication from scratch.

<Note>If the slot was invalidated during the initial snapshot (before it completed), the PowerSync Service will not automatically retry. You must drop the invalidated slot manually before the service can recover.</Note>

If the invalidation reason is `idle_timeout` (Postgres 18+), the slot was invalidated due to inactivity. In this case, increase `idle_replication_slot_timeout` on the source database instead.
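
As a minimal sketch of step 1 above (assuming you can run `ALTER SYSTEM` directly; managed providers often expose this setting through their own configuration UI, and the value here is purely illustrative):

```sql
-- Illustrative value; size it according to the linked sizing guidance
ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';
SELECT pg_reload_conf();
```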

### Maximum Replication Slots

Postgres is configured with a maximum number of replication slots per server. Since each PowerSync instance uses one replication slot for replication and an additional one while deploying a new Sync Streams/Rules version, the maximum number of PowerSync instances connected to one Postgres server is equal to the maximum number of replication slots, minus 1\.
Postgres is configured with a maximum number of replication slots per server. Each PowerSync instance uses one replication slot for replication and an additional one while deploying a new Sync Streams or Sync Rules version. The maximum number of PowerSync instances you can connect to one Postgres server is equal to the maximum number of replication slots, minus one.

If other clients are also using replication slots, this number is reduced further.
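
To compare the number of slots currently in use with the configured maximum, one option is:

```sql
SELECT count(*) AS slots_in_use,
       current_setting('max_replication_slots') AS max_slots
FROM pg_replication_slots;
```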

The maximum number of slots can be configured by setting `max_replication_slots` (not all hosting providers expose this), and checked using:
To configure the maximum number of slots, set `max_replication_slots` (though not all hosting providers expose this setting). Check the current value using:

```sql
select current_setting('max_replication_slots')
27 changes: 27 additions & 0 deletions debugging/error-codes.mdx
@@ -6,7 +6,7 @@

This reference documents PowerSync error codes organized by component, with troubleshooting suggestions for developers. Use the search bar to look up specific error codes (e.g., `PSYNC_R0001`).

# PSYNC_Rxxxx: Sync Rules issues

- **PSYNC_R0001**:
Catch-all [Sync Rules](/sync/rules/overview) parsing error, if no more specific error is available
@@ -23,7 +23,7 @@

## PSYNC_R24xx: SQL security warnings

# PSYNC_Sxxxx: Service issues

- **PSYNC_S0001**:
Internal assertion.
@@ -62,6 +62,11 @@

This may occur if there is very deep nesting in JSON or embedded documents.

- **PSYNC_S1005**:
Storage version not supported.

This could be caused by a downgrade to a version that does not support the current storage version.

## PSYNC_S11xx: Postgres replication issues

- **PSYNC_S1101**:
@@ -143,6 +148,17 @@
An alternative is to create explicit policies for the replication role. If you have done that,
you may ignore this warning.

- **PSYNC_S1146**:
Replication slot invalidated.

The replication slot was invalidated by PostgreSQL, typically because WAL retention exceeded `max_slot_wal_keep_size` during a long-running snapshot. Increase `max_slot_wal_keep_size` on the source database and delete the existing replication slot to recover. PowerSync will create a new slot and restart replication automatically.

Other causes: `rows_removed` (catalog rows needed by the slot were removed), `wal_level_insufficient`, or `idle_timeout`.

`idle_timeout` is a PostgreSQL 18+ slot invalidation reason; in this case, increase `idle_replication_slot_timeout` instead of `max_slot_wal_keep_size`.

See [Managing and Monitoring Replication Lag](/maintenance-ops/production-readiness-guide#managing-and-monitoring-replication-lag) for guidance on sizing `max_slot_wal_keep_size`.
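
For example, dropping the invalidated slot (the slot name is a placeholder; use the one from the error message) might look like:

```sql
SELECT pg_drop_replication_slot('powersync_1_xxxx');
```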

## PSYNC_S12xx: MySQL replication issues

## PSYNC_S13xx: MongoDB replication issues
@@ -235,6 +251,17 @@
Possible causes:
- Older data has been cleaned up due to exceeding the retention period.

## PSYNC_S16xx: MSSQL replication issues

- **PSYNC_S1601**:
A replicated source table's capture instance has been dropped during a polling cycle.

Possible causes:
- CDC has been disabled for the table.
- The table has been dropped, which also drops the capture instance.

Replication for the table will only resume once CDC has been re-enabled for the table.

## PSYNC_S2xxx: Service API

- **PSYNC_S2001**:
20 changes: 11 additions & 9 deletions maintenance-ops/production-readiness-guide.mdx
@@ -266,7 +266,7 @@ Because PowerSync relies on Postgres logical replication, it's important to cons

The WAL growth rate is expected to increase substantially during the initial replication of large datasets with high update frequency, particularly for tables included in the PowerSync publication.

During normal operation (after Sync Streams (or legacy Sync Rules) are deployed) the WAL growth rate is much smaller than the initial replication period, since the PowerSync Service can replicate ~5k operations per second, meaning the WAL lag is typically in the MB range as opposed to the GB range.
During normal operation (after Sync Streams/Sync Rules are deployed) the WAL growth rate is much lower than during the initial replication period, since the PowerSync Service can replicate ~5k operations per second; WAL lag is typically in the MB range rather than the GB range.
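
For example, at that rate a burst of one million row changes would take roughly 1,000,000 ÷ 5,000 ≈ 200 seconds to drain, assuming the service sustains its published throughput.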

When deciding what to set the `max_slot_wal_keep_size` configuration parameter to, take the following into account:
1. Database size - This impacts the time it takes to complete the initial replication from the source Postgres database.
@@ -275,7 +275,7 @@ When deciding what to set the `max_slot_wal_keep_size` configuration parameter t

To view the current replication slots that are being used by PowerSync, you can run the following query:

```
```sql
SELECT slot_name,
plugin,
slot_type,
@@ -285,14 +285,14 @@ FROM pg_replication_slots;
```

To view the currently configured value of `max_slot_wal_keep_size`, you can run the following query:
```
SELECT setting as max_slot_wal_keep_size
FROM pg_settings
WHERE name = 'max_slot_wal_keep_size'

```sql
SHOW max_slot_wal_keep_size
```

It's recommended to check the current replication slot lag and `max_slot_wal_keep_size` when deploying Sync Streams/Sync Rules changes to your PowerSync Service instance, especially when you're working with large database volumes.
If you notice that the replication lag is greater than the current `max_slot_wal_keep_size` it's recommended to increase value of the `max_slot_wal_keep_size` on the connected source Postgres database to accommodate for the lag and to ensure the PowerSync Service can complete initial replication without further delays.
If the slot is invalidated mid-snapshot, PowerSync detects the problem and stops replication with error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues). On the source database, increase `max_slot_wal_keep_size` and delete the existing replication slot. PowerSync creates a new slot and restarts the snapshot.

During a snapshot, PowerSync warns when less than 50% of the WAL budget remains. You may see this warning in the PowerSync dashboard, in the [Diagnostics API](/maintenance-ops/self-hosting/diagnostics) if you self-host, and in PowerSync Service logs. Increase `max_slot_wal_keep_size` or reduce snapshot work before the slot is invalidated. Use the considerations above to set a high enough cap.
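
One way to watch both the slot lag and the remaining WAL budget in a single query (Postgres 13+; the `powersync%` filter assumes the default slot naming):

```sql
SELECT slot_name,
       wal_status,
       pg_size_pretty(safe_wal_size) AS remaining_wal_budget,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replication_lag
FROM pg_replication_slots
WHERE slot_name LIKE 'powersync%';
```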

### Managing Replication Slots

@@ -315,10 +315,12 @@ FROM pg_replication_slots where active = false;

An alternative to manually checking for inactive replication slots is to configure the `idle_replication_slot_timeout` parameter on the source Postgres database.

<Note>The `idle_replication_slot_timeout` [configuration parameter](https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-IDLE-REPLICATION-SLOT-TIMEOUT) is only available from PostgresSQL 18 and above.</Note>
<Note>The `idle_replication_slot_timeout` [configuration parameter](https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-IDLE-REPLICATION-SLOT-TIMEOUT) is only available from Postgres 18 onward.</Note>

The `idle_replication_slot_timeout` parameter invalidates replication slots that have remained inactive for longer than the configured timeout.

It's recommended to configure this parameter for source Postgres databases as this will prevent runaway WAL growth for replication slots that are no longer active or used by the PowerSync Service.
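
A hedged sketch of that configuration (Postgres 18+ only; this assumes your provider allows `ALTER SYSTEM`, and the timeout value is illustrative):

```sql
-- Invalidate slots that have been idle for longer than a day (illustrative value)
ALTER SYSTEM SET idle_replication_slot_timeout = '24h';
SELECT pg_reload_conf();
```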

For slot mechanics, inactive slot cleanup queries, and the recovery procedure for an invalidated slot, see [Postgres Maintenance](/configuration/source-db/postgres-maintenance).


2 changes: 1 addition & 1 deletion maintenance-ops/replication-lag.mdx
@@ -13,8 +13,8 @@

1. The source database writes the change to its replication stream. The exact mechanism differs per source:
- **Postgres**: logical replication via the Write-Ahead Log (WAL), read through a replication slot.
- **MongoDB**: change streams backed by the oplog.
- **MySQL**: the binary log (binlog), read using GTIDs.
- **SQL Server**: Change Data Capture (CDC) change tables, populated by a capture job that scans the transaction log.
2. The PowerSync Service reads the change from that stream and processes it into its internal bucket storage.
3. Connected clients receive the change on their next checkpoint.
@@ -61,7 +61,7 @@
When you first connect a source database, or when you deploy Sync Config changes that trigger reprocessing, the PowerSync Service replicates the full set of matching rows. During this period:

* Replication lag will be elevated until the initial snapshot completes.
* The source-side replication buffer (WAL on Postgres, oplog on MongoDB, binlog on MySQL, CDC change tables on SQL Server) grows because the service has not yet acknowledged those changes.

This is expected. Plan for it by sizing the relevant retention setting appropriately (see the source-specific sections below) and by coordinating large Sync Config changes during lower-traffic windows.

Expand All @@ -74,7 +74,7 @@

If lag correlates with specific workloads, profile those workloads on the source database before looking at the PowerSync Service.

#### Bursty Write Workloads Exceeding Replication Throughput

Replication lag is a function of how fast changes arrive vs. how fast PowerSync can consume them. If a workload produces changes faster than the service can replicate, lag will accumulate until the burst ends and then drain as the service catches up. The service's published throughput (see [Performance and Limits](/resources/performance-and-limits#performance-expectations)) is roughly:

Expand All @@ -86,7 +86,7 @@

* **Scheduled jobs**: cron jobs, nightly batches, or queue workers that flush on a timer. These tend to produce very sharp lag spikes at predictable times.
* **Bulk `UPDATE`s across indexed columns**: a single statement can generate millions of row-change events in the replication stream, even if the SQL itself runs quickly on the source.
* **Backfills and data migrations**: schema changes, column backfills, or re-keying jobs. On Postgres these can also rewrite large portions of a table, multiplying WAL volume.
* **Bulk imports** (`COPY`, `LOAD DATA`, `BULK INSERT`, `insertMany`): import throughput on the source is often far higher than replication throughput.

<Tip>
@@ -111,7 +111,7 @@
**Supabase defaults**: Supabase projects ship with `max_slot_wal_keep_size = 4GB` and a limit of 5 replication slots. The 4GB cap is easy to exceed during initial replication of a large dataset or a sustained write burst, after which the slot will be invalidated and PowerSync has to restart replication from scratch. Raise this value before connecting a large Supabase database to PowerSync.
</Warning>

See [Managing and Monitoring Replication Lag](/maintenance-ops/production-readiness-guide#managing-and-monitoring-replication-lag) for queries to check the current setting and the current slot lag, and for guidance on sizing it.
See [Managing and Monitoring Replication Lag](/maintenance-ops/production-readiness-guide#managing-and-monitoring-replication-lag) for queries to check the current setting and the current slot lag, and for guidance on sizing it. When a slot is invalidated, PowerSync surfaces error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues) — see [Recovering from an invalidated slot](/configuration/source-db/postgres-maintenance#recovering-from-an-invalidated-slot) for the recovery procedure.

#### `TRUNCATE` on Replicated Tables

@@ -152,7 +152,7 @@
### All Sources

* **Confirm the source database is healthy**: check CPU, IO, connection count, and long-running transactions on the source. A saturated source will cause replication lag that no amount of tuning on the PowerSync side can fix.
* **Pause or reduce large writes while the service catches up**: if lag is already elevated, holding off on scheduled jobs, bulk updates, migrations, and backfills is usually the fastest way to let it drain. If a large write is unavoidable, batch it into smaller transactions and pace them so the service has time to drain between batches, rather than running it as one large transaction.
* **Review Sync Config**: look for Sync Config changes that could be producing significantly more buckets or heavier parameter queries than before. Simplify where possible and deploy large changes during lower-traffic windows.
* **Check for source schema changes**: `ALTER TABLE` and similar changes on replicated tables can stall or invalidate replication until reconfigured. See [Implementing Schema Changes](/maintenance-ops/implementing-schema-changes) for the recommended flow.
* **Check instance logs for errors**: [Replicator logs](/maintenance-ops/monitoring-and-alerting#instance-logs) often contain the specific error (slot invalidation, change stream failure, binlog purge, CDC retention expiry, source connectivity) behind a lag incident.
54 changes: 42 additions & 12 deletions maintenance-ops/self-hosting/diagnostics.mdx
@@ -1,43 +1,73 @@
---
title: "Diagnostics"
description: "Use the PowerSync Diagnostics API to inspect replication status and sync health."
description: "How to use the PowerSync Service Diagnostics API to inspect replication status, errors, and slot health."
---

All self-hosted PowerSync Service instances ship with a Diagnostics API.
This API provides the following diagnostic information:

- Connections → Connected backend source database and any active errors associated with the connection.
- Active Sync Streams / Sync Rules → Currently deployed Sync Streams (or legacy Sync Rules) and its status.
All self-hosted PowerSync Service instances ship with a Diagnostics API for inspecting replication state, surfacing errors, and monitoring source database health.

## CLI

If you have the [PowerSync CLI](/tools/cli) installed, use `powersync status` to check instance status without calling the API directly. This works with any running PowerSync instance local or remote.
If you have the [PowerSync CLI](/tools/cli) installed, use `powersync status` to check instance status without calling the API directly. This works with any running PowerSync instance, whether local or remote.

```bash
powersync status

# Extract a specific field
powersync status --output=json | jq '.connections[0]'
powersync status --output=json | jq '.data.active_sync_rules'
```

## Diagnostics API

# Configuration
### Configuration

1. To enable the Diagnostics API, specify an API token in your PowerSync YAML file:
1. Specify an API token in your PowerSync YAML file:

```yaml service.yaml
api:
  tokens:
    - YOUR_API_TOKEN
```
<Warning>Make sure to use a secure API token as part of this configuration</Warning>

<Warning>Use a secure, randomly generated API token.</Warning>

2. Restart the PowerSync Service.

3. Once configured, send an HTTP request to your PowerSync Service Diagnostics API endpoint. Include the API token set in step 1 as a Bearer token in the Authorization header.
3. Send a POST request to the diagnostics endpoint, passing the token as a Bearer token:

```shell
curl -X POST http://localhost:8080/api/admin/v1/diagnostics \
-H "Authorization: Bearer YOUR_API_TOKEN"
```

### Response structure

The response `data` object contains:

**`connections`** — whether PowerSync can reach the configured source database and any connection-level errors.

**`active_sync_rules`** — the currently serving sync config (Sync Streams or Sync Rules). Contains a `connections[]` array with details about each replication connection including slot name, WAL status, and tables being replicated. Also includes an `errors[]` array for warnings or errors.

**`deploying_sync_rules`** — only present while a new sync config is being deployed and the initial replication is in progress. PowerSync runs this process in parallel so clients continue to be served by the existing active config. Once initial replication completes, this section disappears and `active_sync_rules` updates.

Each connection in `active_sync_rules.connections[]` includes:

| Field | Description |
| --- | --- |
| `slot_name` | The name of the Postgres replication slot used by this sync rules version. |
| `initial_replication_done` | Whether the initial snapshot has completed. |
| `replication_lag_bytes` | Replication lag in bytes. |
| `wal_status` | The WAL status of the replication slot (`reserved`, `extended`, `unreserved`, or `lost`). |
| `safe_wal_size` | Remaining WAL budget in bytes before the slot risks invalidation. |
| `max_slot_wal_keep_size` | The configured `max_slot_wal_keep_size` value on the source Postgres database. |
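
As an abridged, illustrative sketch of the response shape (values and the exact keys inside the top-level `connections[]` entries are examples only; real output contains additional fields):

```json
{
  "data": {
    "connections": [
      { "connected": true, "errors": [] }
    ],
    "active_sync_rules": {
      "connections": [
        {
          "slot_name": "powersync_1_c3c8cf21",
          "initial_replication_done": true,
          "replication_lag_bytes": 1048576,
          "wal_status": "reserved",
          "safe_wal_size": 3221225472,
          "max_slot_wal_keep_size": "4GB"
        }
      ],
      "errors": []
    }
  }
}
```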

### Errors and warnings

Warnings and errors appear in the `errors[]` array at the sync rules level (`active_sync_rules.errors[]` or `deploying_sync_rules.errors[]`). This includes:

- **Replication lag warnings**: raised if no replicated commit has been received in more than 5 minutes (warning level) or 15 minutes (fatal level).
- **WAL budget warnings**: raised when the remaining WAL budget drops below 50%.
- **Replication errors**: for example, `PSYNC_S1146` appears when a replication slot is invalidated (when `wal_status` is `lost`).
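
If you use the CLI, one quick way to pull just these entries (the `jq` path follows the structure described above):

```bash
powersync status --output=json | jq '.data.active_sync_rules.errors'
```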

<Tip>
For guidance on configuring `max_slot_wal_keep_size` and managing replication slots, see [Postgres maintenance](/configuration/source-db/postgres-maintenance).
</Tip>