RGS sync 404s after the first sync: persisted latest_rgs_snapshot_timestamp isn't aligned to the RGS server's snapshot cadence #201

@vincenzopalazzo


Summary

After a successful RGS sync, update_rgs_snapshot() (in ldk-node src/gossip.rs) persists latest_rgs_snapshot_timestamp from the value returned by update_network_graph(). That value is the snapshot's internal latest_seen_timestamp, which can fall anywhere in the day (e.g. mid-day UTC). The reference RGS server (https://rapidsync.lightningdevkit.org) only serves snapshots at 24h-aligned timestamps (00:00 UTC). On the next periodic sync, ldk-node requests /snapshot/v2/<mid-day-ts> and gets HTTP 404, so every subsequent sync fails until the persisted value is wiped.

Effect: a fresh node syncs successfully once, then the log message "Background sync of RGS gossip data failed: Failed to update gossip data" repeats at every sync interval and across all restarts.

Filing here in ldk-server because that's the layer I'm running, but the fix is upstream in ldk-node.

Reproduction

  1. Run ldk-server with rgs_server_url = "https://rapidsync.lightningdevkit.org/snapshot/v2/" (the contrib default).
  2. First sync succeeds.
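  3. get_node_info().latest_rgs_snapshot_timestamp is now e.g. 1778068800 (2026-05-06 12:00:00 UTC), which is not a 00:00 UTC boundary.
  4. The next periodic sync requests /snapshot/v2/1778068800 and gets HTTP 404. The failure then repeats on every subsequent sync and after every restart.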

Server behavior, confirmed manually:

for ts in 1778025600 1777939200 1777852800 1778068800; do
  code=$(curl -sS -o /dev/null -w "%{http_code}" "https://rapidsync.lightningdevkit.org/snapshot/v2/$ts")
  echo "ts=$ts ($(date -u -d @$ts '+%F %T')): $code"
done
ts=1778025600 (2026-05-06 00:00:00): 200
ts=1777939200 (2026-05-05 00:00:00): 200
ts=1777852800 (2026-05-04 00:00:00): 200
ts=1778068800 (2026-05-06 12:00:00): 404
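
The 200/404 pattern above matches a simple arithmetic check. The following is a minimal sketch, assuming the server's cadence is exactly 86400 seconds anchored at 00:00 UTC (inferred from the responses above, not from a documented contract). The constant and the helper name are hypothetical, and the u32 type is an assumption about how the timestamp is represented.

```rust
/// Assumed cadence of the reference RGS server: one snapshot per 24h at 00:00 UTC.
/// (Hypothetical constant, inferred from the curl results above.)
const ASSUMED_SNAPSHOT_INTERVAL_SECS: u32 = 24 * 60 * 60;

/// Hypothetical helper: returns true if `ts` (Unix seconds) falls exactly on an
/// assumed snapshot boundary, i.e. is a multiple of 86400.
fn is_snapshot_aligned(ts: u32) -> bool {
    ts % ASSUMED_SNAPSHOT_INTERVAL_SECS == 0
}
```

Under that assumption, the three timestamps that returned 200 are aligned and 1778068800 is not (it is 43200 seconds past the boundary).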

Root cause

In ldk-node src/gossip.rs update_rgs_snapshot():

200 => {
    let new_latest_sync_timestamp =
        gossip_sync.update_network_graph(response.as_bytes()).map_err(...)?;
    latest_sync_timestamp.store(new_latest_sync_timestamp, Ordering::Release);
    Ok(new_latest_sync_timestamp)
}

update_network_graph() reads latest_seen_timestamp from the snapshot binary header (rust-lightning lightning-rapid-gossip-sync/src/processing.rs:90) and returns it. That is the most recent gossip-message timestamp inside the snapshot, not the snapshot's URL/cadence timestamp, so the persisted value is generally not one the server will serve.

Workaround

Delete the node_metrics row from ldk_node_data.sqlite before each daemon start. This forces a fresh full sync from /snapshot/v2/0. Functional but ugly:

DELETE FROM ldk_node_data WHERE primary_namespace='' AND secondary_namespace='' AND key='node_metrics';

Suggested fix

Round new_latest_sync_timestamp down to the nearest 24h boundary before persisting it. This matches the reference RGS server's cadence and keeps deltas small. The interval could be a const or made configurable.
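
A minimal sketch of the rounding, not a tested patch against ldk-node. The constant RGS_SNAPSHOT_INTERVAL_SECS and the function align_to_snapshot_boundary are hypothetical names introduced here, and the 24h interval is an assumption based on the reference server's observed behavior.

```rust
/// Hypothetical constant: snapshot cadence assumed for the reference RGS server.
const RGS_SNAPSHOT_INTERVAL_SECS: u32 = 24 * 60 * 60;

/// Hypothetical helper: rounds a Unix timestamp down to the most recent
/// multiple of the assumed snapshot interval (00:00 UTC for a 24h interval).
fn align_to_snapshot_boundary(ts: u32) -> u32 {
    ts - (ts % RGS_SNAPSHOT_INTERVAL_SECS)
}
```

With the values from the reproduction, align_to_snapshot_boundary(1778068800) gives 1778025600, which the server answered with 200. In update_rgs_snapshot(), the aligned value would be the one stored and returned in place of the raw new_latest_sync_timestamp.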

Alternatively, persist the request URL's query_timestamp instead of the snapshot's internal one. The drawback is that deltas then always start from the previously requested boundary, which means a full re-fetch on every sync for a node that started from 0.

If the canonical contract is that any client-supplied timestamp must be honored, this is a server bug instead — but updating deployed clients is the more tractable fix.

Environment

  • ldk-server @ 50fe752
  • ldk-node 0.8.0+git @ 21eea8c
  • lightning-rapid-gossip-sync 0.3.0+git @ 38a62c3
  • Network: bitcoin (mainnet)
  • Bitcoin backend: [esplora] server_url = "https://mempool.space/api"
  • Linux x86_64

Closest existing ldk-node issue: #615 (different — shutdown race). This one is steady-state.
