What about RAFT and SYNCHRO requests that are written to snapshots and WALs? AFAIK they contain a replica id. Is it safe to use snapshots/WALs with the same ids on all replicas?
I don't like the idea of overriding the instance UUID via config because it's confusing. While troubleshooting, one would wonder why different files of the same database have different ids. Also, it'll probably complicate the code. I'd prefer to have an external utility that would update the ids in the data files.
Changelog
- `instance_uuid` in data files header. Added sections Validity and Known issues. Described the backup/restore process in detail.
- `instance_uuid` from config

Task
Restore a replicaset from data (snapshot [+ xlogs]) of a single instance.
Currently it is not possible: each replica will read the instance uuid from the data, and the replicaset will not be restored. If `instance_uuid` is set in `box.cfg{}`, an error is raised when it differs from the instance uuid in the data file headers.
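For illustration, a minimal sketch of the failure mode described above (the uuid and path below are made up, and the exact error text depends on the Tarantool version):

```lua
-- Start an instance on top of another replica's snapshot/xlog files while
-- configuring a uuid that differs from the one recorded in their headers:
box.cfg{
    work_dir      = '/backup/restored-replica',             -- contains the backed-up .snap/.xlog files
    instance_uuid = 'bbbbbbbb-bbbb-4bbb-8bbb-bbbbbbbbbbbb', -- uuid of the replica being restored
}
-- Recovery stops with an "Instance UUID mismatch"-style error, because the
-- recovered files carry the uuid of the original replica.
```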
Solution

The idea is straightforward: let's just fix the instance uuid in the data file headers to the instance uuid of the replica being restored. So backup/restore of a replicaset involves the following steps.
Limitations
The approach works well in the case of synchronous replication. If we back up only a single replica under asynchronous or master-master replication, we may miss some data in the backup due to broken connectivity or a replication conflict. See #12040 for more details and a recipe to back up every replica in the replicaset.
Backup
Take the data files (snapshot [+ xlogs]) of one replica and save the instance uuid of that replica (`box.info.uuid`, for example). Backup is done.
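A minimal sketch of one way this could look from the console of the chosen replica, using Tarantool's `box.backup` API (whether the proposal relies on it is not stated here; the copy step and destination are left to the operator):

```lua
-- On the replica whose data is taken as the backup of the whole replicaset.
-- box.backup.start() returns the list of files that make up a consistent
-- backup and protects them from garbage collection until box.backup.stop().
local files = box.backup.start()
for _, path in ipairs(files) do
    print('copy this file to the backup location:', path)
end
box.backup.stop()

-- Remember which instance the files came from; the restore procedure
-- needs this uuid to fix the data file headers on the other replicas.
print('backup taken from instance', box.info.uuid)
```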
Restore
For every instance of the replicaset: copy the backup data files into its data directory, fix the instance uuid in the data file headers to the uuid of that instance, and start the instance. Restoring the replicaset from backup is done.
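No concrete tool for the header fix is specified here, so purely as a rough sketch of the idea (not a tested utility): the plain-text header of a `.snap`/`.xlog` file (see the example below) could be rewritten in place, assuming the old and new uuids have the same textual length so the binary part of the file is not shifted. Whether this alone is sufficient is exactly what the Validity and Known issues sections discuss.

```lua
-- Sketch: replace the Instance uuid in the plain-text header of a data file.
-- The text header is everything before the first blank line; the binary rows
-- that follow are left untouched.
local function patch_instance_uuid(path, old_uuid, new_uuid)
    assert(#old_uuid == #new_uuid, 'uuid length must not change')
    local f = assert(io.open(path, 'rb'))
    local data = f:read('*a')
    f:close()
    local header_end = assert(data:find('\n\n', 1, true), 'no text header found')
    local header = data:sub(1, header_end)
    local s, e = header:find(old_uuid, 1, true) -- plain (non-pattern) search
    assert(s, 'old uuid not found in the header of ' .. path)
    data = data:sub(1, s - 1) .. new_uuid .. data:sub(e + 1)
    local out = assert(io.open(path, 'wb'))
    out:write(data)
    out:close()
end

-- File name and uuids below are made up.
patch_instance_uuid('00000000000000000042.snap',
                    '36a2a75e-0b31-4232-b49e-37be3b708ba3',  -- uuid saved during backup
                    'bbbbbbbb-bbbb-4bbb-8bbb-bbbbbbbbbbbb')  -- uuid of the replica being restored
```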
Data file header example:
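The header of a snapshot/xlog file is a short plain-text preamble before the binary rows; it looks roughly like this (the values below are made up and the exact set of fields depends on the Tarantool version):

```
SNAP
0.13
Version: 2.11.1
Instance: 36a2a75e-0b31-4232-b49e-37be3b708ba3
VClock: {1: 42}
```

The `Instance` field is the uuid this document proposes to fix during restore.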
Validity
We write synchro and raft requests to the data files, so the question arises whether this may break restoring somehow, given that all replicas will have the same requests in their data files.
- `IPROTO_RAFT_CONFIRM`, `IPROTO_RAFT_ROLLBACK`, `IPROTO_RAFT_DEMOTE` and `IPROTO_RAFT_PROMOTE` requests are replicated anyway, so having them on all replicas is valid.
- `IPROTO_RAFT` is not replicated. It stores the current term and the current vote; there is nothing wrong with having the same term and the same vote on all replicas.
- `IPROTO_RAFT_PROMOTE` and `IPROTO_RAFT` are also stored in the snapshot file. Again, there is nothing wrong with all replicas having the same term and the same vote and agreeing on the limbo state.

Known issues
This way we back up the local spaces of only one replica and do not back up the local spaces of the other replicas at all. Besides, we restore the local spaces of that single replica on all the other replicas.

Let's consider Tarantool's own local spaces.
`_gc_consumers` space.

Non-anonymous replica issues.
Say we create a backup based on replica A. As `_gc_consumers` does not keep a record for the instance itself, when we restore replica B (not A) we do not have a record for A. So if we start B first, create a snapshot and then start A, then A will be unable to start without rebootstrap, as B no longer has the xlogs for A.

In the same situation, replica B will have a garbage record in `_gc_consumers` for itself, and it will stay there forever.

Anonymous replica issues.
Here we have an issue similar to the non-anonymous replica case. Anonymous replica X may be connected to replica B, so after the restore we have no records for X in B, and the replica may need to be rebootstrapped.

`_vinyl_deferred_delete` space.

The deferred deletes are consistent with the vinyl state in the backup, so I see no issues here.
Alternatives
Take instance uuid from config
Basic case (non-anonymous replicas)
If `instance_uuid` is set in `box.cfg{}`, then it is used as the instance uuid. In this case `instance_uuid` should be present in the `_cluster` space describing the replicaset. One can set `instance_name` instead; in that case the name should be present in `_cluster` respectively. If both are specified, they should point to the same tuple in `_cluster`.

If on recovery the instance uuid from the config and the instance uuid in the header of the last data file differ, then we rotate the WAL after recovery, so that the new instance uuid gets written into the WAL header. This way we don't need to set the instance uuid in the config next time. This is not required for a cluster on Tarantool 3.0 though, as we always set the instance name in the config.
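A minimal sketch of this alternative from the Lua console (the uuid below is made up; allowing it to differ from the uuid in the data file headers is exactly what this alternative proposes):

```lua
-- Alternative approach (sketch): tell the restored replica who it is via the
-- configuration instead of patching the data file headers.
box.cfg{
    instance_uuid = 'bbbbbbbb-bbbb-4bbb-8bbb-bbbbbbbbbbbb', -- uuid of replica B as registered in _cluster
    -- ... other options ...
}

-- The configured uuid is expected to match a tuple in the _cluster
-- system space, which lists all registered members of the replicaset:
box.space._cluster.index.uuid:get{box.cfg.instance_uuid}
```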
With these changes applied, after restoring from backup the data files of an instance can have a different instance uuid in their headers. Thus recovering from data files whose headers carry a different instance uuid should be a valid case.
It makes sense to add a sanity check during recovery that the instance uuid of the next xlog is present in the current state of the `_cluster` space.

Anonymous replica
We cannot check the instance uuid/instance name using the `_cluster` space, as such a replica does not have a record there. At least we can check the replicaset name from the config against the `_schema` space if they are set, which is true for Tarantool 3.

Vinyl
A vinyl backup includes `*.vylog`, `*.run` and `*.index` files. They all have the XLOG structure and header, with the instance uuid in particular. Let's ignore the instance uuid in them on recovery, as they can carry different uuids due to the backup and restore history.

Shortcomings
If the instance uuid is invalid (does not belong to the replicaset), we raise an error only after full recovery.