What about RAFT and SYNCHRO requests that are written to snapshots and WALs? AFAIK they contain a replica id. Is it safe to use snapshots/WALs with the same ids on all replicas?
I don't like the idea of overriding the instance UUID via config because it's confusing. While troubleshooting, one would wonder why different files of the same database have different ids. Also, it'll probably complicate the code. I'd prefer to have an external utility that would update the ids in the data files.
Changelog
- `instance_uuid` in data files header. Added sections Validity and Known issues. Described the backup/restore process in detail.
- `instance_uuid` from config

Task
Restore a replicaset from data (snapshot [+ xlogs]) of a single instance.
Currently it is not possible: each replica will read the instance uuid from the data, and the replicaset will not be restored. If `instance_uuid` is set in `box.cfg{}`, an error is raised when it differs from the instance uuid in the data file headers.
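For illustration, a minimal sketch of the failure mode described above (the uuid and path below are made up, and the exact error text depends on the Tarantool version):

```lua
-- Start an instance on top of another replica's snapshot/xlog files while
-- configuring a uuid that differs from the one recorded in their headers:
box.cfg{
    work_dir      = '/backup/restored-replica',             -- contains the backed-up .snap/.xlog files
    instance_uuid = 'bbbbbbbb-bbbb-4bbb-8bbb-bbbbbbbbbbbb', -- uuid of the replica being restored
}
-- Recovery stops with an "Instance UUID mismatch"-style error, because the
-- recovered files carry the uuid of the original replica.
```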
Solution

The idea is straightforward: let's just fix the instance uuid in the data file headers to the instance uuid of the replica being restored. So backup/restore of a replicaset involves the following steps.
Limitations
The approach works well in the case of synchronous replication. If we back up only a single replica under asynchronous or master-master replication, we may miss some data in the backup due to broken connectivity or a replication conflict. See #12040 for more details and a recipe to back up every replica in the replicaset.
Backup
Take the data files (snapshot [+ xlogs]) of one replica and save the instance uuid of that replica (`box.info.uuid`, for example). Backup is done.
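A minimal sketch of one way this could look from the console of the chosen replica, using Tarantool's `box.backup` API (whether the proposal relies on it is not stated here; the copy step and destination are left to the operator):

```lua
-- On the replica whose data is taken as the backup of the whole replicaset.
-- box.backup.start() returns the list of files that make up a consistent
-- backup and protects them from garbage collection until box.backup.stop().
local files = box.backup.start()
for _, path in ipairs(files) do
    print('copy this file to the backup location:', path)
end
box.backup.stop()

-- Remember which instance the files came from; the restore procedure
-- needs this uuid to fix the data file headers on the other replicas.
print('backup taken from instance', box.info.uuid)
```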
Restore
For every instance of the replicaset: copy the backup data files into its data directory, fix the instance uuid in the data file headers to the uuid of that instance, and start the instance. Restoring the replicaset from backup is done.
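No concrete tool for the header fix is specified here, so purely as a rough sketch of the idea (not a tested utility): the plain-text header of a `.snap`/`.xlog` file (see the example below) could be rewritten in place, assuming the old and new uuids have the same textual length so the binary part of the file is not shifted. Whether this alone is sufficient is exactly what the Validity and Known issues sections discuss.

```lua
-- Sketch: replace the Instance uuid in the plain-text header of a data file.
-- The text header is everything before the first blank line; the binary rows
-- that follow are left untouched.
local function patch_instance_uuid(path, old_uuid, new_uuid)
    assert(#old_uuid == #new_uuid, 'uuid length must not change')
    local f = assert(io.open(path, 'rb'))
    local data = f:read('*a')
    f:close()
    local header_end = assert(data:find('\n\n', 1, true), 'no text header found')
    local header = data:sub(1, header_end)
    local s, e = header:find(old_uuid, 1, true) -- plain (non-pattern) search
    assert(s, 'old uuid not found in the header of ' .. path)
    data = data:sub(1, s - 1) .. new_uuid .. data:sub(e + 1)
    local out = assert(io.open(path, 'wb'))
    out:write(data)
    out:close()
end

-- File name and uuids below are made up.
patch_instance_uuid('00000000000000000042.snap',
                    '36a2a75e-0b31-4232-b49e-37be3b708ba3',  -- uuid saved during backup
                    'bbbbbbbb-bbbb-4bbb-8bbb-bbbbbbbbbbbb')  -- uuid of the replica being restored
```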
Data file header example:
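The header of a snapshot/xlog file is a short plain-text preamble before the binary rows; it looks roughly like this (the values below are made up and the exact set of fields depends on the Tarantool version):

```
SNAP
0.13
Version: 2.11.1
Instance: 36a2a75e-0b31-4232-b49e-37be3b708ba3
VClock: {1: 42}
```

The `Instance` field is the uuid this document proposes to fix during restore.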
Validity
We write synchro and raft requests to the data files, so the question arises whether this may break restoring somehow, given that all replicas will have the same requests in their data files.
- `IPROTO_RAFT_CONFIRM`, `IPROTO_RAFT_ROLLBACK`, `IPROTO_RAFT_DEMOTE` and `IPROTO_RAFT_PROMOTE` requests are replicated anyway, so having them on all replicas is valid.
- `IPROTO_RAFT` is not replicated. It stores the current term and the current vote; there is nothing wrong with having the same term and the same vote on all replicas.
- `IPROTO_RAFT_PROMOTE` and `IPROTO_RAFT` are also stored in the snapshot file. Again, there is nothing wrong with all replicas having the same term and the same vote and agreeing on the limbo state.

Known issues
This way we back up the local spaces of only one replica and do not back up the local spaces of the other replicas at all. Besides, we restore the local spaces of that single replica on all the other replicas.

Let's consider Tarantool's own local spaces.
`_gc_consumers` space.

Non-anonymous replica issues.
Say we create a backup based on replica A. As `_gc_consumers` does not keep a record for the instance itself, when we restore replica B (not A) we do not have a record for A. So if we start B first, create a snapshot and then start A, then A will be unable to start without rebootstrap, as B no longer has the xlogs for A.

In the same situation, replica B will have a garbage record in `_gc_consumers` for itself, and it will stay there forever.

Anonymous replica issues.
Here we have an issue similar to the non-anonymous replica case. Anonymous replica X may be connected to replica B, so after the restore we have no records for X in B, and the replica may need to be rebootstrapped.

`_vinyl_deferred_delete` space.

The deferred deletes are consistent with the vinyl state in the backup, so I see no issues here.
Alternatives
Take instance uuid from config
Basic case (non-anonymous replicas)
If `instance_uuid` is set in `box.cfg{}`, then it is used as the instance uuid. In this case `instance_uuid` should be present in the `_cluster` space describing the replicaset. One can set `instance_name` instead; in that case the name should be present in `_cluster` respectively. If both are specified, they should point to the same tuple in `_cluster`.

If on recovery the instance uuid from the config and the instance uuid in the header of the last data file differ, then we rotate the WAL after recovery, so that the new instance uuid gets written into the WAL header. This way we don't need to set the instance uuid in the config next time. This is not required for a cluster on Tarantool 3.0 though, as we always set the instance name in the config.
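A minimal sketch of this alternative from the Lua console (the uuid below is made up; allowing it to differ from the uuid in the data file headers is exactly what this alternative proposes):

```lua
-- Alternative approach (sketch): tell the restored replica who it is via the
-- configuration instead of patching the data file headers.
box.cfg{
    instance_uuid = 'bbbbbbbb-bbbb-4bbb-8bbb-bbbbbbbbbbbb', -- uuid of replica B as registered in _cluster
    -- ... other options ...
}

-- The configured uuid is expected to match a tuple in the _cluster
-- system space, which lists all registered members of the replicaset:
box.space._cluster.index.uuid:get{box.cfg.instance_uuid}
```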
With these changes applied, after restoring from backup the data files of an instance can have a different instance uuid in their headers. Thus recovering from data files whose headers carry a different instance uuid should be a valid case.
It makes sense to add a sanity check during recovery that the instance uuid of the next xlog is present in the current state of the `_cluster` space.

Anonymous replica
We cannot check the instance uuid/instance name using the `_cluster` space, as such a replica does not have a record there. At least we can check the replicaset name from the config against the `_schema` space if they are set, which is true for Tarantool 3.

Vinyl
A vinyl backup includes `*.vylog`, `*.run` and `*.index` files. They all have the XLOG structure and header, with the instance uuid in particular. Let's ignore the instance uuid in them on recovery, as they can carry different uuids due to the backup and restore history.

Shortcomings
If the instance uuid is invalid (does not belong to the replicaset), we raise an error only after full recovery.