Description
A race condition exists in dbsync that can block the synchronization process, especially with large projects that take a long time to download. When dbsync initiates a pull operation, and another client pushes a new version to the Mergin Maps server before the pull is complete, dbsync ends up with an outdated local version of the project.
This causes the subsequent push operation to fail, because push performs a strict check that the local version matches the server version and raises the error "There are pending changes on server - need to pull them first." dbsync then gets stuck in a loop: it has to pull again, but each pull is slow and exposed to the same race condition. Breaking out of the loop requires manual intervention such as --force-init, which can lead to data loss.
Why --force-init is not a solution
Using --force-init is a heavy-handed approach that wipes the local state and re-initializes the synchronization from scratch. This is not a viable solution in a production environment for several reasons:
- Data Loss: If there are changes in the PostgreSQL database that have not been pushed to the Mergin Maps server, a --force-init will wipe the base and modified schemas and re-create them from the GeoPackage file. This will cause any changes made in the database to be lost.
- Manual Intervention: The need for manual intervention defeats the purpose of an automated synchronization daemon.
- Downtime: The re-initialization process can be time-consuming for large projects, leading to extended downtime for the synchronization service.
The problematic version check is located in the push function in dbsync.py:
    # dbsync.py in push()
    # ...
    # check there are no pending changes on server
    if server_version != local_version:
        raise DbSyncError("There are pending changes on server - need to pull them first.")

Real-world Scenario
- T0: dbsync starts a pull operation for a large project with many photos. The server is at version v100. The download is expected to take over a minute.
- T0 + 30s: A surveyor in the field finishes their work and syncs their mobile client. This creates version v101 on the Mergin Maps server.
- T0 + 90s: dbsync completes its download of v100 and applies the changes to the PostgreSQL database. The local project version for dbsync is now v100.
- T0 + 95s: The dbsync daemon proceeds to the push step to sync changes from the database back to Mergin Maps.
- Failure: The push operation detects that the server is at v101 while the local version is v100. It aborts the push, and dbsync is effectively blocked.
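The timeline above can be replayed with a small toy model. This is only an illustrative sketch: `slow_pull`, `surveyor_sync`, `check_before_push` and the dictionary-based server are hypothetical stand-ins, not dbsync or Mergin Maps client code, and versions are modelled as plain integers.

```python
class DbSyncError(Exception):
    """Stand-in for the error type raised by dbsync."""


def slow_pull(server, on_midway):
    """Simulate a slow pull whose target version is fixed when the download starts."""
    target = server["version"]  # T0: the pull targets the current server version
    on_midway()                 # T0 + 30s: something happens during the download
    return target               # T0 + 90s: the local project is at the target version


def check_before_push(server_version, local_version):
    """Same strict comparison as the check quoted from push()."""
    if server_version != local_version:
        raise DbSyncError("There are pending changes on server - need to pull them first.")


server = {"version": 100}


def surveyor_sync():
    # The mobile client pushes while the pull is still downloading.
    server["version"] += 1


local_version = slow_pull(server, surveyor_sync)

try:
    check_before_push(server["version"], local_version)  # T0 + 95s
    outcome = "pushed"
except DbSyncError as err:
    outcome = f"blocked: {err}"

print(local_version, server["version"], outcome)
```

In this model the local version ends at 100 while the server is at 101, so the check raises and the push never happens.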
Proposed Solution
To resolve this, the push function should be made more resilient. Instead of immediately failing upon a version mismatch, it should attempt to resolve the situation automatically by pulling the latest changes.
The proposed solution is to modify the push function in dbsync.py. When a version mismatch is detected, dbsync should:
- Automatically trigger the pull function. The existing pull function is capable of handling a rebase of local database changes on top of the incoming server changes.
- After the pull is complete, re-check the version.
- If the versions now match, proceed with the push operation.
- If the versions still do not match after the automatic pull, then raise an error, as this would indicate a more serious problem that requires manual intervention.
This "pull-and-retry" mechanism would make the synchronization process more robust for projects with long download times and active collaboration, avoiding the need for manual resets.
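The steps of the proposed mechanism could look roughly like the sketch below. It is not a patch against dbsync.py: `FakeServer`, `FakeSync`, `upload` and `push_with_retry` are hypothetical names used only for illustration, versions are modelled as integers, and the real pull and push do far more work than these stand-ins.

```python
class DbSyncError(Exception):
    """Stand-in for the error type raised by dbsync."""


class FakeServer:
    """Hypothetical in-memory stand-in for the Mergin Maps server."""

    def __init__(self, version):
        self.version = version


class FakeSync:
    """Hypothetical stand-in for dbsync's local state and its pull/push steps."""

    def __init__(self, server, local_version):
        self.server = server
        self.local_version = local_version
        self.pull_count = 0

    def pull(self):
        # The real pull downloads changes and rebases database edits on top of them.
        self.pull_count += 1
        self.local_version = self.server.version

    def upload(self):
        # The real push uploads database changes, which creates a new server version.
        self.server.version += 1
        self.local_version = self.server.version


def push_with_retry(sync):
    """Push, pulling once first if the server has moved ahead of the local project."""
    if sync.server.version != sync.local_version:
        sync.pull()  # automatic pull instead of failing immediately
        if sync.server.version != sync.local_version:  # re-check after the pull
            raise DbSyncError("There are pending changes on server - need to pull them first.")
    sync.upload()  # versions match, proceed with the push


if __name__ == "__main__":
    server = FakeServer(version=101)
    sync = FakeSync(server, local_version=100)
    push_with_retry(sync)
    print(server.version, sync.local_version, sync.pull_count)
```

The sketch retries exactly once, matching the steps listed above. If another client pushes during the automatic pull, the second check still fails and the error is raised as before.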