
Race condition during sync of large projects can block dbsync #157

@dracic

Description

A race condition in dbsync can block the synchronization process, especially for large projects that take a long time to download. If dbsync starts a pull operation and another client pushes a new version to the Mergin Maps server before that pull completes, dbsync ends up with an outdated local version of the project.

The subsequent push then fails because of a strict version check that requires the local version to match the server version: the push function raises "There are pending changes on server - need to pull them first." This creates a loop in which dbsync keeps trying to pull, but each pull is slow and susceptible to the same race condition, so the only way out is manual intervention such as --force-init, which can lead to data loss.

Why --force-init is not a solution

Using --force-init is a heavy-handed approach that wipes the local state and re-initializes the synchronization from scratch. This is not a viable solution in a production environment for several reasons:

  • Data Loss: If there are changes in the PostgreSQL database that have not been pushed to the Mergin Maps server, a --force-init will wipe the base and modified schemas and re-create them from the GeoPackage file. This will cause any changes made in the database to be lost.
  • Manual Intervention: The need for manual intervention defeats the purpose of an automated synchronization daemon.
  • Downtime: The re-initialization process can be time-consuming for large projects, leading to extended downtime for the synchronization service.

The problematic version check is located in the push function in dbsync.py:

# dbsync.py in push()
# ...
# check there are no pending changes on server
if server_version != local_version:
    raise DbSyncError("There are pending changes on server - need to pull them first.")

Real-world Scenario

  1. T0: dbsync starts a pull operation for a large project with many photos. The server is at version v100. The download is expected to take over a minute.
  2. T0 + 30s: A surveyor in the field finishes their work and syncs their mobile client. This creates version v101 on the Mergin Maps server.
  3. T0 + 90s: dbsync completes its download of v100 and applies the changes to the PostgreSQL database. The local project version for dbsync is now v100.
  4. T0 + 95s: The dbsync daemon proceeds to the push step to sync changes from the database back to Mergin Maps.
  5. Failure: The push operation detects that the server is at v101 while the local version is v100. It aborts the push, and dbsync is effectively blocked (a toy reproduction of this sequence is sketched below).
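
For illustration, the timeline above can be condensed into a toy model of the two version counters. The variables below are invented for this sketch and do not correspond to actual dbsync or Mergin Maps client APIs:

# Toy reproduction of the race (illustrative only, not dbsync code)
server_version = 100            # T0: server is at v100, dbsync starts a slow pull
pulled_version = server_version

server_version += 1             # T0 + 30s: a mobile client pushes v101 mid-download

local_version = pulled_version  # T0 + 90s: the slow pull of v100 finally completes

# T0 + 95s: push() runs the strict check shown above and gives up
if server_version != local_version:
    print(f"push aborted: server at v{server_version}, local at v{local_version}")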

Proposed Solution

To resolve this, the push function should be made more resilient. Instead of immediately failing upon a version mismatch, it should attempt to resolve the situation automatically by pulling the latest changes.

The proposed solution is to modify the push function in dbsync.py. When a version mismatch is detected, dbsync should:

  1. Automatically trigger the pull function; the existing pull function can already rebase local database changes on top of the incoming server changes (a sketch of this flow follows below).
  2. After the pull is complete, re-check the version.
  3. If the versions now match, proceed with the push operation.
  4. If the versions still do not match after the automatic pull, then raise an error, as this would indicate a more serious problem that requires manual intervention.

This "pull-and-retry" mechanism would make the synchronization process more robust for projects with long download times and active collaboration, avoiding the need for manual resets.

Metadata

    Labels: bug (Something isn't working)
