
Drop unused unique index on task_processor_task.uuid (~7 GB in prod) #222

@gagantrivedi


Summary

The task_processor_task and task_processor_recurringtask tables each carry an auto-generated unique B-tree index on their uuid column (task_processor_task_uuid_key, task_processor_recurringtask_uuid_key). Neither index is read by any query in our codebases.

In production, task_processor_task_uuid_key alone is ~7 GB.

Origin

The uuid field was introduced in the very first migration of the task processor — commit c5110873a ("Async processor (#1334)", 2022-08-03), defined as:

uuid = models.UUIDField(unique=True, default=uuid.uuid4)

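With unique=True, Django declares a UNIQUE constraint on the column, and Postgres enforces that constraint with a backing unique B-tree index (hence the default <table>_uuid_key names). The field has been carried through every relocation since (extraction to flagsmith-task-processor, then the port to flagsmith-common) without ever being queried.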

Why this matters

task_processor_task is a high-churn table: one insert per enqueued task, updates on lock/run, and a delete on cleanup. A unique index on a randomly generated UUID is one of the more expensive index shapes to maintain. Every insert pays for a uniqueness check plus a B-tree write at a random leaf page (random keys scatter writes across the whole index instead of appending to the rightmost page), every non-HOT update adds a fresh entry, every deleted row leaves a dead index entry for VACUUM to clean up, and the index never returns the favour with a read. At ~7 GB it also takes a non-trivial share of buffer cache, backup volume, and WAL/replication traffic.
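To make the "random position" point concrete, here is a toy model: a sorted Python list stands in for the left-to-right leaf order of a B-tree, and the helper counts how often a new key lands at the right-hand end (the cheap, append-like case) for sequential ids versus random UUIDv4 values. It ignores page size, splits and fillfactor entirely, and the helper names are hypothetical, so treat it as an illustration of insert locality rather than a benchmark.

```python
import bisect
import random
import uuid
from typing import Iterable, List


def count_tail_inserts(keys: Iterable) -> int:
    """Insert keys one by one into a sorted list; count how many land at the end.

    The sorted list is a stand-in for a B-tree's leaf order: a tail insert only
    touches the rightmost (hot) leaf, while a middle insert touches an arbitrary,
    probably cold, leaf page.
    """
    index: List = []
    tail_inserts = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos == len(index):
            tail_inserts += 1
        index.insert(pos, key)
    return tail_inserts


def random_uuids(n: int, seed: int = 0) -> List[uuid.UUID]:
    # Seeded for reproducibility; uuid.uuid4() itself cannot be seeded.
    rng = random.Random(seed)
    return [uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(n)]


if __name__ == "__main__":
    n = 10_000
    sequential = count_tail_inserts(range(1, n + 1))  # like an auto-increment id
    scattered = count_tail_inserts(random_uuids(n))   # like uuid.uuid4()
    print(f"sequential ids: {sequential}/{n} inserts at the tail")
    print(f"random UUIDs:   {scattered}/{n} inserts at the tail")
```

With sequential ids every insert is a tail insert; with random UUIDs only a handful are (on average about the harmonic number of n, i.e. roughly ln n), so nearly every insert lands somewhere in the middle of the index.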

Primary key is unaffected

Both tables keep Django's default auto-increment id PK. Every existing query already uses it:

  • Task.objects.filter(pk__in=…) (tasks.py:51)
  • TaskRun / RecurringTaskRun FKs target task_id
  • The get_tasks_to_process() SQL function selects by id ordering

Dropping the uuid field (or just its unique=True) leaves all of that intact.

Proposed change

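Drop unique=True from AbstractBaseTask.uuid, or remove the field outright pending a check on external consumers (e.g. log/metric pipelines that may emit task.uuid, plus RecurringTaskAdmin.list_display, which renders it). Either change is a one-migration cleanup. Because the index backs a UNIQUE constraint, Postgres rejects a plain DROP INDEX on it, CONCURRENTLY or not; it has to go via ALTER TABLE task_processor_task DROP CONSTRAINT task_processor_task_uuid_key (and likewise for task_processor_recurringtask), which is what Django emits when unique=True is removed. That statement needs only a brief ACCESS EXCLUSIVE lock, so in prod run it with a short lock_timeout to avoid queueing behind long-running transactions; dropping the constraint also drops its index and reclaims the ~7 GB.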
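A sketch of the statements for running the drop by hand in prod, assuming both constraints still carry the default Postgres names listed in the summary. The helper name and the 5s lock_timeout are illustrative choices, not taken from the codebase, and this covers only the database side: the model change and its migration are still needed so Django's state matches.

```python
from typing import List, Sequence, Tuple

# (table, constraint) pairs from the issue; assumes the default
# "<table>_<column>_key" names were never overridden.
UUID_CONSTRAINTS: Sequence[Tuple[str, str]] = (
    ("task_processor_task", "task_processor_task_uuid_key"),
    ("task_processor_recurringtask", "task_processor_recurringtask_uuid_key"),
)


def drop_uuid_constraint_sql(
    constraints: Sequence[Tuple[str, str]] = UUID_CONSTRAINTS,
    lock_timeout: str = "5s",
) -> List[str]:
    """Build statements for a lock-aware drop of the uuid unique constraints.

    The unique index backs a UNIQUE constraint, so it cannot be removed with
    DROP INDEX; ALTER TABLE ... DROP CONSTRAINT removes the constraint and its
    index together. That takes a short ACCESS EXCLUSIVE lock, and lock_timeout
    makes the statement fail fast instead of waiting behind long transactions
    (a waiting lock request also blocks later queries on the same table).
    """
    statements = [f"SET lock_timeout = '{lock_timeout}';"]
    for table, constraint in constraints:
        statements.append(
            f"ALTER TABLE {table} DROP CONSTRAINT IF EXISTS {constraint};"
        )
    return statements


if __name__ == "__main__":
    print("\n".join(drop_uuid_constraint_sql()))
```

Run each ALTER TABLE in its own autocommit transaction so a timeout on one table does not roll back the other; if a statement times out, it can simply be retried.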

Verification done

  • No .filter(uuid=…) / .get(uuid=…) / task__uuid / raw-SQL reference anywhere.
  • The only uuid lookup in task_processor is on the unrelated HealthCheckModel, which has its own index.
  • RecurringTaskAdmin.list_display renders uuid but does not filter by it.
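A rough, re-runnable version of the search above, in case it needs repeating against other repos before removing the field. The patterns and function names are hypothetical approximations: they flag candidate lines for manual triage (the HealthCheckModel lookup would show up, for instance) and will miss dynamically built lookups such as filter(**{"uuid": value}). A list_display entry is not a lookup and is not flagged.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Heuristic patterns; each one is checked against a single source line.
UUID_LOOKUP_PATTERNS = [
    re.compile(r"\.(?:filter|get|exclude)\([^)]*\buuid\s*="),  # .filter(uuid=...)
    re.compile(r"\b\w+__uuid\b"),                               # task__uuid joins
    re.compile(r"\bwhere\b.*\buuid\b", re.IGNORECASE),          # one-line raw SQL
]


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for lines that look like uuid lookups."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if any(pattern.search(line) for pattern in UUID_LOOKUP_PATTERNS)
    ]


def find_uuid_lookups(root: Path) -> List[Tuple[Path, int, str]]:
    """Scan every .py file under root and collect candidate uuid lookups."""
    hits: List[Tuple[Path, int, str]] = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits.extend((path, lineno, line) for lineno, line in scan_text(text))
    return hits


if __name__ == "__main__":
    for path, lineno, line in find_uuid_lookups(Path(".")):
        print(f"{path}:{lineno}: {line}")
```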

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
