
Umbrella issue: Free-threading readiness #251

@devdanzin


This issue tracks free-threading (FT) thread-safety findings from ft-review-toolkit, complementing the ongoing FT work in #234. The analysis combines static analysis (shared state, lock discipline, unsafe APIs, atomic candidates) with dynamic ThreadSanitizer stress testing (18 concurrent scenarios, 4 threads × 200 iterations).

Full report: bitarray_ft_report.md
Migration plan: bitarray_migration_plan.md
TSan report: bitarray_tsan_report.md
TSan stress script: tsan_stress_bitarray.py

Note: Our analysis was initially run against a stale clone that still had PyDict_GetItem in _bitarray.c:2847. Upstream 3.8.1 has already replaced this with PyDict_GetItemRef; thank you, Ilan, for catching this! All other findings below are confirmed against the current upstream code.

TSan results: 70 raw warnings, 41 unique races, 6 SIGABRTs across 18 stress scenarios. All races are in extension code (0 CPython-internal).


CRITICAL: Lazy-Init Singleton Races (3 locations)

Three static PyObject* variables are lazily initialized with a check-then-write pattern that races under free-threading. Two threads calling the function simultaneously can both see NULL, both import/lookup, and both write — one result leaks, or a thread reads a partially-written pointer.

1. info (BufferInfo class) — _bitarray.c:1021-1025

static PyObject *info = NULL;   /* BufferInfo object */
// ...
if (info == NULL)
    info = bitarray_module_attr("BufferInfo");  // RACE: two threads both see NULL

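Called from bitarray_buffer_info(). Fix: initialize eagerly in PyInit__bitarray(), or remove the cache. bitarray_module_attr() goes through PyImport_ImportModule(), which already returns the cached module from sys.modules, so dropping the cache costs only an attribute lookup: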

static PyObject *
bitarray_buffer_info(bitarrayobject *self)
{
    PyObject *info = bitarray_module_attr("BufferInfo");  // no static cache
    if (info == NULL)
        return NULL;
    // ... use info ...
    Py_DECREF(info);
    // ...
}
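Another option keeps the cache but publishes it safely. Each racing thread builds its own candidate, and a single compare-and-swap installs exactly one of them. The losers then discard theirs. The sketch below models this in plain C11. It makes some assumptions:

- `malloc`/`free` stand in for object creation and for the `Py_DECREF` a real extension would use on the losing candidate.
- `fake_obj`, `lookup_attr()`, `info_cache` and `get_info()` are hypothetical names, not bitarray code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Python object. */
typedef struct { char name[32]; } fake_obj;

/* Hypothetical stand-in for bitarray_module_attr(): returns a new object. */
static fake_obj *lookup_attr(const char *name)
{
    fake_obj *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    strncpy(o->name, name, sizeof o->name - 1);
    o->name[sizeof o->name - 1] = '\0';
    return o;
}

/* The cached singleton; only ever accessed through atomic operations. */
static _Atomic(fake_obj *) info_cache = NULL;

static fake_obj *get_info(void)
{
    /* Fast path: the acquire load pairs with the successful CAS below. */
    fake_obj *cur = atomic_load_explicit(&info_cache, memory_order_acquire);
    if (cur != NULL)
        return cur;

    /* Slow path: several threads may get here; each builds a candidate. */
    fake_obj *fresh = lookup_attr("BufferInfo");
    if (fresh == NULL)
        return NULL;

    fake_obj *expected = NULL;
    if (atomic_compare_exchange_strong_explicit(&info_cache, &expected, fresh,
                                                memory_order_acq_rel,
                                                memory_order_acquire))
        return fresh;       /* we won: our candidate is now the cache */

    free(fresh);            /* we lost: discard ours (Py_DECREF in CPython) */
    return expected;        /* a failed CAS stores the winner in expected */
}
```

The losing threads do some wasted lookup work, but no reference leaks and no thread ever observes a half-published pointer.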

2. frozen (frozenbitarray class) — _bitarray.c:1097-1093

Same pattern in freeze_if_frozen(). Called during bitarray.__init__() for subclass instances — hot path.

3. reconstructor (pickle helper) — _bitarray.c:1365-1359

Same pattern in bitarray_reduce(). Called during pickling.


CRITICAL: Lazy-Init Lookup Tables with Endianness-Dependent Re-Init (2+1 locations)

4. ssqi() tables — _util.c:277-293

Three static char[256] tables (count_table, sum_table, sum_sqr_table) guarded by static int setup = -1. The guard tracks endianness — tables are re-initialized when a different-endianness bitarray is encountered:

static int setup = -1;      /* endianness of tables */
// ...
if (setup != a->endian) {   // RACE: non-atomic read
    setup_table(count_table, 'c');
    setup_table(sum_table, IS_LE(a) ? 'a' : 'A');
    setup_table(sum_sqr_table, IS_LE(a) ? 's' : 'S');
    setup = a->endian;       // RACE: non-atomic write
}

Two threads calling ssqi() with different-endianness bitarrays will race on both the flag AND the table contents, producing silently wrong computation results.

Note: Simple atomics are NOT sufficient here, because the flag guards compound state (three tables). Fix: pre-compute both endianness variants at module init (the tables are small: 6 × 256 bytes = 1.5KB total), or protect both the re-init and every subsequent table read with a PyMutex:

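static PyMutex ssqi_mutex = {0};
// ...
PyMutex_Lock(&ssqi_mutex);
if (setup != a->endian) {
    setup_table(count_table, 'c');
    // ...
    setup = a->endian;
}
// ... compute the result from the tables while still holding the lock:
// releasing it before the reads would let a thread with the other
// endianness rewrite the tables mid-computation ...
PyMutex_Unlock(&ssqi_mutex);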
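The pre-compute option can be sketched as follows. The exact table semantics are an assumption made for illustration:

- Bit i of byte k is `(k >> i) & 1` for little-endian and `(k >> (7 - i)) & 1` for big-endian.
- The sum and sum-of-squares tables add up the indices (or squared indices) of the set bits.

`init_tables()` and `byte_index_sum()` are hypothetical helpers. count_table is set up with 'c' regardless of endianness, so only the two sum tables need two variants here.

```c
#include <assert.h>

enum { TOY_LE = 0, TOY_BE = 1 };   /* hypothetical endianness codes */

/* Written only by init_tables(); read-only afterwards. */
static unsigned char count_table[256];
static unsigned char sum_table[2][256];      /* [endian][byte], max 28 */
static unsigned char sum_sqr_table[2][256];  /* [endian][byte], max 140 */

/* Value of bit i (0..7) of byte k under the given endianness. */
static int bit_at(int k, int i, int endian)
{
    return endian == TOY_LE ? (k >> i) & 1 : (k >> (7 - i)) & 1;
}

/* Call once from module init, before any other thread can reach the
   tables. Afterwards no flag or lock is needed to read them. */
static void init_tables(void)
{
    for (int k = 0; k < 256; k++) {
        int cnt = 0;
        for (int i = 0; i < 8; i++)
            cnt += (k >> i) & 1;
        count_table[k] = (unsigned char) cnt;

        for (int e = TOY_LE; e <= TOY_BE; e++) {
            int s = 0, sq = 0;
            for (int i = 0; i < 8; i++) {
                if (bit_at(k, i, e)) {
                    s += i;
                    sq += i * i;
                }
            }
            sum_table[e][k] = (unsigned char) s;
            sum_sqr_table[e][k] = (unsigned char) sq;
        }
    }
}

/* Readers select a variant by endianness instead of re-initializing. */
static int byte_index_sum(unsigned char byte, int endian)
{
    return sum_table[endian][byte];
}
```

Because nothing is written after init, concurrent ssqi()-style callers with different endianness simply index different rows.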

5. xor_indices() tables — _util.c:321-334

Same pattern with parity_table and xor_table. Same endianness-dependent re-init race.

6. digit_to_int() table — _util.c:848-864

Similar but endianness-independent (setup is boolean, not endianness). Uses memset to clear the table before populating — a reader during this window gets invalid data. Fix: Initialize at module init in PyInit__util() (table never changes).
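A table that never changes can also be built at compile time, so there is nothing to initialize, or race on, at runtime. Below is a minimal sketch using C99 designated initializers. It covers only hexadecimal digits, and the real digit_to_int() table contents are not reproduced here, so `hex_table` and `hex_digit_to_int()` are hypothetical.

```c
#include <assert.h>

/* Entries hold (digit value + 1), so the zero that designated initializers
   give every unlisted entry means "not a digit". */
static const unsigned char hex_table[256] = {
    ['0'] = 1,  ['1'] = 2,  ['2'] = 3,  ['3'] = 4,  ['4'] = 5,
    ['5'] = 6,  ['6'] = 7,  ['7'] = 8,  ['8'] = 9,  ['9'] = 10,
    ['a'] = 11, ['b'] = 12, ['c'] = 13, ['d'] = 14, ['e'] = 15, ['f'] = 16,
    ['A'] = 11, ['B'] = 12, ['C'] = 13, ['D'] = 14, ['E'] = 15, ['F'] = 16,
};

/* Returns the digit value, or -1 for characters that are not hex digits. */
static int hex_digit_to_int(char c)
{
    return (int) hex_table[(unsigned char) c] - 1;
}
```

Since the table is `const` and lives in read-only data, it is safe to read from any number of threads without synchronization.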


HIGH: Zero Per-Object Synchronization (91 functions)

Every function that accesses self->ob_item, self->nbits, self->allocated, or self->ob_exports does so without any lock or critical section. Under free-threading, concurrent access to a shared bitarray object is a data race.

TSan confirmed this across 5 scenarios (concurrent_mutation, read_write_contention, slice_operations, fill_padbits, frombytes_tobytes) — all racing on the array buffer via getbit/setbit and resize().

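Key architectural constraint: getbit()/setbit() in bitarray.h are inline leaf functions called in tight loops. Taking a lock per bit access would be prohibitively slow, and it would still not make multi-bit operations atomic. All synchronization must therefore be at the Python-facing method level.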

Fix: Wrap the body of every bitarray_* method registered in the method table, and every tp_* slot, in Py_BEGIN_CRITICAL_SECTION(self) / Py_END_CRITICAL_SECTION(). Note that the END macro takes no argument. The pythoncapi_compat.h header (already included) provides these macros for older Python versions.
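The locking granularity described above can be modeled in plain C. This is a sketch, not bitarray code:

- A per-object pthread mutex stands in for the lock that Py_BEGIN_CRITICAL_SECTION(self) acquires.
- The leaf bit helpers stay lock-free.
- Each "method" holds the lock across its whole operation, including the realloc.
- `toy_bitarray` and all `toy_*` functions are hypothetical.

A real critical section also differs from a plain mutex: CPython can suspend it while the thread blocks, which helps avoid deadlocks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of a bitarray object. */
typedef struct {
    pthread_mutex_t lock;        /* stands in for the per-object lock */
    unsigned char *ob_item;
    size_t nbits;
    size_t allocated;            /* bytes */
} toy_bitarray;

static void toy_init(toy_bitarray *self)
{
    pthread_mutex_init(&self->lock, NULL);
    self->ob_item = NULL;
    self->nbits = 0;
    self->allocated = 0;
}

/* Leaf helpers: no locking; the caller must hold self->lock. */
static int toy_getbit(const toy_bitarray *self, size_t i)
{
    return (self->ob_item[i / 8] >> (i % 8)) & 1;
}

static void toy_setbit(toy_bitarray *self, size_t i, int v)
{
    unsigned char mask = (unsigned char) (1u << (i % 8));
    if (v)
        self->ob_item[i / 8] |= mask;
    else
        self->ob_item[i / 8] &= (unsigned char) ~mask;
}

/* Caller must hold self->lock: realloc() may move ob_item, which is what
   makes unlocked concurrent readers unsafe. */
static int toy_resize(toy_bitarray *self, size_t nbits)
{
    size_t need = (nbits + 7) / 8;
    if (need > self->allocated) {
        size_t newsize = need * 2;
        unsigned char *p = realloc(self->ob_item, newsize);
        if (p == NULL)
            return -1;
        memset(p + self->allocated, 0, newsize - self->allocated);
        self->ob_item = p;
        self->allocated = newsize;
    }
    self->nbits = nbits;
    return 0;
}

/* Mutating "method": one lock held across resize and setbit. */
static int toy_append(toy_bitarray *self, int v)
{
    pthread_mutex_lock(&self->lock);
    int rc = toy_resize(self, self->nbits + 1);
    if (rc == 0)
        toy_setbit(self, self->nbits - 1, v);
    pthread_mutex_unlock(&self->lock);
    return rc;
}

/* Read-only "method": still locked, since a concurrent append could
   realloc() ob_item in the middle of the loop. */
static size_t toy_count(toy_bitarray *self)
{
    size_t n = 0;
    pthread_mutex_lock(&self->lock);
    for (size_t i = 0; i < self->nbits; i++)
        n += (size_t) toy_getbit(self, i);
    pthread_mutex_unlock(&self->lock);
    return n;
}
```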

Priority order:

  1. Mutation methods (append, extend, insert, pop, remove, clear, sort, reverse, invert, setall, fill, frombytes): several of these call resize(), which does PyMem_Realloc on ob_item, and the rest write to the buffer in place
  2. Buffer protocol (getbuffer/releasebuffer) — ob_exports counter race
  3. Iterators (bitarrayiter_next, searchiter_next, decodeiter_next) — use Py_BEGIN_CRITICAL_SECTION(it->self)
  4. Two-object operations (bitwise, copy_n, extend_bitarray) — use Py_BEGIN_CRITICAL_SECTION2(self, other)
  5. Read-only accessors (count, all, any, len, repr, tobytes, tolist) — still need protection because concurrent resize() can invalidate ob_item
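Two-object operations need Py_BEGIN_CRITICAL_SECTION2 rather than two nested single-object sections because of lock ordering. Suppose one thread runs `a &= b` while another runs `b &= a`, and each takes its own object's lock first: the two threads can deadlock. CPython's macro handles the ordering, and the case where both arguments are the same object, internally. The sketch below shows the same idea with pthread mutexes ordered by address. `toy_obj`, `lock2()`, `unlock2()` and `toy_iand()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    unsigned char bits;          /* toy payload: 8 bits */
} toy_obj;

/* Acquire both locks in a global (address) order so that no two threads
   can each hold one lock while waiting for the other. Lock only once when
   both arguments are the same object (e.g. a &= a). */
static void lock2(toy_obj *a, toy_obj *b)
{
    if (a == b) {
        pthread_mutex_lock(&a->lock);
        return;
    }
    if ((uintptr_t) a > (uintptr_t) b) {
        toy_obj *t = a;
        a = b;
        b = t;
    }
    pthread_mutex_lock(&a->lock);
    pthread_mutex_lock(&b->lock);
}

static void unlock2(toy_obj *a, toy_obj *b)
{
    pthread_mutex_unlock(&a->lock);
    if (a != b)
        pthread_mutex_unlock(&b->lock);
}

/* In-place AND, modeled on a two-object bitwise method. */
static void toy_iand(toy_obj *self, toy_obj *other)
{
    lock2(self, other);
    self->bits &= other->bits;
    unlock2(self, other);
}
```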

This is a large but mechanical change. The migration plan linked at the top of this issue has the full breakdown.


MEDIUM: Structural Migration Items

7. Static type objects (5 types)

Bitarray_Type, DecodeTree_Type, DecodeIter_Type, SearchIter_Type and BitarrayIter_Type are all static PyTypeObject instances initialized with PyType_Ready(). Under free-threading, CPython may internally mutate their tp_dict and tp_subclasses. Convert them to heap types via PyType_FromSpec.

Bitarray_Type has Py_TPFLAGS_BASETYPE (subclassable) — highest priority since subclassing triggers tp_subclasses mutations at runtime.

8. CHDI_Type static type — _util.c:2064

Same issue in _util.c.

9. Module state — _bitarray.c:4222, _util.c:2239

Both modules use m_size = -1 (no per-module state). The lazy-init singletons (findings 1-3), interned strings, and bitarray_type pointer should move to a module state struct with multi-phase init (Py_mod_exec).

10. bitarray_type pointer — _util.c:16

static PyTypeObject *bitarray_type written once during PyInit__util, read everywhere. Safe today (import lock serializes init), but should move to module state for subinterpreter correctness.


FIXED: PyDict_GetItem in encode loop

At _bitarray.c:2847, PyDict_GetItem(codedict, symbol) returned a borrowed reference, which another thread could invalidate by modifying codedict.

Already fixed in 3.8.1 — replaced with PyDict_GetItemRef. Thank you!


LOW: PyTuple_GET_ITEM in index error path

_bitarray.c:1256-1257 — borrowed ref from args tuple consumed immediately by PyErr_Format. Low risk since args is call-stack-local.


TSan Stress Test

A stress test script is available that exercises 18 concurrent scenarios:

# Run under TSan-enabled free-threaded Python:
PYTHON_GIL=0 /path/to/tsan-python tsan_stress_bitarray.py 2> tsan_report.txt

Scenarios include: concurrent mutation, read-write contention, bitwise operations, iteration during mutation, slice operations, buffer export, bytereverse, encode/decode, and more.


Suggested Fix Order

  1. Lazy-init singletons (1-3) — remove the caches or use PyMutex. Trivial fixes.
  2. Lazy-init tables (4-6) — pre-compute at module init. Small change, eliminates endianness re-init race.
  3. Per-object critical sections (the HIGH finding above): the bulk of the work. Mechanical, but touches ~80 functions.
  4. Structural migration (7-10) — longer-term, for subinterpreter support.

Analysis by ft-review-toolkit. TSan stress testing via labeille. Report reviewed by a human before submission.
