Skip to content

fix(threads): replace global mutex with per-thread MEOS initialization#111

Open
estebanzimanyi wants to merge 1 commit intomainfrom
fix/per-thread-meos-init
Open

fix(threads): replace global mutex with per-thread MEOS initialization#111
estebanzimanyi wants to merge 1 commit intomainfrom
fix/per-thread-meos-init

Conversation

@estebanzimanyi
Copy link
Copy Markdown
Member

@estebanzimanyi estebanzimanyi commented May 7, 2026

👀 Reviewers: tier ranking, dependency chains and the standards checklist live in doc/contributing/reviewer-guide.md.

Summary

  • Replaces the global std::mutex (from the now-closed PR fix error for initializing Geos #51) with a thread_local bool guard in EnsureMeosInitializedOnThread()
  • Each DuckDB worker thread calls meos_initialize() + meos_initialize_error_handler() exactly once on first use
  • Cost: one bool check per scalar-function call — negligible; full DuckDB parallelism is preserved

Root cause

MEOS PR #815 made all mutable MEOS state thread-local (MEOS_TLS): the PROJ context, GSL RNGs, error code, and caches. The error handler is the only process-global piece.

The previous initialization used std::call_once, which fires exactly once — on whichever thread first loads the extension. DuckDB's other worker threads never called meos_initialize(). With MEOS's TLS design, those threads hit NULL TLS pointers on any MEOS call → segfault on parallel aggregate queries.

The global mutex in PR #51 serialized every MEOS call through one thread (destroying DuckDB's parallelism) but still did not call meos_initialize() on worker threads — mutex ensures exclusion, not initialization.

Test plan

  • Build succeeds with no warnings
  • tgeompointSeq(list(TGEOMPOINT(geom, ts) ORDER BY ts)) with SET threads = 4, 2000 rows / 50 vessels — previously segfaulted, now returns correct results
  • All existing tests pass

Closes #51

estebanzimanyi added a commit that referenced this pull request May 7, 2026
…ctions

Cast functions bypass RegisterSerializedScalarFunction and therefore never
trigger EnsureMeosInitializedOnThread().  If a cast (e.g. VARCHAR→STBOX with
a time component) is the first MEOS call on a thread, the PROJ context TLS
pointer is NULL → SIGSEGV.

Call EnsureMeosInitializedOnThread() once in LoadInternal so the
extension-loading thread has MEOS ready before any cast function can fire.
Worker threads remain covered by the per-scalar-function wrapper.

Fixes the stbox.test SIGSEGV under TZ=UTC that appeared in CI on PR #111.
estebanzimanyi added a commit that referenced this pull request May 10, 2026
Maps the file-level overlaps between commits already on main and the
open consolidate/* parity PRs (#97-#104) plus PR #111 (per-thread MEOS
init).  Each overlap row names the conflicting commit and PR; the
resolution-options section spells out the three viable paths
(rebase, revert+fold, keep both) so the maintainer can pick per PR.

Working artefact — delete once consolidations merge.
@estebanzimanyi
Copy link
Copy Markdown
Member Author

Coordination note — main has incompatible timezone commits

main has two commits that this PR will conflict with on merge:

  • 39921f1 fix(tz): single-timezone model — extension forces both MEOS and DuckDB to Europe/Brussels adds meos_initialize_timezone("Europe/Brussels") + ExtensionHelper::AutoLoadExtension(db, "icu") + DBConfig::SetOptionByName("TimeZone", "Europe/Brussels") to LoadInternal — exactly the block this PR's c237f6c removes.
  • 08a5598 docs(tz): clarify two-timezone reality in comments is comments framing the Brussels override.

Plus: the update_test_expected.py rewrite that landed alongside 39921f1 pinned ~25 test files to +01 Brussels offset; once 9dd765a's timezone-neutral policy is the law of the land those need rewriting (to =/stbox_eq/asText comparisons), not flipped to +00.

Resolution path (maintainer's call — see docs/CONSOLIDATION-PLAN.md on #115): revert 39921f1 and 08a5598 from main either before or as part of this PR's merge, then rewrite the affected +01 test expectations to timezone-neutral form. Some of that rewrite has already happened on PR #112 (a10ef79).

estebanzimanyi added a commit that referenced this pull request May 10, 2026
…unblocks PR #111)

Resolves the conflict between main and PR #111 (fix/per-thread-meos-init)
so PR #111 can merge cleanly.

Reverts on main:
- 39921f1 fix(tz): single-timezone model — drops the AutoLoadExtension(icu) +
  SetOptionByName("TimeZone", ...) block from LoadInternal.
- 08a5598 docs(tz): clarify two-timezone reality in comments.
- 47df9ca fix(tz): set MEOS timezone to Europe/Brussels for deterministic
  temporal I/O — drops meos_initialize_timezone("Europe/Brussels").
- d3d9810 fix(tz): force MEOS UTC timezone at extension load — drops the
  earlier meos_initialize_timezone("UTC") so LoadInternal matches PR #111's
  parent state exactly.

Mechanical follow-up:
- Shift remaining +01 expectations to +00 in 23 test files where surviving
  +01 references remained from earlier Brussels-pinned authoring.  Calendar
  dates unchanged; only the displayed offset shifts.

Result: LoadInternal collapses to ConfigureMeosSridCsvOnce() +
meos_initialize() + error handler.  PR #111 then replaces the call_once
block with EnsureMeosInitializedOnThread() — clean diff, no policy fight.

The +00 shift is interim shape; the long-term direction per PR #111 /
commit 9dd765a is to rewrite each assertion to value-equality
(= tstzspan '...'), accessor (numSpans, startTimestamp), or asText(...)
round-trip — caught by scripts/lint-tz-pinned-tests.py (PR #119).
estebanzimanyi added a commit that referenced this pull request May 10, 2026
Two new docs:
- docs/PR-COORDINATION.md — ecosystem policy: gh pr list is the first
  step before any code change; don't duplicate or conflict with
  in-flight PRs/policies. Cross-ecosystem variant. One PR = one commit
  = one feature consolidation rule with the git commit-tree squash
  recipe.
- docs/CONSOLIDATION-PLAN.md — working artefact mapping the file-level
  overlap between commits already on main and the open consolidate/*
  parity PRs (#97/#98/#99/#100/#102/#103/#104) plus PR #111
  (per-thread MEOS init). Three resolution options per overlap
  (rebase / revert+fold / keep both); maintainer picks per PR.

Pre-emptive policy preventing the duplication and policy-conflict
failures that produced the consolidate/* / main overlap and the
single-timezone / per-thread-MEOS-init clash.
estebanzimanyi added a commit that referenced this pull request May 10, 2026
#116 + #118)

Two cleanups landing together so the rest of the open PR queue can
build and merge.

Part 1 — revert main's single-timezone commits (was #116):
- 39921f1 fix(tz): single-timezone model — drops the AutoLoadExtension(icu)
  + SetOptionByName("TimeZone", ...) block from LoadInternal.
- 08a5598 docs(tz): clarify two-timezone reality in comments.
- 47df9ca fix(tz): set MEOS timezone to Europe/Brussels — drops
  meos_initialize_timezone("Europe/Brussels").
- d3d9810 fix(tz): force MEOS UTC timezone — drops the earlier
  meos_initialize_timezone("UTC") so LoadInternal matches PR #111's
  parent state exactly.
- Shift remaining +01 expectations to +00 in 23 test files (calendar
  dates unchanged; only displayed offset shifts to match TZ=UTC).

Part 2 — sync MEOS API drift (was #118):
- meosType → MeosType sweep (16 files): MEOS catalog header now exposes
  the type as MeosType (PascalCase); MobilityDuck source historically
  used meosType (lowercase).
- t{contains,disjoint,intersects,touches,dwithin}_* arg drop (12 call
  sites): MEOS exports lost their trailing bool restr, bool at_value
  parameters; restriction is now composed at the call site via
  tbool_at_value when needed.

Result: LoadInternal collapses to ConfigureMeosSridCsvOnce() +
meos_initialize() + error handler, exactly matching PR #111's parent
state.  All MobilityDuck source compiles against the current vcpkg MEOS.
PR #111 then replaces the call_once block with
EnsureMeosInitializedOnThread() — clean diff, no policy fight.

The +00 shift is interim shape per the project's timezone-neutral
policy (PR #111 / commit 9dd765a); the longer-term direction is to
rewrite each assertion to value-equality / accessor / asText round-trip
form, caught by scripts/lint-tz-pinned-tests.py (PR #120).

## Closes

- #116 (consolidate: revert main's tz commits and align tests with TZ=UTC)
- #118 (fix: sync MEOS API drift)
estebanzimanyi added a commit that referenced this pull request May 10, 2026
…nsolidates #115 + #119)

Three related artefacts that prevent the same class of failure (parallel
work on the same surface drifting in policy and offset state):

docs/PR-COORDINATION.md — ecosystem policy: gh pr list is the first
step before any code change; minimise PR count by folding into existing
PRs; squash each PR to a single commit before review.  Cross-ecosystem
variant for MobilityDB / JMEOS / PyMEOS / MobilitySpark / MEOS-API.

docs/CONSOLIDATION-PLAN.md — file-level overlap matrix between commits
already on main and the open consolidate/* parity PRs (#97/#98/#99/
#100/#102/#103/#104) plus PR #111 (per-thread MEOS init); three
resolution options per overlap.  Working artefact, delete once
consolidations land.

scripts/lint-tz-pinned-tests.py — flags every line in an
expected-output block that carries a hardcoded UTC offset.  Pre-commit
gate / CI lint.  Today reports 734 hits across 43 files; the lint
makes the timezone-neutral migration trackable.

scripts/parity-audit.py — adds an OUT_OF_SCOPE_NAMES bucket for
function names that are out-of-scope by domain (not by suffix
pattern):
- transform_gk: Gauss-Krüger projection added to MobilityDB for the
  SECONDO platform integration; no equivalent need in MobilityDuck.
- create_trip: BerlinMOD synthetic-trip generator; runs in
  MobilityDB / SECONDO and emits parquet artefacts that MobilityDuck
  consumes — MobilityDuck does not need to host the generator itself.

Pairs naturally with PR #111 / commit 9dd765a's timezone-neutral test
policy: the doc tells contributors what to do, the lint enforces it,
the parity audit reflects realistic coverage now that two domain-
specific names are off the missing list.
Replaces the process-global `std::mutex` (from the now-closed PR #51)
that wrapped every MEOS scalar-function call with a per-thread
`thread_local bool` guard in `EnsureMeosInitializedOnThread()`.

Each DuckDB worker thread calls `meos_initialize()` and
`meos_initialize_error_handler()` exactly once on first use; cost is
one `bool` check per scalar-function call thereafter — negligible.
Full DuckDB parallelism is preserved.

Bundles the matching co-commit refinements (3 commits squashed to 1):

1. **Per-thread MEOS init** (was c237f6c)
   The mutex → thread_local swap.

2. **Cover cast functions** (was 55d4c5a)
   Cast functions bypass `RegisterSerializedScalarFunction` and would
   otherwise leave MEOS uninitialised when a cast is the first MEOS
   call (e.g. `SELECT stbox 'STBOX XT(...)'` before any scalar
   function).  Initialise on the loading thread to cover them.

3. **TZ-neutral test assertions** (was 9dd765a)
   Make `stbox.test` timestamp assertions timezone-neutral so the test
   passes under both local and CI session timezones.

## Root cause

MEOS [PR #815](MobilityDB/MobilityDB#815) made
all mutable MEOS state thread-local (`MEOS_TLS`).  Threads that hadn't
called `meos_initialize()` had NULL TLS pointers and segfaulted.  The
mutex was the conservative workaround; per-thread init is the correct
fix.

Closes #404.
@estebanzimanyi estebanzimanyi force-pushed the fix/per-thread-meos-init branch from 9dd765a to 3fff919 Compare May 10, 2026 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant