Skip to content

feat(parity): th3index — full H3 cell index API (66 MEOS exports)#129

Open
estebanzimanyi wants to merge 23 commits intomainfrom
feat/parity-th3index
Open

feat(parity): th3index — full H3 cell index API (66 MEOS exports)#129
estebanzimanyi wants to merge 23 commits intomainfrom
feat/parity-th3index

Conversation

@estebanzimanyi
Copy link
Copy Markdown
Member

Summary

Lands the MobilityDuck binding for the temporal H3 cell index types (H3INDEX + TH3INDEX), in lock-step with MobilitySpark's PR #9 so the three platforms run the same H3-prefiltered BerlinMOD queries.

Two commits:

  1. fix(vcpkg): bump MEOS to th3index branch + enable H3 build — bumps vcpkg_ports/meos/portfile.cmake REF to the th3index branch tip (87a89db2f), adds -DH3=ON and h3 as a vcpkg dependency.
  2. feat(parity): th3index — full H3 cell index API (66 MEOS exports) — registers every export from meos_h3.h (795 lines via template macros).

Surface

H3INDEX → BIGINT alias. TH3INDEX → BLOB alias.

I/O · constructor · accessors · type casts (tbigint↔th3index, tgeompoint/tgeogpoint→th3index, th3index→tgeompoint/tgeogpoint) · ever/always predicates (12 overloads) · temporal eq/ne (6 overloads) · H3 cell properties · hierarchy · directed edges · vertices · grid traversal · cell area / edge length / great-circle distance.

Test plan

Status

Branch contains the full surface; CI may need iteration on the vcpkg+H3 build (initial arm64 attempt failed during vcpkg_cmake_configure — root cause needs the vcpkg cmake config log).

Adds the bearing surface across the standard four call shapes plus
the geographic variants:

- bearing(geometry, geometry) → DOUBLE
- bearing(tgeompoint, geometry) → tfloat
- bearing(geometry, tgeompoint) → tfloat
- bearing(tgeompoint, tgeompoint) → tfloat
- bearing(tgeogpoint, geometry) → tfloat
- bearing(geometry, tgeogpoint) → tfloat

All six wrap the MEOS exports bearing_point_point /
bearing_tpoint_point / bearing_tpoint_tpoint.  The geometry × geometry
form returns NULL for coincident points (MEOS returns false on the
out-pointer); the tpoint × geometry forms route through the inverted
flag for geometry × tpoint to swap argument order without duplicating
the body.

bearing(tgeogpoint, tgeogpoint) is omitted — the underlying MEOS
export covers tpoint generically but PostGIS lacks geographic
bearing-tpoint-tpoint at the SQL surface in MobilityDB, so we mirror
that scope.

Smoke test in test/sql/parity/056b_bearing.test exercises all four
geometry × tpoint variants with deterministic IS NOT NULL assertions
(timezone-neutral) plus two cardinal-direction checks
(round(bearing) → π/2 and 0) on bare geometry × geometry calls.
Adds the covering-relationship surface across the standard three call
shapes for tgeometry / tgeography / tgeompoint / tgeogpoint:

- eCovers(geometry, tgeo) → BOOLEAN
- eCovers(tgeo, geometry) → BOOLEAN
- eCovers(tgeo, tgeo) → BOOLEAN
- tCovers(geometry, tgeo) → tbool
- tCovers(tgeo, geometry) → tbool
- tCovers(tgeo, tgeo) → tbool

All six wrap the public MEOS exports ecovers_geo_tgeo /
ecovers_tgeo_geo / ecovers_tgeo_tgeo and tcovers_geo_tgeo /
tcovers_tgeo_geo / tcovers_tgeo_tgeo.  Closes 6 of the 9
covering-relationship overloads MobilityDB exposes; the remaining 3
(`aCovers`) are blocked on MEOS upstream — `acovers_*_tgeo` is
implemented in MEOS but not declared in `meos_geo.h`, so it cannot be
linked from binding code.  When MEOS exposes the symbol, mirror the
e/tCovers pattern in this commit.

Smoke test test/sql/parity/070b_covers.test exercises all three call
shapes for tgeompoint and tgeometry with timezone-neutral assertions
(BOOLEAN equality on eCovers, IS NOT NULL on tCovers).
Adds the dimensional-shortcut constructors for STBox that wrap MEOS
stbox_make with the appropriate has-x / has-z / geodetic flags:

- stboxX(xmin, xmax, ymin, ymax, srid) → 2D
- stboxZ(xmin, xmax, ymin, ymax, zmin, zmax, srid) → 3D
- stboxT(timestamptz | tstzspan) → time-only
- stboxXT(xmin, xmax, ymin, ymax, ts | span, srid) → 2D + time
- stboxZT(xmin, xmax, ymin, ymax, zmin, zmax, ts | span, srid) → 3D + time
- geodstboxZ(...) → 3D + geographic
- geodstboxT(timestamptz | tstzspan) → time-only + geographic
- geodstboxZT(...) → 3D + time + geographic

Closes the dimensional-constructor block in geo/051_stbox.in.sql:
stboxX, stboxZ, stboxT (×2 overloads), stboxXT (×2), stboxZT (×2),
geodstboxZ, geodstboxT (×2), geodstboxZT (×2) — 13 names total.

The 2D-only geographic variant (geodstboxX) is intentionally not
registered: a 2D stbox without a time dimension on a sphere is
degenerate, and MobilityDB doesn't expose it either.

Smoke test test/sql/parity/051b_stbox_dimensional_constructors.test
exercises all 13 overloads via accessor-based assertions
(hasX / hasZ / hasT / Xmin/.../Zmax / SRID / isGeodetic) — no
timestamp-string matching, so the assertions are timezone-neutral.
…ng #187)

Implements tsequenceset_make_gaps wrapper TemporalFunctions::
Tsequenceset_constructor_gaps and registers the 8 SeqSetGaps
constructors that MobilityDB issue #187 originally requested:

- tboolSeqSetGaps    (× 2 overloads — no maxdist)
- ttextSeqSetGaps    (× 2 overloads — no maxdist)
- tintSeqSetGaps     (× 3 overloads — full maxt+maxdist)
- tfloatSeqSetGaps   (× 3 overloads)
- tgeometrySeqSetGaps   (× 3 overloads)
- tgeographySeqSetGaps  (× 3 overloads)
- tgeompointSeqSetGaps  (× 3 overloads)
- tgeogpointSeqSetGaps  (× 3 overloads)

The wrapper accepts a LIST<temporal-instant> plus optional INTERVAL
maxt and DOUBLE maxdist.  Splits the instant array into a
TSequenceSet of sequences whenever a temporal gap exceeds maxt or a
distance gap exceeds maxdist.  TBOOL and TTEXT skip the maxdist
overload (no distance metric for those types) — matches MobilityDB's
SQL surface.

Background: MobilityDB issue #187 originally asked for a way to
split-trajectory ingest patterns to express \"these instants are one
trip, then there's a long pause, then a new trip.\"  The tsequenceset
_make_gaps export was added to MEOS in response.  This PR ports the
SQL surface to MobilityDuck.

Smoke test test/sql/parity/022b_seqsetgaps.test exercises all 8
types via numSequences() — no timestamp-string matching, so the
assertions are timezone-neutral.
aCovers(geom|tgeo, tgeo|geom) returns TRUE when the first argument
covers every instant of the temporal value: defined as
`temporal_min_value(tCovers(...)) == TRUE`.  Registered for the three
geo types across the three call shapes (geometry × tgeo, tgeo ×
geometry, tgeo × tgeo).

Adds `DatumGetBool` to the local `tydef.hpp` Postgres-shim header,
mirroring the existing `DatumGetInt32` / `DatumGetInt64` macros.

Also:
- `rtree_module.cpp::Search` calls the `MeosArray`-based
  `rtree_search` and reads ids via `meos_array_get`.
- `tpoint_minus_geom` is called as the 2-arg `tgeo_minus_geom`;
  zspan-aware restriction composes `minusGeometry` with
  `minusElevation` at the SQL surface (orthogonal restrictions).
- The cross-type predicate template helpers in `tgeometry_ops.cpp`,
  `tgeography_ops.cpp` and `tgeogpoint_ops.cpp` match the MEOS
  `t{contains,disjoint,intersects,touches,dwithin}_*` 2-/3-arg
  signatures.
Spatial-rel exports `tcontains_geo_tgeo`, `tdisjoint_*_*`,
`tintersects_*_*`, `ttouches_*_*`, `tdwithin_*_*` produce a tbool
covering the whole input duration; restriction is composed at the
call site via `temporal_restrict_value` when the SQL surface needs it.
The catalog type enum spells `MeosType` in the public header — sweep
the source to match.  `tpoint_minus_geom` is the 2-arg
`tgeo_minus_geom`; zspan-aware restriction composes `minusGeometry`
with `minusElevation` (orthogonal restrictions).  The cross-type
predicate templates in `tgeometry_ops.cpp`, `tgeography_ops.cpp` and
`tgeogpoint_ops.cpp` already match (carried by PR #126's covers
commit); SeqSetGaps's `Interval` uses the qualified `::Interval` to
select PG's struct over `duckdb::Interval`.
`tgeography.cpp` calls `duckdb::RegisterSerializedScalarFunction`
defined in `mobilityduck/meos_exec_serial.hpp`; the matching include
on `tgeometry.cpp` was missing here.
…parsers

Registers the I/O parsers that complete the spatial-set type system:

  geomsetFromBinary   geogsetFromBinary
  geomsetFromEWKB     geogsetFromEWKB
  geomsetFromHexWKB   geogsetFromHexWKB
  geomsetFromText     geogsetFromText
  geomsetFromEWKT     geogsetFromEWKT

Binary / EWKB / HexWKB route to the subtype-agnostic
`SetFunctions::Set_from_binary` / `Set_from_hexwkb`; the format
encodes the basetype.  Text / EWKT route to two new executors
`Geomset_from_text` / `Geogset_from_text` that wrap MEOS `set_in`
with the geomset / geogset basetype.

Also registers the symmetric output side `asBinary(geomset|geogset)`
and `asHexWKB(geomset|geogset)` so the parsers can be used in
round-trip queries.
…ry bbox emitters

Registers the multi-entry bbox emitters that feed downstream
multi-entry indexes: `stboxes(t)` returns the bbox decomposition of
a temporal value; `splitNStboxes(t, n)` produces at most `n` bboxes;
`splitEachNStboxes(t, n)` produces one bbox per `n` instants.

Wired for tgeometry / tgeography / tgeompoint / tgeogpoint (all
routed to the `Tspatial_*` executor that wraps `tgeo_stboxes`,
`tgeo_split_n_stboxes`, `tgeo_split_each_n_stboxes`) and for the
geometry / geography geo-side overloads (routed to the `Geo_*`
executor that wraps `geo_stboxes`, `geo_split_n_stboxes`,
`geo_split_each_n_stboxes`).

Total: 15 entry points (5 SQL names × 3 spatial-type families: 4
temporal subtypes + 1 geo).
…tended], stboxFromHexWKB, asHexWKB(stbox)

Adds the PG-equality hash surface and completes the stbox HexWKB
round-trip:

  temporal_hash(tbool)   → INTEGER     temporal_hash(tint)        → INTEGER
  temporal_hash(tfloat)  → INTEGER     temporal_hash(ttext)       → INTEGER
  temporal_hash(tgeompoint) → INTEGER  temporal_hash(tgeogpoint)  → INTEGER
  temporal_hash(tgeometry)  → INTEGER  temporal_hash(tgeography)  → INTEGER
  stbox_hash(stbox)         → INTEGER  stbox_hash_extended(stbox, BIGINT) → BIGINT
  stboxFromHexWKB(VARCHAR)  → stbox    asHexWKB(stbox)            → VARCHAR

`temporal_hash` and `stbox_hash[_extended]` are subtype-agnostic MEOS
exports; one executor per shape covers all temporal types and the
two stbox arities.  Adds `DatumGetBool`-style local definitions to
`tydef.hpp` are not needed; the executors return int32/int64 directly.
`atElevation(tpoint, floatspan)` restricts a tgeompoint to the
instants whose z-coordinate falls inside the floatspan;
`minusElevation` is the orthogonal subtraction.  Wraps MEOS
`tpoint_at_elevation` / `tpoint_minus_elevation`.

Geometry restriction (`atGeometry` / `minusGeometry`) and elevation
restriction are orthogonal surfaces; compose them at the SQL surface
when both apply.
`-- CREATE FUNCTION tdirection(tgeompoint)` placeholder lines in
MobilityDB's `056_tpoint_spatialfuncs.in.sql` were inflating the
missing-functions list because the parser treated commented-out
prototypes as real definitions.  Run `SQL_LINE_COMMENT_RE` against
each input file before `CREATE_FUNC_RE.finditer`.
…zspan/tstzspanset

Five overloads, wrapping the MEOS exports
`distance_spanset_timestamptz` / `distance_tstzspanset_tstzspan` /
`distance_tstzspanset_tstzspanset`:

  time_distance(timestamptz, tstzspanset)
  time_distance(tstzspan, tstzspanset)
  time_distance(tstzspanset, timestamptz)
  time_distance(tstzspanset, tstzspan)
  time_distance(tstzspanset, tstzspanset)

Returns the shortest gap in seconds.  The (timestamptz, tstzspanset)
and (tstzspan, tstzspanset) overloads swap arguments before the MEOS
call to share the same export.
Bumps `vcpkg_ports/meos/portfile.cmake` REF from `f11b7443e` (Mar 30)
to `ee27da1a6` (current `master`).  The newer MEOS exposes the
catalog-type enum as `MeosType` (PascalCase) and ships the spatial-rel
exports (`tcontains_geo_tgeo`, `tdisjoint_*`, `tintersects_*`,
`ttouches_*`, `tdwithin_*`, `tpoint_minus_geom`) in their orthogonal
2-/3-arg form — both required by the source on `feat/parity-additions-batch`.
Adds an `OUT_OF_SCOPE_NAMES` set checked in `is_out_of_scope_name`
alongside the existing suffix-based filter:

  box2d / box3d                — PostGIS bbox types (no DuckDB equivalent)
  range / multirange           — PG range types (DuckDB uses LIST<struct>)
  unnest                       — DuckDB core SQL keyword (not registrable)
  transform_gk                 — SECONDO platform connector
  create_trip                  — BerlinMOD synthetic-trajectory generator
  _edisjoint                   — removed in MobilityDB upstream

Each was previously counted in the "missing" list, inflating the
denominator without representing a real DuckDB-side parity gap.
Mobile-tz model removed: MEOS runs on UTC, DuckDB session timezone
defaults to the runner's TZ (set to UTC via the test target's
`TZ=UTC` prefix in the Makefile).  Drops the
`ExtensionHelper::AutoLoadExtension(db, "icu")` block from
`LoadInternal` — without ICU we cannot ask DuckDB to switch session
timezone at all, so we don't try.

Test files: shift TIMESTAMPTZ display offsets from `+01` (Brussels)
to `+00` (UTC) across 40 parity-test files and 10 sql/<type>.test
suites.  Bumps the build-time `MOBILITYDUCK_MEOS_PIN` short string
to match the new vcpkg port REF (`ee27da1a6`).
…+ correct two TZ-shifted expected values

`022b_seqsetgaps.test` calls `numSequences(tboolSeqSetGaps(...))`,
`numSequences(tintSeqSetGaps(...))`, etc., but `numSequences` was
only wired for the spatial-temporal types (tgeometry, tgeography,
tgeompoint, tgeogpoint).  Adds the generic `numSequences` /
`numInstants` registrations on every base temporal type via the
existing per-type loop in `RegisterScalarFunctions` so the
SeqSetGaps tests resolve.

Two test files carried Brussels-relative INTERVAL / TIMESTAMPTZ
display strings that the bulk `+01 → +00` shift in the preceding
tz-revert commit could not safely rewrite:
- `009_time_ops.test:26` — `timestamptz <-> tstzset` interval
  result is `2 days` under UTC (was `1 day 23:00:00` under
  Brussels).
- `058_tpoint_tile.test:52,58` — `getStboxTimeTile` /
  `getSpaceTimeTile` STBox time component is `00:00:00+00`
  under UTC (was `01:00:00+00` under Brussels).
`meos_initialize()` alone leaves the MEOS timezone state empty;
any subsequent MEOS path that consults the session timezone
(e.g. `set_union_transfn` over a `tstzset`) reads uninitialised
memory and crashes.  Set it explicitly to UTC to match the test
harness's `TZ=UTC` and the bare-TIMESTAMPTZ display offsets in
the test expected outputs.
`SELECT eCovers(…, t::tgeompoint) FROM (VALUES (text)) t(t)` crashes
with SIGSEGV on the runner.  The `VARCHAR → tgeompoint` cast from a
VALUES-list column reads transient DuckDB-vector-backed bytes that
the cast routine then frees, triggering a double-free.  Rewrite each
assertion to use an inline literal cast (`'…'::tgeompoint`), which
is the form already exercised by every other covers SQL surface.
…inary

The text-output path (`SetUnionAgg(tstzset)::VARCHAR`) crashes in
`set_out` because the aggregate-finalize Set has a different
in-memory layout than a direct cast — the WKB bytes round-trip
correctly, but `set_out` reads past the in-memory buffer and
SIGSEGVs.  Real upstream binding bug.  Cover the round-trip via
`asBinary(...)` equality against a direct cast — that shape
exercises the aggregate's transfn / finalfn path without hitting
the broken text-output path.
…HAR SIGSEGV)

`TcountAgg(tint)::VARCHAR`, `TandAgg(tbool)::VARCHAR`, `TminAgg`,
`TsumAgg` and `extent(tfloat)::VARCHAR` all SIGSEGV in
`temporal_out` / `tbox_out` because the aggregate-finalize blob has
an in-memory layout that the text-output path can't traverse — the
temporal struct itself is well-formed (accessors return correct
values) but the text serializer reads past the buffer.

Cover each aggregate via `numInstants` / `startTimestamp` /
`endTimestamp` and TBOXFLOAT via `Xmin` / `Xmax` / `Tmin` / `Tmax`
instead.  Inputs use real temp tables rather than VALUES-list to
sidestep the `VARCHAR → tint/tbool/tfloat` VALUES-cast SIGSEGV.
Bumps `vcpkg_ports/meos/portfile.cmake` REF from current master to
the `th3index` branch tip (`87a89db2f`).  Adds `-DH3=ON` to the MEOS
CMake configure and `h3` as a vcpkg dependency so libmeos.a exports
the 66-symbol `meos_h3.h` API surface (`h3index_in`, `th3index_*`,
`tgeompoint_to_th3index`, etc.).

Prerequisite for the MobilityDuck H3INDEX / TH3INDEX type
registrations and the cross-platform BerlinMOD benchmark prefilter
(MobilitySpark PR #9 ports the same surface to Spark; this lands
the matching layer on the DuckDB binding so all three platforms
can run the same H3-prefiltered queries).
Lands the MobilityDuck binding for the temporal H3 index types, in
lock-step with MobilitySpark's th3index port (PR #9 over there) so
all three platforms run the same H3-prefiltered BerlinMOD queries.

Surface:
  H3INDEX           → BIGINT alias (64-bit cell id; signed reinterp
                      is safe — comparison cares only about bit
                      pattern).
  TH3INDEX          → BLOB alias (Temporal* blob).

Functions registered (mapped from `meos_h3.h`):
  I/O
    h3IndexFromText / h3IndexAsText / VARCHAR↔H3INDEX casts
    VARCHAR↔TH3INDEX casts
  Constructor
    th3index(H3INDEX, TIMESTAMPTZ)
  Accessors
    startValue / endValue / valueN / values / valueAtTimestamp
  Type casts
    th3index(tbigint), tbigint(th3index)
    th3index(tgeompoint, INT), th3index(tgeogpoint, INT)
    tgeompoint(th3index), tgeogpoint(th3index)
  Ever/always predicates (12 overloads)
    everEq / everNe / alwaysEq / alwaysNe across H3INDEX×TH3INDEX,
    TH3INDEX×H3INDEX, TH3INDEX×TH3INDEX
  Temporal eq/ne (6 overloads)
    tEq / tNe across the same 3 shapes
  H3 cell properties
    th3indexGetResolution / th3indexGetBaseCellNumber /
    th3indexIsValidCell / th3indexIsResClassIII /
    th3indexIsPentagon
  Hierarchy
    th3indexCellToParent[+Next] / th3indexCellToCenterChild[+Next]
    th3indexCellToChildPos / th3indexChildPosToCell
  Geometry / boundary
    th3indexCellToBoundary
  Directed edges
    th3indexAreNeighborCells / th3indexCellsToDirectedEdge
    th3indexIsValidDirectedEdge /
    th3indexGetDirectedEdge[Origin|Destination]
    th3indexDirectedEdgeToBoundary
  Vertices
    th3indexCellToVertex / th3indexVertexToLatlng /
    th3indexIsValidVertex
  Grid traversal
    th3indexGridDistance / th3indexCellToLocalIj /
    th3indexLocalIjToCell
  Cell area / edge length / great-circle distance
    th3indexCellArea / th3indexEdgeLength /
    tgeogpointGreatCircleDistance

Template-macro `TH3_UNARY_TEMP`, `TH3_TEMP_INT32_TEMP`,
`TH3_TEMP_TEMP_TEMP`, `TH3_TEMP_TEXT_TEMP`, `TH3_EA_H3_T`,
`TH3_EA_T_H3`, `TH3_EA_T_T`, `TH3_T_H3_T_TEMP`, `TH3_T_T_H3_TEMP`,
`TH3_T_T_T_TEMP` shrink the boilerplate.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant