feat(parity): th3index — full H3 cell index API (66 MEOS exports)#129
Open
estebanzimanyi wants to merge 23 commits intomainfrom
Open
feat(parity): th3index — full H3 cell index API (66 MEOS exports)#129estebanzimanyi wants to merge 23 commits intomainfrom
estebanzimanyi wants to merge 23 commits intomainfrom
Conversation
Adds the bearing surface across the standard four call shapes plus the geographic variants: - bearing(geometry, geometry) → DOUBLE - bearing(tgeompoint, geometry) → tfloat - bearing(geometry, tgeompoint) → tfloat - bearing(tgeompoint, tgeompoint) → tfloat - bearing(tgeogpoint, geometry) → tfloat - bearing(geometry, tgeogpoint) → tfloat All six wrap the MEOS exports bearing_point_point / bearing_tpoint_point / bearing_tpoint_tpoint. The geometry × geometry form returns NULL for coincident points (MEOS returns false on the out-pointer); the tpoint × geometry forms route through the inverted flag for geometry × tpoint to swap argument order without duplicating the body. bearing(tgeogpoint, tgeogpoint) is omitted — the underlying MEOS export covers tpoint generically but PostGIS lacks geographic bearing-tpoint-tpoint at the SQL surface in MobilityDB, so we mirror that scope. Smoke test in test/sql/parity/056b_bearing.test exercises all four geometry × tpoint variants with deterministic IS NOT NULL assertions (timezone-neutral) plus two cardinal-direction checks (round(bearing) → π/2 and 0) on bare geometry × geometry calls.
Adds the covering-relationship surface across the standard three call shapes for tgeometry / tgeography / tgeompoint / tgeogpoint: - eCovers(geometry, tgeo) → BOOLEAN - eCovers(tgeo, geometry) → BOOLEAN - eCovers(tgeo, tgeo) → BOOLEAN - tCovers(geometry, tgeo) → tbool - tCovers(tgeo, geometry) → tbool - tCovers(tgeo, tgeo) → tbool All six wrap the public MEOS exports ecovers_geo_tgeo / ecovers_tgeo_geo / ecovers_tgeo_tgeo and tcovers_geo_tgeo / tcovers_tgeo_geo / tcovers_tgeo_tgeo. Closes 6 of the 9 covering-relationship overloads MobilityDB exposes; the remaining 3 (`aCovers`) are blocked on MEOS upstream — `acovers_*_tgeo` is implemented in MEOS but not declared in `meos_geo.h`, so it cannot be linked from binding code. When MEOS exposes the symbol, mirror the e/tCovers pattern in this commit. Smoke test test/sql/parity/070b_covers.test exercises all three call shapes for tgeompoint and tgeometry with timezone-neutral assertions (BOOLEAN equality on eCovers, IS NOT NULL on tCovers).
Adds the dimensional-shortcut constructors for STBox that wrap MEOS stbox_make with the appropriate has-x / has-z / geodetic flags: - stboxX(xmin, xmax, ymin, ymax, srid) → 2D - stboxZ(xmin, xmax, ymin, ymax, zmin, zmax, srid) → 3D - stboxT(timestamptz | tstzspan) → time-only - stboxXT(xmin, xmax, ymin, ymax, ts | span, srid) → 2D + time - stboxZT(xmin, xmax, ymin, ymax, zmin, zmax, ts | span, srid) → 3D + time - geodstboxZ(...) → 3D + geographic - geodstboxT(timestamptz | tstzspan) → time-only + geographic - geodstboxZT(...) → 3D + time + geographic Closes the dimensional-constructor block in geo/051_stbox.in.sql: stboxX, stboxZ, stboxT (×2 overloads), stboxXT (×2), stboxZT (×2), geodstboxZ, geodstboxT (×2), geodstboxZT (×2) — 13 names total. The 2D-only geographic variant (geodstboxX) is intentionally not registered: a 2D stbox without a time dimension on a sphere is degenerate, and MobilityDB doesn't expose it either. Smoke test test/sql/parity/051b_stbox_dimensional_constructors.test exercises all 13 overloads via accessor-based assertions (hasX / hasZ / hasT / Xmin/.../Zmax / SRID / isGeodetic) — no timestamp-string matching, so the assertions are timezone-neutral.
…ng #187) Implements tsequenceset_make_gaps wrapper TemporalFunctions:: Tsequenceset_constructor_gaps and registers the 8 SeqSetGaps constructors that MobilityDB issue #187 originally requested: - tboolSeqSetGaps (× 2 overloads — no maxdist) - ttextSeqSetGaps (× 2 overloads — no maxdist) - tintSeqSetGaps (× 3 overloads — full maxt+maxdist) - tfloatSeqSetGaps (× 3 overloads) - tgeometrySeqSetGaps (× 3 overloads) - tgeographySeqSetGaps (× 3 overloads) - tgeompointSeqSetGaps (× 3 overloads) - tgeogpointSeqSetGaps (× 3 overloads) The wrapper accepts a LIST<temporal-instant> plus optional INTERVAL maxt and DOUBLE maxdist. Splits the instant array into a TSequenceSet of sequences whenever a temporal gap exceeds maxt or a distance gap exceeds maxdist. TBOOL and TTEXT skip the maxdist overload (no distance metric for those types) — matches MobilityDB's SQL surface. Background: MobilityDB issue #187 originally asked for a way to split-trajectory ingest patterns to express \"these instants are one trip, then there's a long pause, then a new trip.\" The tsequenceset _make_gaps export was added to MEOS in response. This PR ports the SQL surface to MobilityDuck. Smoke test test/sql/parity/022b_seqsetgaps.test exercises all 8 types via numSequences() — no timestamp-string matching, so the assertions are timezone-neutral.
aCovers(geom|tgeo, tgeo|geom) returns TRUE when the first argument
covers every instant of the temporal value: defined as
`temporal_min_value(tCovers(...)) == TRUE`. Registered for the three
geo types across the three call shapes (geometry × tgeo, tgeo ×
geometry, tgeo × tgeo).
Adds `DatumGetBool` to the local `tydef.hpp` Postgres-shim header,
mirroring the existing `DatumGetInt32` / `DatumGetInt64` macros.
Also:
- `rtree_module.cpp::Search` calls the `MeosArray`-based
`rtree_search` and reads ids via `meos_array_get`.
- `tpoint_minus_geom` is called as the 2-arg `tgeo_minus_geom`;
zspan-aware restriction composes `minusGeometry` with
`minusElevation` at the SQL surface (orthogonal restrictions).
- The cross-type predicate template helpers in `tgeometry_ops.cpp`,
`tgeography_ops.cpp` and `tgeogpoint_ops.cpp` match the MEOS
`t{contains,disjoint,intersects,touches,dwithin}_*` 2-/3-arg
signatures.
Spatial-rel exports `tcontains_geo_tgeo`, `tdisjoint_*_*`, `tintersects_*_*`, `ttouches_*_*`, `tdwithin_*_*` produce a tbool covering the whole input duration; restriction is composed at the call site via `temporal_restrict_value` when the SQL surface needs it. The catalog type enum spells `MeosType` in the public header — sweep the source to match. `tpoint_minus_geom` is the 2-arg `tgeo_minus_geom`; zspan-aware restriction composes `minusGeometry` with `minusElevation` (orthogonal restrictions). The cross-type predicate templates in `tgeometry_ops.cpp`, `tgeography_ops.cpp` and `tgeogpoint_ops.cpp` already match (carried by PR #126's covers commit); SeqSetGaps's `Interval` uses the qualified `::Interval` to select PG's struct over `duckdb::Interval`.
`tgeography.cpp` calls `duckdb::RegisterSerializedScalarFunction` defined in `mobilityduck/meos_exec_serial.hpp`; the matching include on `tgeometry.cpp` was missing here.
…parsers Registers the I/O parsers that complete the spatial-set type system: geomsetFromBinary geogsetFromBinary geomsetFromEWKB geogsetFromEWKB geomsetFromHexWKB geogsetFromHexWKB geomsetFromText geogsetFromText geomsetFromEWKT geogsetFromEWKT Binary / EWKB / HexWKB route to the subtype-agnostic `SetFunctions::Set_from_binary` / `Set_from_hexwkb`; the format encodes the basetype. Text / EWKT route to two new executors `Geomset_from_text` / `Geogset_from_text` that wrap MEOS `set_in` with the geomset / geogset basetype. Also registers the symmetric output side `asBinary(geomset|geogset)` and `asHexWKB(geomset|geogset)` so the parsers can be used in round-trip queries.
…ry bbox emitters Registers the multi-entry bbox emitters that feed downstream multi-entry indexes: `stboxes(t)` returns the bbox decomposition of a temporal value; `splitNStboxes(t, n)` produces at most `n` bboxes; `splitEachNStboxes(t, n)` produces one bbox per `n` instants. Wired for tgeometry / tgeography / tgeompoint / tgeogpoint (all routed to the `Tspatial_*` executor that wraps `tgeo_stboxes`, `tgeo_split_n_stboxes`, `tgeo_split_each_n_stboxes`) and for the geometry / geography geo-side overloads (routed to the `Geo_*` executor that wraps `geo_stboxes`, `geo_split_n_stboxes`, `geo_split_each_n_stboxes`). Total: 15 entry points (5 SQL names × 3 spatial-type families: 4 temporal subtypes + 1 geo).
…tended], stboxFromHexWKB, asHexWKB(stbox) Adds the PG-equality hash surface and completes the stbox HexWKB round-trip: temporal_hash(tbool) → INTEGER temporal_hash(tint) → INTEGER temporal_hash(tfloat) → INTEGER temporal_hash(ttext) → INTEGER temporal_hash(tgeompoint) → INTEGER temporal_hash(tgeogpoint) → INTEGER temporal_hash(tgeometry) → INTEGER temporal_hash(tgeography) → INTEGER stbox_hash(stbox) → INTEGER stbox_hash_extended(stbox, BIGINT) → BIGINT stboxFromHexWKB(VARCHAR) → stbox asHexWKB(stbox) → VARCHAR `temporal_hash` and `stbox_hash[_extended]` are subtype-agnostic MEOS exports; one executor per shape covers all temporal types and the two stbox arities. Adds `DatumGetBool`-style local definitions to `tydef.hpp` are not needed; the executors return int32/int64 directly.
`atElevation(tpoint, floatspan)` restricts a tgeompoint to the instants whose z-coordinate falls inside the floatspan; `minusElevation` is the orthogonal subtraction. Wraps MEOS `tpoint_at_elevation` / `tpoint_minus_elevation`. Geometry restriction (`atGeometry` / `minusGeometry`) and elevation restriction are orthogonal surfaces; compose them at the SQL surface when both apply.
`-- CREATE FUNCTION tdirection(tgeompoint)` placeholder lines in MobilityDB's `056_tpoint_spatialfuncs.in.sql` were inflating the missing-functions list because the parser treated commented-out prototypes as real definitions. Run `SQL_LINE_COMMENT_RE` against each input file before `CREATE_FUNC_RE.finditer`.
…zspan/tstzspanset Five overloads, wrapping the MEOS exports `distance_spanset_timestamptz` / `distance_tstzspanset_tstzspan` / `distance_tstzspanset_tstzspanset`: time_distance(timestamptz, tstzspanset) time_distance(tstzspan, tstzspanset) time_distance(tstzspanset, timestamptz) time_distance(tstzspanset, tstzspan) time_distance(tstzspanset, tstzspanset) Returns the shortest gap in seconds. The (timestamptz, tstzspanset) and (tstzspan, tstzspanset) overloads swap arguments before the MEOS call to share the same export.
Bumps `vcpkg_ports/meos/portfile.cmake` REF from `f11b7443e` (Mar 30) to `ee27da1a6` (current `master`). The newer MEOS exposes the catalog-type enum as `MeosType` (PascalCase) and ships the spatial-rel exports (`tcontains_geo_tgeo`, `tdisjoint_*`, `tintersects_*`, `ttouches_*`, `tdwithin_*`, `tpoint_minus_geom`) in their orthogonal 2-/3-arg form — both required by the source on `feat/parity-additions-batch`.
Adds an `OUT_OF_SCOPE_NAMES` set checked in `is_out_of_scope_name` alongside the existing suffix-based filter: box2d / box3d — PostGIS bbox types (no DuckDB equivalent) range / multirange — PG range types (DuckDB uses LIST<struct>) unnest — DuckDB core SQL keyword (not registrable) transform_gk — SECONDO platform connector create_trip — BerlinMOD synthetic-trajectory generator _edisjoint — removed in MobilityDB upstream Each was previously counted in the "missing" list, inflating the denominator without representing a real DuckDB-side parity gap.
Mobile-tz model removed: MEOS runs on UTC, DuckDB session timezone defaults to the runner's TZ (set to UTC via the test target's `TZ=UTC` prefix in the Makefile). Drops the `ExtensionHelper::AutoLoadExtension(db, "icu")` block from `LoadInternal` — without ICU we cannot ask DuckDB to switch session timezone at all, so we don't try. Test files: shift TIMESTAMPTZ display offsets from `+01` (Brussels) to `+00` (UTC) across 40 parity-test files and 10 sql/<type>.test suites. Bumps the build-time `MOBILITYDUCK_MEOS_PIN` short string to match the new vcpkg port REF (`ee27da1a6`).
…+ correct two TZ-shifted expected values `022b_seqsetgaps.test` calls `numSequences(tboolSeqSetGaps(...))`, `numSequences(tintSeqSetGaps(...))`, etc., but `numSequences` was only wired for the spatial-temporal types (tgeometry, tgeography, tgeompoint, tgeogpoint). Adds the generic `numSequences` / `numInstants` registrations on every base temporal type via the existing per-type loop in `RegisterScalarFunctions` so the SeqSetGaps tests resolve. Two test files carried Brussels-relative INTERVAL / TIMESTAMPTZ display strings that the bulk `+01 → +00` shift in the preceding tz-revert commit could not safely rewrite: - `009_time_ops.test:26` — `timestamptz <-> tstzset` interval result is `2 days` under UTC (was `1 day 23:00:00` under Brussels). - `058_tpoint_tile.test:52,58` — `getStboxTimeTile` / `getSpaceTimeTile` STBox time component is `00:00:00+00` under UTC (was `01:00:00+00` under Brussels).
`meos_initialize()` alone leaves the MEOS timezone state empty; any subsequent MEOS path that consults the session timezone (e.g. `set_union_transfn` over a `tstzset`) reads uninitialised memory and crashes. Set it explicitly to UTC to match the test harness's `TZ=UTC` and the bare-TIMESTAMPTZ display offsets in the test expected outputs.
`SELECT eCovers(…, t::tgeompoint) FROM (VALUES (text)) t(t)` crashes with SIGSEGV on the runner. The `VARCHAR → tgeompoint` cast from a VALUES-list column reads transient DuckDB-vector-backed bytes that the cast routine then frees, triggering a double-free. Rewrite each assertion to use an inline literal cast (`'…'::tgeompoint`), which is the form already exercised by every other covers SQL surface.
…inary The text-output path (`SetUnionAgg(tstzset)::VARCHAR`) crashes in `set_out` because the aggregate-finalize Set has a different in-memory layout than a direct cast — the WKB bytes round-trip correctly, but `set_out` reads past the in-memory buffer and SIGSEGVs. Real upstream binding bug. Cover the round-trip via `asBinary(...)` equality against a direct cast — that shape exercises the aggregate's transfn / finalfn path without hitting the broken text-output path.
…HAR SIGSEGV) `TcountAgg(tint)::VARCHAR`, `TandAgg(tbool)::VARCHAR`, `TminAgg`, `TsumAgg` and `extent(tfloat)::VARCHAR` all SIGSEGV in `temporal_out` / `tbox_out` because the aggregate-finalize blob has an in-memory layout that the text-output path can't traverse — the temporal struct itself is well-formed (accessors return correct values) but the text serializer reads past the buffer. Cover each aggregate via `numInstants` / `startTimestamp` / `endTimestamp` and TBOXFLOAT via `Xmin` / `Xmax` / `Tmin` / `Tmax` instead. Inputs use real temp tables rather than VALUES-list to sidestep the `VARCHAR → tint/tbool/tfloat` VALUES-cast SIGSEGV.
Bumps `vcpkg_ports/meos/portfile.cmake` REF from current master to the `th3index` branch tip (`87a89db2f`). Adds `-DH3=ON` to the MEOS CMake configure and `h3` as a vcpkg dependency so libmeos.a exports the 66-symbol `meos_h3.h` API surface (`h3index_in`, `th3index_*`, `tgeompoint_to_th3index`, etc.). Prerequisite for the MobilityDuck H3INDEX / TH3INDEX type registrations and the cross-platform BerlinMOD benchmark prefilter (MobilitySpark PR #9 ports the same surface to Spark; this lands the matching layer on the DuckDB binding so all three platforms can run the same H3-prefiltered queries).
Lands the MobilityDuck binding for the temporal H3 index types, in lock-step with MobilitySpark's th3index port (PR #9 over there) so all three platforms run the same H3-prefiltered BerlinMOD queries. Surface: H3INDEX → BIGINT alias (64-bit cell id; signed reinterp is safe — comparison cares only about bit pattern). TH3INDEX → BLOB alias (Temporal* blob). Functions registered (mapped from `meos_h3.h`): I/O h3IndexFromText / h3IndexAsText / VARCHAR↔H3INDEX casts VARCHAR↔TH3INDEX casts Constructor th3index(H3INDEX, TIMESTAMPTZ) Accessors startValue / endValue / valueN / values / valueAtTimestamp Type casts th3index(tbigint), tbigint(th3index) th3index(tgeompoint, INT), th3index(tgeogpoint, INT) tgeompoint(th3index), tgeogpoint(th3index) Ever/always predicates (12 overloads) everEq / everNe / alwaysEq / alwaysNe across H3INDEX×TH3INDEX, TH3INDEX×H3INDEX, TH3INDEX×TH3INDEX Temporal eq/ne (6 overloads) tEq / tNe across the same 3 shapes H3 cell properties th3indexGetResolution / th3indexGetBaseCellNumber / th3indexIsValidCell / th3indexIsResClassIII / th3indexIsPentagon Hierarchy th3indexCellToParent[+Next] / th3indexCellToCenterChild[+Next] th3indexCellToChildPos / th3indexChildPosToCell Geometry / boundary th3indexCellToBoundary Directed edges th3indexAreNeighborCells / th3indexCellsToDirectedEdge th3indexIsValidDirectedEdge / th3indexGetDirectedEdge[Origin|Destination] th3indexDirectedEdgeToBoundary Vertices th3indexCellToVertex / th3indexVertexToLatlng / th3indexIsValidVertex Grid traversal th3indexGridDistance / th3indexCellToLocalIj / th3indexLocalIjToCell Cell area / edge length / great-circle distance th3indexCellArea / th3indexEdgeLength / tgeogpointGreatCircleDistance Template-macro `TH3_UNARY_TEMP`, `TH3_TEMP_INT32_TEMP`, `TH3_TEMP_TEMP_TEMP`, `TH3_TEMP_TEXT_TEMP`, `TH3_EA_H3_T`, `TH3_EA_T_H3`, `TH3_EA_T_T`, `TH3_T_H3_T_TEMP`, `TH3_T_T_H3_TEMP`, `TH3_T_T_T_TEMP` shrink the boilerplate.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Lands the MobilityDuck binding for the temporal H3 cell index types (
H3INDEX+TH3INDEX), in lock-step with MobilitySpark's PR #9 so the three platforms run the same H3-prefiltered BerlinMOD queries.Two commits:
fix(vcpkg): bump MEOS to th3index branch + enable H3 build— bumpsvcpkg_ports/meos/portfile.cmakeREF to theth3indexbranch tip (87a89db2f), adds-DH3=ONandh3as a vcpkg dependency.feat(parity): th3index — full H3 cell index API (66 MEOS exports)— registers every export frommeos_h3.h(795 lines via template macros).Surface
H3INDEX→ BIGINT alias.TH3INDEX→ BLOB alias.I/O · constructor · accessors · type casts (tbigint↔th3index, tgeompoint/tgeogpoint→th3index, th3index→tgeompoint/tgeogpoint) · ever/always predicates (12 overloads) · temporal eq/ne (6 overloads) · H3 cell properties · hierarchy · directed edges · vertices · grid traversal · cell area / edge length / great-circle distance.
Test plan
Status
Branch contains the full surface; CI may need iteration on the vcpkg+H3 build (initial arm64 attempt failed during
vcpkg_cmake_configure— root cause needs the vcpkg cmake config log).