
feat(meos): W6 codegen — tgeo restriction at/minus geom (2 ops + 1 template + 1 systest) (stacks on #28)#29

Open
estebanzimanyi wants to merge 15 commits into MobilityDB:main from estebanzimanyi:feat/nebula-codegen-w6-restriction-tgeo-geom

@estebanzimanyi

Summary

First restriction-shape operators. MEOS signature is
Temporal* fn(const Temporal*, const GSERIALIZED*) — returns the
clipped Temporal* (non-null if input survives the restriction,
null if clipped to empty).

For per-event single-instant inputs (the codegen's current shape), the
restriction collapses to a filter predicate: 1 if the point survives,
0 if clipped. This mirrors mariana's TemporalAtStBox int-collapse
pattern exactly; see TemporalAtStBoxPhysicalFunction.cpp:90 for the
hand-written precedent (clipped.get() != nullptr ? 1 : 0).

| MEOS function | NebulaStream operator | survives if … |
|---|---|---|
| tgeo_at_geom | TemporalAtGeometry | point inside the static geom |
| tgeo_minus_geom | TemporalMinusGeometry | point outside the static geom |
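
The collapse described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the axis-aligned box stands in for the static geometry, and `inside`, `at_geom`, `minus_geom` and `collapse` are hypothetical names, not the MEOS or NebulaStream API. Boundary handling in MEOS may differ from the closed box used here.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
# (xmin, ymin, xmax, ymax): a simplified stand-in for a static geometry.
Box = Tuple[float, float, float, float]


def inside(point: Point, box: Box) -> bool:
    """True if the point lies in the closed box."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def at_geom(point: Point, box: Box) -> Optional[Point]:
    """Restriction 'at': keep the instant if inside, else clip to empty (None)."""
    return point if inside(point, box) else None


def minus_geom(point: Point, box: Box) -> Optional[Point]:
    """Restriction 'minus': keep the instant if outside, else clip to empty (None)."""
    return None if inside(point, box) else point


def collapse(clipped: Optional[Point]) -> int:
    """Int-collapse: 1 if the restriction returned something, 0 if it was clipped."""
    return 1 if clipped is not None else 0
```

For a single instant, `collapse(at_geom(p, g))` and `collapse(minus_geom(p, g))` are complementary, which is why the two operators behave like a containment test and its negation at this stage.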

Honest semantic note

Per-event single-instant TEMPORAL_AT_GEOMETRY is semantically
equivalent to TEMPORAL_ECONTAINS_GEOMETRY (PR #23), and
TEMPORAL_MINUS_GEOMETRY ≡ TEMPORAL_EDISJOINT_GEOMETRY. The
restriction ops only add genuinely new SQL surface when the input
tgeompoint is a sequence of multiple instants (W7 territory:
windowed aggregations), where clipping produces a different sequence
than the original. Shipped now because:

  1. They round out the SQL surface PostGIS / MobilityDB users expect
    (the AT/MINUS idiom is standard there).
  2. They exercise the codegen's first restriction-shape template,
    which W7 sequence-aggregated restriction will inherit.
  3. The collapse-to-int return matches mariana's TemporalAtStBox
    so downstream consumers see a consistent shape across at/minus ops.

Generator additions

  • PHYSICAL_CPP_TEMPLATE_TEMPORAL_POINT_RESTRICTION
    — calls Temporal* {meos_call}(...), checks non-null, free()s,
    returns int. Flag: build_temporal_point_restriction.
  • dispatch_case_for() reuses the existing
    DISPATCH_CASE_ONE_TEMPORAL_POINT template — same 4-arg parser
    shape (lon, lat, ts, geom), only the physical-cpp body shape
    differs (Temporal* return vs int return).
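
As a rough illustration of how a template with a `{meos_call}` placeholder might be rendered, here is a minimal Python sketch. The template body below is hypothetical and much shorter than whatever the real PHYSICAL_CPP_TEMPLATE_TEMPORAL_POINT_RESTRICTION contains; `PHYSICAL_BODY_SKETCH` and `render` are names introduced only for this example.

```python
# Hypothetical, abbreviated C++ body; the real template is not reproduced here.
PHYSICAL_BODY_SKETCH = """\
Temporal* clipped = {meos_call}(temporal, geometry);
const int survives = (clipped != nullptr) ? 1 : 0;
if (clipped != nullptr) { free(clipped); }
return survives;
"""


def render(template: str, meos_call: str) -> str:
    """Substitute the MEOS function name into the template.

    Plain str.replace is used instead of str.format so the literal C++
    braces in the template need no escaping.
    """
    return template.replace("{meos_call}", meos_call)


if __name__ == "__main__":
    print(render(PHYSICAL_BODY_SKETCH, "tgeo_at_geom"))
    print(render(PHYSICAL_BODY_SKETCH, "tgeo_minus_geom"))
```

Both operators would share the same body shape under this sketch; only the substituted MEOS symbol changes.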

Per-shape systest

Tests/Functions/at_geometry.test exercises TEMPORAL_AT_GEOMETRY:
one point inside a polygon (expect 1), one outside (expect 0).

Local verification

nes-development:mobilitynebula-v2:

cmake --build build-w1 --target nes-physical-operators -j 4
  → [49/49] Linking libnes-physical-operators.a
cmake --build build-w1 --target nes-logical-operators  -j 4
  → [63/63] Linking libnes-logical-operators.a
cmake --build build-w1 --target nes-sql-parser         -j 4
  → [11/11] Linking libnes-sql-parser.a

All three targets link clean on the first attempt.

Stack

#21 tools(codegen): generator + design
  └── #22 fix(meos): proto extra_fields + Werror unused-param
        └── #23 feat(meos): W1 — 5 spatial-rel ops (tgeo × geom)
              └── #24 feat(meos): W2 — close _tgeo_geo row
                    └── #25 feat(meos): W3 — close _tgeo_tgeo row
                          └── #26 feat(meos): W4 — distance family
                                └── #27 feat(codegen): W4.5 parser glue
                                      └── #28 feat(meos): W5a — tnumber NAD
                                            └── THIS PR — W6: restriction at/minus geom

… on NebulaStream (33 YAMLs, 27/27 cells)

Additive scaffold for the BerlinMOD-9 × 3 streaming-form parity contract
on MobilityNebula, sibling to the existing SNCB Q-series and matching
the MobilityFlink MobilityDB#3 / MobilityKafka MobilityDB#1 streaming-form definitions.

All 27 cells covered:

  Q1 'which vehicles have appeared'      — full (continuous + windowed + snapshot)
  Q2 'where is vehicle X at time T'      — full
  Q3 'vehicles within 5 km of P'         — full
  Q4 'vehicles inside region R (polygon)'— full
  Q5 'pairs of vehicles meeting near P'  — partial (emit per-vehicle trajectories near P; consumer joins)
  Q6 'cumulative distance per vehicle'   — partial (emit TEMPORAL_SEQUENCE; consumer computes length)
  Q7 'first passage of vehicle through POI' × {POI1, POI2, POI3}
                                          — full (per-POI fan-out)
  Q8 'vehicles within d of LINESTRING'   — full (edwithin_tgeo_geo with LINESTRING geometry)
  Q9 'distance between X and Y at time T'— partial (emit X and Y trajectories; consumer joins)

18 of 27 cells are FULL (the BerlinMOD-Q semantic is computed entirely
inside NebulaStream). 9 cells are PARTIAL — NebulaStream emits the
per-window inputs (trajectory, candidate vehicles) and a consumer
post-processes for the final BerlinMOD-Q answer. The partial pattern
is the natural expression of these queries in NebulaStream's current
SQL surface; the path to FULL is documented per-Q in
docs/berlinmod-streaming-forms.md (a stream-self-join for Q5/Q9, a
temporal_length scalar function for Q6).

Form mapping to NebulaStream windows:

  continuous: SLIDING(time_utc, SIZE 1 SEC, ADVANCE BY 1 SEC)
  windowed:   TUMBLING(time_utc, SIZE 10 SEC)
  snapshot:   TUMBLING(time_utc, SIZE 5 SEC)
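
To make the form mapping concrete, the sketch below assigns an event timestamp to windows. It assumes half-open `[start, end)` windows aligned to timestamp 0 and millisecond timestamps; NebulaStream's actual alignment and boundary rules are not stated in this text, and both function names are hypothetical.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open [start, end) in milliseconds


def tumbling_window(ts_ms: int, size_ms: int) -> Window:
    """The single tumbling window that contains ts_ms."""
    start = (ts_ms // size_ms) * size_ms
    return (start, start + size_ms)


def sliding_windows(ts_ms: int, size_ms: int, advance_ms: int) -> List[Window]:
    """Every sliding window that contains ts_ms, in ascending start order."""
    windows: List[Window] = []
    start = (ts_ms // advance_ms) * advance_ms  # latest window start <= ts_ms
    while start > ts_ms - size_ms:              # window still covers ts_ms
        windows.append((start, start + size_ms))
        start -= advance_ms
    return list(reversed(windows))


if __name__ == "__main__":
    ts = 12_345
    print("continuous:", sliding_windows(ts, 1_000, 1_000))
    print("windowed:  ", tumbling_window(ts, 10_000))
    print("snapshot:  ", tumbling_window(ts, 5_000))
```

Under these assumptions a sliding window whose size equals its advance assigns each event to exactly one window, so the continuous form behaves like a one-second tumbling window.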

MEOS-side surface consumed (already exposed by PR MobilityDB#14 + follow-ups):

  edwithin_tgeo_geo — Q3 (POINT predicate), Q4 (POLYGON, d=0.0),
                      Q5 (POINT predicate), Q7 (per-POI POINT),
                      Q8 (LINESTRING predicate)
  TEMPORAL_SEQUENCE — Q2 / Q5 / Q6 / Q9 (per-window per-vehicle trajectory)

No new MEOS PhysicalFunction classes added; no C++ changes; no SNCB
Q-series modifications. All 33 YAMLs are additive in a new
Queries/berlinmod/ subdirectory.

Additions:
  Queries/berlinmod/q1_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q2_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q3_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q4_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q5_{continuous,windowed,snapshot}.yaml          (3, partial)
  Queries/berlinmod/q6_{continuous,windowed,snapshot}.yaml          (3, partial)
  Queries/berlinmod/q7_poi{1,2,3}_{continuous,windowed,snapshot}.yaml (9, full via fan-out)
  Queries/berlinmod/q8_{continuous,windowed,snapshot}.yaml          (3, LINESTRING predicate)
  Queries/berlinmod/q9_{continuous,windowed,snapshot}.yaml          (3, partial)
  Input/input_berlinmod.csv  (sample data: 3 vehicles × 21 events, 14 simulated seconds)
  docs/berlinmod-streaming-forms.md

Validation: every YAML parses cleanly via python3 yaml.safe_load.
Runtime verification gated on the NebulaStream test harness.

Coverage: 27 of 27 cells (100 %), with 18 FULL and 9 PARTIAL annotated
explicitly per Q. Path to FULL for the 9 PARTIAL cells is one
MobilityNebula C++ PhysicalFunction class each (or a NebulaStream
upstream stream-self-join), documented in
docs/berlinmod-streaming-forms.md.
…-form cells to full

Adds the TEMPORAL_LENGTH aggregation across the four levels of the
NebulaStream pipeline (logical / physical / parser / lowering) so the
BerlinMOD-Q6 "cumulative distance per vehicle" streaming-form cells
(continuous + windowed + snapshot) compute the spheroidal trajectory
length entirely inside NebulaStream instead of emitting raw trajectories
for a consumer-side reduction.

Logical: nes-logical-operators/{include,src}/Operators/Windows/Aggregations/Meos/TemporalLengthAggregationLogicalFunction.{hpp,cpp}
mirroring TemporalSequenceAggregationLogicalFunctionV2 but with finalAggregateStampType = FLOAT64.
Registers as "TemporalLength" in the aggregation registry. Serializes through the existing
TemporalAggregationSerde wire shape with the type tag overridden.

Physical: nes-physical-operators/{include,src}/Aggregation/Function/Meos/TemporalLengthAggregationPhysicalFunction.{hpp,cpp}
identical lift / combine / reset / cleanup to TemporalSequenceAggregationPhysicalFunction;
the lower() path builds the same MEOS instant-set trajectory string, parses it via
MEOSWrapper::parseTemporalPoint, and calls MEOS' tpoint_length(Temporal*) to return a single
FLOAT64 result.
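
The lift / combine / lower flow can be sketched in Python as follows. This is not the operator's code: the haversine great-circle distance is swapped in as a spherical approximation of the spheroidal length that MEOS computes, and `lift`, `combine`, `lower` and `haversine_m` are hypothetical stand-ins for the C++ methods.

```python
import math
from typing import List, Tuple

Event = Tuple[float, float, int]  # (lon, lat, timestamp)
EARTH_RADIUS_M = 6_371_008.8      # mean Earth radius; spherical approximation


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def lift(state: List[Event], event: Event) -> None:
    """Collect one event into the per-window state."""
    state.append(event)


def combine(left: List[Event], right: List[Event]) -> List[Event]:
    """Merge two partial states."""
    return left + right


def lower(state: List[Event]) -> float:
    """Order events by timestamp and sum the segment lengths."""
    ordered = sorted(state, key=lambda e: e[2])
    return float(sum(
        haversine_m(p[0], p[1], q[0], q[1]) for p, q in zip(ordered, ordered[1:])
    ))
```

Sorting in `lower` matters because events may arrive or be combined out of timestamp order; summing unsorted segments would overstate the length.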

Parser: nes-sql-parser/AntlrSQL.g4 adds the TEMPORAL_LENGTH lexer token and includes it in
functionName. AntlrSQLQueryPlanCreator.cpp adds the TEMPORAL_LENGTH dispatch in both the
case-label and string-name paths, parallel to TEMPORAL_SEQUENCE.

Lowering: nes-query-optimizer/src/RewriteRules/LowerToPhysical/LowerToPhysicalWindowedAggregation.cpp
adds the TEMPORAL_LENGTH special-case lowering, parallel to TEMPORAL_SEQUENCE, producing a
TemporalLengthAggregationPhysicalFunction with the same (lon, lat, timestamp) state schema.

YAMLs: Queries/berlinmod/q6_{continuous,windowed,snapshot}.yaml updated to call
TEMPORAL_LENGTH directly; the FLOAT64 output column replaces the VARSIZED trajectory output;
header comments updated to "FULL".

Docs: docs/berlinmod-streaming-forms.md updated to reflect 21 cells full + 6 cells partial
(Q5 + Q9 only); the path-to-full table now lists those two queries only.

YAML safe_load green on all 3 Q6 cells. Build verification gated on the user's NebulaStream
test harness (vcpkg-bootstrapped); the C++ code follows the established TemporalSequence
template exactly, with the lower() path replaced by tpoint_length.
…streaming-form cells to full

Mirrors the TEMPORAL_LENGTH pattern from the parent PR with two new
four-field aggregations that close the last 6 partial cells on the
MobilityNebula BerlinMOD parity matrix:

PAIR_MEETING(lon, lat, ts, vehicle_id) -> VARSIZED
  Lift collects per-event tuples. Lower picks each vehicle's latest known
  position in the window, enumerates pairs (a < b), calls MEOS' geog_dwithin
  with dMeet = 200 m hardcoded for the BerlinMOD scaffold, and emits a
  string-encoded list of meeting pairs (vid_a, vid_b, ts, "<=dMeet" tag).
  Future PR can parameterize dMeet via a constant input. Closes Q5 × 3 cells.
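
The lower step for PAIR_MEETING can be sketched in Python as below. Assumptions are explicit: haversine replaces the MEOS distance test, the reported timestamp is taken as the later of the two latest observations (the text does not say which timestamp is emitted), and `pair_meeting` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Event = Tuple[float, float, int, int]  # (lon, lat, ts, vehicle_id)
EARTH_RADIUS_M = 6_371_008.8           # spherical approximation


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pair_meeting(events: List[Event], d_meet_m: float = 200.0) -> List[Tuple[int, int, int]]:
    """Return (vid_a, vid_b, ts) for every pair a < b within d_meet_m."""
    latest: Dict[int, Event] = {}
    for ev in events:
        vid = ev[3]
        # Keep each vehicle's latest known position in the window.
        if vid not in latest or ev[2] >= latest[vid][2]:
            latest[vid] = ev
    pairs: List[Tuple[int, int, int]] = []
    for a, b in combinations(sorted(latest), 2):  # enumerates pairs with a < b
        ea, eb = latest[a], latest[b]
        if haversine_m(ea[0], ea[1], eb[0], eb[1]) <= d_meet_m:
            pairs.append((a, b, max(ea[2], eb[2])))  # timestamp choice is an assumption
    return pairs
```

The pair enumeration is quadratic in the number of distinct vehicles per window, which is acceptable for a scaffold but worth keeping in mind for large fleets.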

CROSS_DISTANCE(lon, lat, ts, vehicle_id) -> FLOAT64
  Same lift shape. Lower picks the latest known position of each of the two
  target vehicles (VID_A = 100, VID_B = 200 hardcoded), drives the MEOS
  nad_tgeo_tgeo distance, and returns a FLOAT64 (NaN if either vehicle is
  unobserved). Future PR can parameterize (VID_A, VID_B). Closes Q9 × 3 cells.

Wired across the four pipeline layers identically to TEMPORAL_LENGTH:
  - nes-physical-operators/{include,src}/Aggregation/Function/Meos/{PairMeeting,CrossDistance}AggregationPhysicalFunction.{hpp,cpp}
  - nes-logical-operators/{include,src}/Operators/Windows/Aggregations/Meos/{PairMeeting,CrossDistance}AggregationLogicalFunction.{hpp,cpp}
  - nes-physical-operators/src/Aggregation/Function/Meos/CMakeLists.txt + nes-logical-operators/src/Operators/Windows/Aggregations/Meos/CMakeLists.txt plugin entries
  - nes-sql-parser/AntlrSQL.g4 lexer + functionName tokens
  - nes-sql-parser/src/AntlrSQLQueryPlanCreator.cpp case-label + string-name dispatch
  - nes-query-optimizer/src/RewriteRules/LowerToPhysical/LowerToPhysicalWindowedAggregation.cpp special-case lowering with 4-field state schema

YAMLs: Queries/berlinmod/q5_{continuous,windowed,snapshot}.yaml and
q9_{continuous,windowed,snapshot}.yaml rewritten to call the new
aggregations directly; sink schemas updated to FLOAT64 / VARSIZED;
header comments updated to FULL.

Docs: docs/berlinmod-streaming-forms.md updated to reflect 27/27 cells
full (was 21 full + 6 partial); MEOS-operators table now lists
PAIR_MEETING and CROSS_DISTANCE alongside the existing ones.

YAML safe_load green on all 6 rewritten Q5/Q9 cells. C++ follows the
established TemporalLength template from the parent MobilityDB#16; build
verification gated on the user's NebulaStream test harness.
… covered' section

After PR MobilityDB#16 (TEMPORAL_LENGTH closes Q6) and PR MobilityDB#17 (PAIR_MEETING +
CROSS_DISTANCE close Q5 + Q9), the parity matrix is 27/27 full —
the doc's own coverage table at the top confirms it. But the
section 'Not covered (15 cells / 5 queries)' at line 77 was a
remnant from the pre-MobilityDB#16/MobilityDB#17 state and contradicts the rest of the
doc. Remove it.

Add a new 'Streaming-semantics tier overlay' section that classifies
each BerlinMOD-Q by its streaming-execution tier (stateless /
bounded-state / windowed / cross-stream) per the closed 7-value
vocabulary proposed for the MEOS-API objectModel.streamingSemantics
facet (see the sibling RFC on MEOS-API PR MobilityDB#10). The mapping makes
the cross-binding picture explicit: a Q's tier on NebulaStream is
the same tier on Flink / Kafka, and the table points to the
equivalent generic wiring class on Flink for each tier.

Two short follow-up notes explain why cross-stream looks different
on NebulaStream (single-aggregation Cartesian enumeration vs Flink's
interval-join across two streams — same semantic, different
topology) and why Q7 is bounded-state rather than windowed (per-POI
fan-out, per-(vehicle, POI) bounded state, no full-sequence
reduction needed).

Refresh the 'Sibling parity references' section to point at the
current state of the Flink and Kafka work — Flink's per-tier wiring
infrastructure under org.mobilitydb.flink.meos.wirings (5 generic
classes covering 100% of the streamable surface) and Kafka's codegen
mirror under org.mobilitydb.kafka.meos. Drops stale PR-number
references per the same as-is / no-internal-process discipline
applied elsewhere in the ecosystem docs.

Stacks on PR MobilityDB#17. Docs-only; touches no YAML, no C++ pipeline-layer
file.
The PAIR_MEETING aggregation (added in MobilityDB#17) hardcoded the meeting-distance
threshold at 200 m via a static constexpr DMEET_METRES, with the PR body
noting parameterization as future work. This PR lands that future work:
PAIR_MEETING now takes a fifth argument — a numeric constant in metres —
and the physical operator uses it per-query.

## Surface

  PAIR_MEETING(lon, lat, ts, vehicle_id, dMeet)
                                          ^^^^^ new fifth arg (numeric constant, metres)

The first four args remain FieldAccess (lon, lat, ts, vehicle_id); the
fifth is pulled from the parser's constantBuilder as a numeric literal,
parsed via std::stod, and threaded through the logical→physical lowering
chain into the lower() lambda alongside the existing state pointers.

## Files (9, all stacked on MobilityDB#18 → MobilityDB#17 → MobilityDB#16 → MobilityDB#15)

| Layer | File |
|---|---|
| Physical .hpp | PairMeetingAggregationPhysicalFunction.hpp — `DMEET_METRES` constexpr → `DEFAULT_DMEET_METRES` + instance field `dMeetMetres` |
| Physical .cpp | PairMeetingAggregationPhysicalFunction.cpp — constructor takes dMeet; lower() passes it to the captureless lambda via `nautilus::val<double>` |
| Logical .hpp  | PairMeetingAggregationLogicalFunction.hpp — constructor + create() factory take dMeet; getter `getDMeetMetres()` |
| Logical .cpp  | PairMeetingAggregationLogicalFunction.cpp — initialize field; Registrar deserialize path uses DEFAULT_DMEET_METRES (see Serde caveat below) |
| Parser        | AntlrSQLQueryPlanCreator.cpp — both PAIR_MEETING dispatch sites (lexer-token case + funcName string-name case) extract the constant from constantBuilder, std::stod it, pass to create() |
| Lowering      | LowerToPhysicalWindowedAggregation.cpp — pmDescriptor->getDMeetMetres() flows to the physical constructor |
| YAMLs (×3)    | Queries/berlinmod/q5_continuous.yaml, q5_snapshot.yaml, q5_windowed.yaml — add `, 200.0` as the explicit fifth arg; comments updated to reflect the parameterization |

## Serde round-trip caveat (out of scope for this PR)

`AggregationLogicalFunctionRegistryArguments` is strongly typed to
`vector<FieldAccessLogicalFunction>` — there is no slot for a numeric
constant in the existing Registrar interface, and
`SerializableAggregationFunction` has no proto field for it either. As
a result:

- The parser path (live query execution) is FULLY parameterized — dMeet
  flows from SQL to physical correctly.
- The Serde deserialize path falls back to DEFAULT_DMEET_METRES
  (preserves the 200 m scaffold behaviour). Round-trip fidelity for the
  dMeet value requires (a) adding a new field to
  SerializableAggregationFunction.proto, (b) extending
  AggregationLogicalFunctionRegistryArguments to carry it, and (c)
  threading both through Serialize/Register. That's an infrastructure
  change touching every registered aggregation; tracked as a follow-up.
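
The round-trip gap can be illustrated with a small Python model. It is a simplification under stated assumptions: the dictionary stands in for the proto message, and `PairMeetingLogical`, `serialize` and `deserialize` are hypothetical Python stand-ins for the C++ classes and Registrar path.

```python
from dataclasses import dataclass
from typing import Dict, List

DEFAULT_DMEET_METRES = 200.0


@dataclass(frozen=True)
class PairMeetingLogical:
    fields: List[str]
    d_meet_metres: float = DEFAULT_DMEET_METRES


def serialize(fn: PairMeetingLogical) -> Dict[str, object]:
    """The wire shape carries field names only; there is no slot for the constant."""
    return {"type": "PairMeeting", "fields": list(fn.fields)}


def deserialize(message: Dict[str, object]) -> PairMeetingLogical:
    """Rebuilding from the wire shape falls back to the default threshold."""
    return PairMeetingLogical(fields=list(message["fields"]))


if __name__ == "__main__":
    parsed = PairMeetingLogical(["lon", "lat", "ts", "vehicle_id"], d_meet_metres=350.0)
    restored = deserialize(serialize(parsed))
    print(parsed.d_meet_metres, "->", restored.d_meet_metres)
```

In this model a query parsed with a 350 m threshold comes back from a round trip with the 200 m default, which is the fidelity loss the follow-up would close.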

## Build / test verification

Cannot compile-verify locally — NebulaStream needs the full C++23 +
vcpkg toolchain. Submitted for maintainer build verification (cc
@marianaGarcez). Expected to compile cleanly; the only construction-time
behaviour change is the constructor signature (5 params → 6 params for
physical, 5 → 6 for logical create/ctor); the only runtime behaviour
change is that dMeet is now read from the instance field instead of the
class constexpr (the lambda receives it via the nautilus::val<double>
extra arg).

## Mirrors the CROSS_DISTANCE shape

CROSS_DISTANCE (also added by MobilityDB#17, hardcoded VID_A=100, VID_B=200) has
the exact same parameterization pattern; a sibling PR can apply the
same change with (lon, lat, ts, vid, vid_a, vid_b) — 6 args total
instead of 5. Holding for separate PR.
… args

Sibling to PAIR_MEETING.dMeet parameterization (PR MobilityDB#19) — applies the
same 4-layer pattern to CROSS_DISTANCE. The aggregation (added in MobilityDB#17)
hardcoded the target vehicle pair at (100, 200) via static constexpr
VID_A / VID_B, with the PR body noting parameterization as future work.
This PR lands that future work: CROSS_DISTANCE now takes two unsigned-
integer constants as its fifth and sixth arguments, and the physical
operator uses them per-query.

## Surface

  CROSS_DISTANCE(lon, lat, ts, vehicle_id, vidA, vidB)
                                           ^^^^  ^^^^ new constants (uint64)

The first four args remain FieldAccess; vidA and vidB are pulled from
the parser's constantBuilder as two unsigned-integer literals, parsed
via std::stoull, and threaded through the logical→physical lowering
chain into the lower() lambda alongside the existing state pointer.

## Files (9, same shape as PR MobilityDB#19's PAIR_MEETING change)

| Layer | File |
|---|---|
| Physical .hpp | CrossDistanceAggregationPhysicalFunction.hpp — `VID_A/B` constexpr → `DEFAULT_VID_A/B` + instance fields `vidA/B` |
| Physical .cpp | CrossDistanceAggregationPhysicalFunction.cpp — constructor takes both; lower() lambda gets them via `nautilus::val<uint64_t>` |
| Logical .hpp  | CrossDistanceAggregationLogicalFunction.hpp — constructor + create() factory + getters |
| Logical .cpp  | CrossDistanceAggregationLogicalFunction.cpp — initialize fields; Registrar deserialize falls back to defaults |
| Parser        | AntlrSQLQueryPlanCreator.cpp — both CROSS_DISTANCE dispatch sites extract two constants, std::stoull both, pass to create() |
| Lowering      | LowerToPhysicalWindowedAggregation.cpp — cdDescriptor->getVidA()/getVidB() flow to physical constructor |
| YAMLs (×3)    | Queries/berlinmod/q9_continuous.yaml, q9_snapshot.yaml, q9_windowed.yaml — add `, 100, 200` as explicit constants; comments updated |

## Serde round-trip caveat (same as PR MobilityDB#19)

`AggregationLogicalFunctionRegistryArguments` is strongly typed to
`vector<FieldAccessLogicalFunction>` — no slot for integer constants.
`SerializableAggregationFunction.proto` has no field for them. So:

- Parser path (live query execution) is FULLY parameterized.
- Serde deserialize path falls back to `DEFAULT_VID_A` / `DEFAULT_VID_B`
  (preserves the 100, 200 scaffold defaults).

Same infrastructure follow-up would close both round-trip gaps at once
(PAIR_MEETING.dMeet and CROSS_DISTANCE.vidA/vidB).

## Build / test verification

Same as PR MobilityDB#19 — submitted for maintainer build verification
(@marianaGarcez). Constants now flow through std::stoull instead of
std::stod; lambda gets two nautilus::val<uint64_t> args instead of one
nautilus::val<double>. Pattern is structurally identical.
…codegen path

Closes the Nebula structural parity gap with Flink/Kafka by shipping
the codegen infrastructure for generating per-MEOS-function pipeline
tuples (logical + physical + parser + lowering). No generated C++
committed in this PR — the maintainer (cc @marianaGarcez) runs the
generator on a chosen MEOS-function batch, reviews output, ships
operators in follow-up PRs at a controlled pace.

Why no generated code in this PR:
- Generator author cannot build NebulaStream (full C++23 + vcpkg
  toolchain not available in author's environment); shipping
  unverified generated code would risk batched-broken operators.
- Per-function review value: maintainer iterates on templates with
  the first batch's build feedback before scaling up.
- Template iteration cost: first-pass templates may need adjustment
  after first build; smaller blast radius if only the generator
  lands.

What lands:
- tools/codegen/codegen_nebula.py — Python generator with embedded
  C++ templates derived 1:1 from the hand-written
  TemporalEDWithinGeometry operator shape (logical/physical/.hpp/.cpp)
- tools/codegen/codegen_input.example.json — first-wave input list
  (5 spatial-relation E/A predicates: EDisjoint, ATouches, ECovers,
  ACrosses, EOverlaps over tgeo_geo)
- tools/codegen/README.md — full design proposal: why codegen, what
  the generator produces, recommended scaling-wave sequence (W1-W5),
  what the generator does NOT do (CMakeLists / parser / grammar
  remain manual paste for idempotence), compile-verification note

Smoke-verified: the generator runs locally + emits 5 operators × 4
files = 20 well-formed C++ source files; templates produce
syntactically-reasonable output matching the existing operator style.

Scaling path (recommended sequence):
- W1: 5 spatial-relation E/A predicates (the example input) — first
  follow-up PR
- W2: All ever/always spatial-relation predicates over tgeo_geo
  (~18 functions) — second follow-up PR
- W3: Distance functions over tgeo_geo and tgeo_tgeo (~30) — third
- W4: Scalar accessors that decompose to per-event reads — template
  extension required
- W5: Aggregations (windowed/cross-stream) — separate generator with
  the aggregation-specific 4-layer pattern

Stacks on PR MobilityDB#20. Tools-only; touches no operator code, no
CMakeLists, no parser/grammar.
Two adjacent compile-breakers found while validating the codegen output of
PR MobilityDB#21 against the latest mariana/main:

1. SerializableAggregationFunction proto declares only {type, on_field,
   as_field}. The 5 MEOS aggregations landing in MobilityDB#16/MobilityDB#17 read additional
   fields out of the proto (vidA/vidB/dMeet/...), so they need the extra
   field. Adds:

       repeated SerializableFunction extra_fields = 4;

   Backwards-compatible (tag 4, new repeated). Aggregations whose extra
   fields are absent continue to deserialize unchanged.

2. CrossDistance/PairMeeting/TemporalLength aggregations carry an unused
   PipelineMemoryProvider& parameter on lower(). -Werror=unused-parameter
   turns that into a build failure. Annotates the parameter [[maybe_unused]]
   in the lower() signature; no behavior change, and intent stays visible
   to readers who later wire memory into the lowering.

Verified locally on the mobilitynebula-v2 dev image (MEOS baked in):

    cmake --build build-w1 --target nes-physical-operators -j 4
    → [110/111] Linking libnes-physical-operators-registry.a
    → [111/111] Linking libnes-physical-operators.a

Stacks on MobilityDB#21 only because that is the active codegen branch where the
breakage surfaced; the diff itself is independent of any codegen output.
…geom)

First batch of MEOS operators generated by the PR MobilityDB#21 codegen, covering
the spatial-relation family over (tgeo, geometry). Five operators landed,
one per relation pattern:

    edisjoint_tgeo_geo  → TemporalEDisjointGeometry
    atouches_tgeo_geo   → TemporalATouchesGeometry
    ecovers_tgeo_geo    → TemporalECoversGeometry
    acontains_tgeo_geo  → TemporalAContainsGeometry
    etouches_tgeo_geo   → TemporalETouchesGeometry

Each operator is emitted as four files (logical .hpp/.cpp and
physical .hpp/.cpp), the same shape mariana's hand-written eContainsGeometry
operator uses, so the runtime sees them as ordinary plugin operators
with no special wiring.

Generator tightenings landed alongside the output (kept inside
tools/codegen so they remain re-runnable):

  * physical Registrar reads PhysicalFunctionRegistryArguments.childFunctions
    (the actual field name; the previous template used .children which only
    exists on the logical side).
  * VariableSizedData is accessed through .getContent() / .getContentSize()
    (the real API; the previous template used .getRawByteRef() / .size()
    which do not exist).
  * The MEOS spatial-rel signature is 2-arg (Temporal*, GSERIALIZED*) —
    no trailing atstart bool. The 3-arg distance form lives only on
    edwithin_tgeo_geo and edwithin_tgeo_tgeo and stays out of W1.
  * tools/codegen/codegen_input.example.json now references real MEOS
    symbols (etouches_tgeo_geo, acontains_tgeo_geo). The earlier
    eoverlaps_tgeo_geo / acrosses_tgeo_geo entries were placeholders
    and would not link.

Verified locally on the mobilitynebula-v2 dev image (MEOS baked in):

    cmake --build build-w1 --target nes-logical-operators -j 4
    cmake --build build-w1 --target nes-physical-operators -j 4
    → both link clean. The 5 new operators compile and register at both
      layers.

Stacks on PR-A (proto extra_fields + Werror unused-param) and PR MobilityDB#21
(the codegen itself). The same generator scales to W2 (e/a spatial-rels
over tgeo × tgeo, ~10 ops) and W3 (distance functions over tgeo × geo +
tgeo × tgeo, ~30 ops) with no further template work — that is the path
the 9 BerlinMOD-query recipes open beyond the surface metric.
Adds the 2 remaining publicly-declared 2-arg spatial-rel ops over
(tgeo, geometry) not yet covered by W1 + mariana's seeds:

    adisjoint_tgeo_geo    → TemporalADisjointGeometry
    eintersects_tgeo_geo  → TemporalEIntersectsGeometry

Combined with the prior layers, the public-API _tgeo_geo spatial-rel
row is now complete for the 2-arg shape:

    e:  econtains  ecovers  edisjoint  eintersects  etouches   (5/5)
    a:  acontains  adisjoint  aintersects  atouches            (4/4)

Provenance per layer:
- mariana seeds: TemporalEContainsGeometry, TemporalAIntersectsGeometry,
  TemporalEDWithinGeometry (3-arg), TemporalIntersectsGeometry
- W1 (PR MobilityDB#23):   edisjoint, atouches, ecovers, acontains, etouches
- W2 (this PR):  adisjoint, eintersects

The 3-arg dwithin pair (edwithin / adwithin) is excluded from the
2-arg shape and stays out of this PR.

Note on acovers_tgeo_geo: the symbol exists in libmeos.so but has
no public declaration in meos_geo.h (libmeos-internal only), so it
is correctly out of scope for a binding-level PR.

Local verification on the mobilitynebula-v2 dev image:
    cmake --build build-w1 --target nes-physical-operators -j 4
      → [161/161] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [43/43] Linking libnes-logical-operators.a

Same generator, no template changes, just 2 more input rows — the
mechanical-scale path the 9 BerlinMOD-query recipes open.
… (9 ops)

Closes the public-API _tgeo_tgeo 2-arg spatial-relation row by emitting
all 9 publicly-declared ops as Nebula operators (one new op per relation,
per e/a quantifier). The MEOS signature is
`int fn(const Temporal*, const Temporal*)`, so each operator builds
TWO single-instant tgeompoints from event fields (lonA/latA/tsA +
lonB/latB/tsB) before invoking MEOS:

    econtains_tgeo_tgeo    → TemporalEContainsTGeometry
    ecovers_tgeo_tgeo      → TemporalECoversTGeometry
    edisjoint_tgeo_tgeo    → TemporalEDisjointTGeometry
    eintersects_tgeo_tgeo  → TemporalEIntersectsTGeometry
    etouches_tgeo_tgeo     → TemporalETouchesTGeometry
    acontains_tgeo_tgeo    → TemporalAContainsTGeometry
    adisjoint_tgeo_tgeo    → TemporalADisjointTGeometry
    aintersects_tgeo_tgeo  → TemporalAIntersectsTGeometry
    atouches_tgeo_tgeo     → TemporalATouchesTGeometry

The 3-arg dwithin pair (edwithin / adwithin) stays out — same as in
W1/W2, they belong to a separate distance-arg template branch.
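
As an illustration of building two single-instant temporal points from event fields, the Python sketch below formats each (lon, lat, ts) triple as a temporal-point text literal. It assumes epoch-second timestamps and SRID 4326, and it assumes the operator goes through a text form at all; `instant_wkt` and `build_pair` are hypothetical helpers, not generated code.

```python
from datetime import datetime, timezone
from typing import Dict, Tuple


def instant_wkt(lon: float, lat: float, ts_seconds: int, srid: int = 4326) -> str:
    """Format one event as a single-instant temporal point literal."""
    stamp = datetime.fromtimestamp(ts_seconds, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S+00"
    )
    return f"SRID={srid};Point({lon} {lat})@{stamp}"


def build_pair(event: Dict[str, float]) -> Tuple[str, str]:
    """Build the A and B instants from the six event fields."""
    a = instant_wkt(event["lonA"], event["latA"], int(event["tsA"]))
    b = instant_wkt(event["lonB"], event["latB"], int(event["tsB"]))
    return a, b


if __name__ == "__main__":
    print(build_pair({
        "lonA": 4.35, "latA": 50.85, "tsA": 0,
        "lonB": 4.36, "latB": 50.86, "tsB": 60,
    }))
```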

Generator extension
-------------------

This is the first PR where the codegen ships a NEW template branch in
addition to new rows. Adds:

  * PHYSICAL_CPP_TEMPLATE_TWO_TEMPORAL_POINTS — mirrors the one-temporal-point
    template, but with two single-instant tgeompoints and no static
    geometry argument.
  * `build_two_temporal_points` boolean flag on operator descriptors,
    dispatched alongside `build_temporal_point` in `emit_operator`.

No existing template paths change. Row totals:

| family | _tgeo_tgeo (2-arg) ops in meos_geo.h | shipped |
|--------|--------------------------------------|---------|
| e/*    | econtains, ecovers, edisjoint, eintersects, etouches | 5/5 |
| a/*    | acontains, adisjoint, aintersects, atouches          | 4/4 |

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-physical-operators -j 4
      → [38/38] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [52/52] Linking libnes-logical-operators.a

Both targets link clean on the first attempt — the template extension
worked without iteration, validating the generator approach for the next
shape (distance functions, 3-arg signature).
…mplates)

Closes the public-API distance-function row over (tgeo, geo) and
(tgeo, tgeo). Two distinct measure types, both built from the same
event-field shape used by W1/W2/W3:

Scalar measure — `nad_*` (nearest-approach distance, double return):
    nad_tgeo_geo    → TemporalNADGeometry
    nad_tgeo_tgeo   → TemporalNADTGeometry

Thresholded test — `*dwithin_*` (3-arg, int return):
    edwithin_tgeo_tgeo  → TemporalEDWithinTGeometry
    adwithin_tgeo_geo   → TemporalADWithinGeometry
    adwithin_tgeo_tgeo  → TemporalADWithinTGeometry

`edwithin_tgeo_geo` is already shipped as mariana's `TemporalEDWithinGeometry`
seed, so the (e/a × tgeo_geo/tgeo_tgeo) dwithin square is now complete.

Row totals after this PR (publicly-declared in meos_geo.h):

| shape                 | covered                |
|-----------------------|------------------------|
| nad_tgeo_geo          | 1/1 ✅                |
| nad_tgeo_tgeo         | 1/1 ✅                |
| edwithin_tgeo_geo     | 1/1 (mariana seed) ✅  |
| edwithin_tgeo_tgeo    | 1/1 ✅                 |
| adwithin_tgeo_geo     | 1/1 ✅                 |
| adwithin_tgeo_tgeo    | 1/1 ✅                 |

Generator extension
-------------------

Two new template branches; existing branches untouched:

  * PHYSICAL_CPP_TEMPLATE_TEMPORAL_POINT_WITH_DIST
    — one-tgeo + static geometry + trailing `double dist` (5 args).
  * PHYSICAL_CPP_TEMPLATE_TWO_TEMPORAL_POINTS_WITH_DIST
    — two-tgeo + trailing `double dist` (7 args).

Dispatch in `emit_operator` extends the existing if/elif chain with
`build_temporal_point_with_dist` and `build_two_temporal_points_with_dist`
flags. NAD reuses the existing temporal-point / two-temporal-points
branches with no template change — only `return_type="double"` and
`nautilus_return="FLOAT64"` differ at the operator-descriptor level.
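
A flag-driven dispatch of this kind might look like the following Python sketch. The descriptor is modelled as a plain dict, `select_template` is a hypothetical helper, and the name used for the one-temporal-point template is an assumption since the text does not spell it out.

```python
from typing import Dict


def select_template(op: Dict[str, object]) -> str:
    """Pick the physical .cpp template name from the descriptor flags."""
    if op.get("build_two_temporal_points_with_dist"):
        return "PHYSICAL_CPP_TEMPLATE_TWO_TEMPORAL_POINTS_WITH_DIST"
    elif op.get("build_temporal_point_with_dist"):
        return "PHYSICAL_CPP_TEMPLATE_TEMPORAL_POINT_WITH_DIST"
    elif op.get("build_two_temporal_points"):
        return "PHYSICAL_CPP_TEMPLATE_TWO_TEMPORAL_POINTS"
    elif op.get("build_temporal_point"):
        # Assumed name for the existing one-temporal-point template.
        return "PHYSICAL_CPP_TEMPLATE_TEMPORAL_POINT"
    raise ValueError(f"no template flag set on descriptor: {op!r}")


if __name__ == "__main__":
    # NAD reuses an existing branch; only the return types differ.
    nad = {
        "meos_call": "nad_tgeo_geo",
        "build_temporal_point": True,
        "return_type": "double",
        "nautilus_return": "FLOAT64",
    }
    adwithin = {"meos_call": "adwithin_tgeo_tgeo", "build_two_temporal_points_with_dist": True}
    print(select_template(nad))
    print(select_template(adwithin))
```

Keeping the return type on the descriptor rather than in the branch is what lets NAD share a template with the int-returning predicates.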

Local verification on the mobilitynebula-v2 dev image:
    cmake --build build-w1 --target nes-physical-operators -j 4
      → [43/43] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [57/57] Linking libnes-logical-operators.a

Both targets link clean on the first attempt.
…0 dispatch cases)

Extends the codegen to back-fill the SQL-parser glue that the W1–W4
PRs (MobilityDB#23 through MobilityDB#26) shipped without, so the 21
generated operators become SQL-invokable end-to-end instead of just
runtime-registered plugins waiting for manual wiring.

What the codegen now writes
---------------------------

After emitting the .hpp/.cpp files, the codegen idempotently injects
into the existing in-tree files:

  * nes-sql-parser/AntlrSQL.g4
    - lexer-token entries  (TOKEN: 'TOKEN' | 'token';) bracketed
      with /* BEGIN/END CODEGEN LEXER TOKENS */ markers
    - functionName: alternation list updated with new tokens
  * nes-sql-parser/src/AntlrSQLQueryPlanCreator.cpp
    - #include <Functions/Meos/XxxLogicalFunction.hpp> per op
    - case AntlrSQLLexer::TOKEN: { ... } dispatch block per op,
      bracketed with /* BEGIN/END CODEGEN PARSER GLUE: TOKEN */
  * nes-{logical,physical}-operators/src/Functions/Meos/CMakeLists.txt
    - add_plugin(NebulaName {Logical,Physical}Function ...) per op

Idempotency: every per-op injection skips when either the codegen
marker is present OR a pre-existing hand-written case (no marker) is
already in the file. Re-running the codegen on the same input is a
no-op for the parser side; only the .hpp/.cpp emitters re-write
deterministically.
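
The skip rule above can be sketched in Python as follows. This is an illustration only: `inject_case`, its parameters, and the plain `default:` anchor are hypothetical, and the real generator may locate its insertion point differently. The marker text follows the `BEGIN/END CODEGEN PARSER GLUE: TOKEN` form quoted above.

```python
def inject_case(source: str, token: str, body: str, anchor: str = "default:") -> str:
    """Insert a marked dispatch case before `anchor`, unless the token is already wired."""
    begin = f"/* BEGIN CODEGEN PARSER GLUE: {token} */"
    end = f"/* END CODEGEN PARSER GLUE: {token} */"
    case_label = f"case AntlrSQLLexer::{token}:"
    # Skip if a previous codegen run left its marker, or if a hand-written
    # case (no marker) already handles this token.
    if begin in source or case_label in source:
        return source
    block = f"{begin}\n{case_label} {{ {body} }}\n{end}\n"
    idx = source.index(anchor)  # raises ValueError if the anchor is missing
    return source[:idx] + block + source[idx:]
```

Calling it twice with the same token returns the first result unchanged, which is the no-op behaviour described for the parser side.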

Two opt-out CLI flags:
    --no-parser-glue     skip .g4 + parser .cpp injection
    --no-cmake-entries   skip CMakeLists.txt injection

Four dispatch-case templates by shape
-------------------------------------

  * one tgeo + static geom        (4 args:  lon, lat, ts, geom)
  * two tgeos                     (6 args:  lonA, latA, tsA, lonB, latB, tsB)
  * one tgeo + static geom + dist (5 args:  lon, lat, ts, geom, dist)
  * two tgeos + dist              (7 args:  lonA, latA, tsA, lonB, latB, tsB, dist)
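
As a rough Python sketch, the four shapes can be written as a table of argument names with an arity check. The shape keys and `bind_args` are hypothetical names chosen for illustration; only the argument lists come from the bullets above.

```python
DISPATCH_ARGS = {
    "one_tgeo_geom": ("lon", "lat", "ts", "geom"),
    "two_tgeos": ("lonA", "latA", "tsA", "lonB", "latB", "tsB"),
    "one_tgeo_geom_dist": ("lon", "lat", "ts", "geom", "dist"),
    "two_tgeos_dist": ("lonA", "latA", "tsA", "lonB", "latB", "tsB", "dist"),
}


def bind_args(shape: str, values: list) -> dict:
    """Pair positional SQL arguments with the names expected by a dispatch shape."""
    names = DISPATCH_ARGS[shape]
    if len(values) != len(names):
        raise ValueError(f"{shape} expects {len(names)} args, got {len(values)}")
    return dict(zip(names, values))
```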

The constantBuilder→functionBuilder lift mirrors mariana's pattern
from TGEO_AT_STBOX and EDWITHIN_TGEO_GEO (TRUE/FALSE → BOOLEAN,
strtod-clean → FLOAT64, else → VARSIZED), so distance literals and
WKT literals deserialize the same way the hand-written ops do.
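
The three-way lift can be approximated in Python as below. This is a sketch, not the parser's code: the real check runs in C++ via `strtod`, which also accepts forms such as hex floats and `inf`, so the decimal regex here is a deliberate simplification. Treating TRUE/FALSE case-insensitively is also an assumption.

```python
import re

# Plain decimal literals with an optional exponent (narrower than strtod).
_DECIMAL = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def classify_literal(text: str) -> str:
    """Map a constant's source text to the data type it is lifted to."""
    if text.upper() in ("TRUE", "FALSE"):
        return "BOOLEAN"
    if _DECIMAL.fullmatch(text):
        return "FLOAT64"   # distance literals such as 50.0
    return "VARSIZED"      # everything else, including WKT strings
```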

Back-fill: 20 new dispatch cases + 21 includes + 20 lexer tokens
----------------------------------------------------------------

Ran the codegen against the combined W1+W2+W3+W4 input (21 ops). One
of the 21 (TEMPORAL_EINTERSECTS_GEOMETRY) was already wired manually
by mariana so the codegen detected and skipped it; 20 cases injected
clean. nes-sql-parser links green with the regenerated ANTLR lexer +
parser stubs.

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-sql-parser  -j 4 → links clean
    cmake --build build-w1 --target nes-logical-operators  -j 4 → up to date
    cmake --build build-w1 --target nes-physical-operators -j 4 → up to date

What this unlocks
-----------------

The 21 W1–W4 operators are now SQL-invokable end-to-end. From now on,
every codegen PR ships its parser glue in the same PR by default;
passing `--no-parser-glue` opts out. The work past the spatial-rel
surface (W5 tnumber scalar, W5b extended types, W7 aggregations)
inherits this closed loop.
…stests)

First-batch tnumber-shape operators. The MEOS surface for nearest-approach
distance over tnumber types is small (4 publicly-declared ops in meos.h
beyond the TBox-arg variants, which are deferred):

    nad_tfloat_float  → TemporalNADFloatScalar    (3 args: value, ts, scalar)
    nad_tint_int      → TemporalNADIntScalar      (3 args: value, ts, scalar)
    nad_tfloat_tfloat → TemporalNADTFloat         (4 args: vA, tsA, vB, tsB)
    nad_tint_tint     → TemporalNADTInt           (4 args: vA, tsA, vB, tsB)

Single-instant tnumber construction uses MEOS's text constructor
`tfloat_in`/`tint_in` over a per-event WKT string "value@ts", mirroring
the existing tgeompoint pattern (where the WKT is built per record from
event fields and parsed by `temporal_in`). The constructed Temporal* is
freed after the MEOS call.
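
The per-event string construction can be sketched in Python as below. The generated code does this in C++ and then hands the string to `tfloat_in`/`tint_in`; `instant_wkt` is a hypothetical helper, and the timestamp text is only an example value.

```python
def instant_wkt(value, ts: str, fmt: str = "{}@{}") -> str:
    """Build the single-instant literal "value@ts" from one event's fields."""
    return fmt.format(value, ts)


tfloat_literal = instant_wkt(21.5, "2024-01-01 00:00:00+00")
tint_literal = instant_wkt(7, "2024-01-01 00:00:00+00")
```

Here `tfloat_literal` is `21.5@2024-01-01 00:00:00+00`, the text form a single-instant tfloat would be parsed from.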

Generator additions
-------------------

Two new physical-cpp template branches + two new parser-glue dispatch-case
templates, all plumbed through emit_operator's existing flag dispatch:

  * PHYSICAL_CPP_TEMPLATE_TNUMBER_POINT_WITH_SCALAR
    — flag: build_tnumber_point_with_scalar
  * PHYSICAL_CPP_TEMPLATE_TWO_TNUMBER_POINTS
    — flag: build_two_tnumber_points
  * DISPATCH_CASE_TNUMBER_POINT_WITH_SCALAR     (3-arg dispatch)
  * DISPATCH_CASE_TWO_TNUMBER_POINTS            (4-arg dispatch)

Per-op extras in the JSON descriptor parameterize tnumber type (FLOAT64
or INT32) and the MEOS `*_in` constructor:
    "tnumber_value_cpp_type": "double" | "int32_t"
    "scalar_cpp_type":        "double" | "int32_t"
    "tnumber_in_fn":          "tfloat_in" | "tint_in"
    "tnumber_wkt_format":     "{}@{}"  (consumed by fmt::format at runtime)

Codegen anchor fix
------------------

The parser-dispatch anchor regex, tuned for the pre-W4.5 layout
(TGEO_AT_STBOX → default:), no longer matched after W4.5 injected 20
cases between the two. New logic: insert just after the LAST
`/* END CODEGEN PARSER GLUE: ... */` marker if any exist (so successive
codegen runs cluster their cases), else fall back to the original
TGEO_AT_STBOX→default anchor.
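
A Python sketch of that anchor selection follows. The function name and both regexes are assumptions; in particular the fallback is simplified to "start of the `default:` line" rather than the original TGEO_AT_STBOX-to-default pattern, which is not reproduced here.

```python
import re

END_MARKER = re.compile(r"/\* END CODEGEN PARSER GLUE: [A-Z0-9_]+ \*/\n?")
FALLBACK = re.compile(r"^[ \t]*default:", re.MULTILINE)


def insertion_offset(source: str) -> int:
    """Return the offset at which the next generated case should be inserted."""
    last = None
    for last in END_MARKER.finditer(source):
        pass  # keep only the final match
    if last is not None:
        return last.end()  # just after the last END marker
    fallback = FALLBACK.search(source)
    if fallback is None:
        raise ValueError("no insertion anchor found")
    return fallback.start()  # legacy layout: just before default:
```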

Per-shape systests
------------------

Two new .test files in Tests/Functions/ — one per dispatch shape:

  * nad_tfloat_float.test    (one-tnumber + scalar; 3 rows; expected distance)
  * nad_tfloat_tfloat.test   (two-tnumbers; 3 rows; expected distance)

Per the testing-cadence directive: every codegen PR ships at least one
systest per dispatch shape it introduces.

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-physical-operators -j 4
      → [47/47] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [61/61] Linking libnes-logical-operators.a
    cmake --build build-w1 --target nes-sql-parser -j 4
      → [11/11] Linking libnes-sql-parser.a

All three targets link clean on the first attempt — both new template
branches worked without iteration, and the parser-anchor fix is in
the generator so subsequent W5b/W6/W7 inherit it.
…mplate + 1 systest)

First restriction-shape operators. MEOS signature is
`Temporal* fn(const Temporal*, const GSERIALIZED*)` — returns the
clipped Temporal* (non-null if input survives the restriction, null
if clipped to empty).

For per-event single-instant inputs (the codegen's current shape), the
restriction collapses to a filter predicate: 1 if the point survives,
0 if clipped. This mirrors mariana's TemporalAtStBox int-collapse
pattern exactly — see TemporalAtStBoxPhysicalFunction.cpp:90 for the
hand-written precedent (`clipped.get() != nullptr ? 1 : 0`).

Operators
---------

    tgeo_at_geom    → TemporalAtGeometry      (4 args; survives if point inside the geom)
    tgeo_minus_geom → TemporalMinusGeometry   (4 args; survives if point outside the geom)

Honest semantic note
--------------------

Per-event single-instant TEMPORAL_AT_GEOMETRY is **semantically equivalent**
to TEMPORAL_ECONTAINS_GEOMETRY (PR MobilityDB#23), and TEMPORAL_MINUS_GEOMETRY ≡
TEMPORAL_EDISJOINT_GEOMETRY. The restriction ops only add genuinely new
SQL surface when the input tgeompoint is a *sequence* of multiple
instants (W7-territory — windowed aggregations), where clipping produces
a different sequence than the original. Shipped now because:

  1. They round out the SQL surface PostGIS / MobilityDB users expect
     (the `AT`/`MINUS` idiom is standard there).
  2. They exercise the codegen's first restriction-shape template, which
     W7 sequence-aggregated restriction will inherit.
  3. The collapse-to-int return matches mariana's TemporalAtStBox so
     downstream consumers see a consistent shape across at/minus ops.
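
The equivalence for single instants, and the difference for sequences, can be illustrated with a toy Python model. This is not MEOS: the geometry is an axis-aligned box, sequences are treated as discrete instants with no interpolation, and every name is hypothetical.

```python
from typing import List, Optional, Tuple

Instant = Tuple[float, float, int]        # (lon, lat, ts)
Box = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def inside(p: Instant, box: Box) -> bool:
    x, y, _ = p
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def at_geom(seq: List[Instant], box: Box) -> Optional[List[Instant]]:
    kept = [p for p in seq if inside(p, box)]
    return kept or None  # None models the null Temporal* of an empty result


def minus_geom(seq: List[Instant], box: Box) -> Optional[List[Instant]]:
    kept = [p for p in seq if not inside(p, box)]
    return kept or None


def collapse(result: Optional[List[Instant]]) -> int:
    """Int-collapse: 1 if anything survived the restriction, else 0."""
    return 1 if result is not None else 0
```

For a one-instant input, `collapse(at_geom([p], box))` equals `int(inside(p, box))`, so it carries no more information than a containment test. For a two-instant input with one point inside and one outside, `at_geom` returns a shorter sequence than the original, which is where the restriction ops start to differ.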

Generator additions
-------------------

  * PHYSICAL_CPP_TEMPLATE_TEMPORAL_POINT_RESTRICTION
    — calls `Temporal* {meos_call}(...)`, checks non-null, frees, returns int.
    Flag: `build_temporal_point_restriction`.
  * dispatch_case_for() reuses the existing DISPATCH_CASE_ONE_TEMPORAL_POINT
    template — same 4-arg parser shape (lon, lat, ts, geom), only the
    physical-cpp body shape differs (`Temporal*` return vs `int` return).

Per-shape systest
-----------------

Tests/Functions/at_geometry.test exercises TEMPORAL_AT_GEOMETRY: one
point inside a polygon (expect 1), one outside (expect 0).

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-physical-operators -j 4
      → [49/49] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [63/63] Linking libnes-logical-operators.a
    cmake --build build-w1 --target nes-sql-parser -j 4
      → [11/11] Linking libnes-sql-parser.a

All three targets link clean on the first attempt.
@estebanzimanyi force-pushed the feat/nebula-codegen-w6-restriction-tgeo-geom branch from 7826e1a to a38b486 on May 22, 2026 13:48