codegen(meos): generate tier-aware MEOS facade for the full JMEOS 1.4 surface (stacks on #4) #5

Open
estebanzimanyi wants to merge 4 commits into MobilityDB:main from estebanzimanyi:codegen/flink-meos-ops

@estebanzimanyi
Member

Add a generated, tier-aware Java facade over the full MEOS public API surface, so downstream Flink-side parity work can stop hand-wiring per-operator JMEOS calls and instead consume one mechanical facade per MEOS object-model class (or per public header for free functions).

What is generated

Layer | Java class shape | Files | Methods | Source
---|---|---|---|---
OO-classified | MeosOps<Class>: one per MEOS object-model class | 50 | 751 | tools/codegen/codegen-oo.py
Free fns | MeosOpsFree<Header>: one per public MEOS header, for fns not assigned to any OO class | 6 | 1,346 | tools/codegen/codegen-free.py
Shared runtime | MeosOpsRuntime (singleton MEOS_AVAILABLE static init across all 56 facades) | 1 | | hand-written, ~30 LOC
Total | | 57 | 2,097 | 77.7% of JMEOS PR #19's full 2,699-method surface

Each emitted method forwards verbatim to functions.GeneratedFunctions.<name>(...) after probing MeosOpsRuntime.MEOS_AVAILABLE (set once per JVM). Each method carries a Javadoc tier marker:

Tier | Meaning | Flink wiring shape
---|---|---
stateless | Pure per-event, no state | ScalarFunction / direct call in MapFunction
bounded-state | Per-event with bounded per-key state (MEOS handle) | ScalarFunction (state in MEOS handle)
windowed | Output cardinality changes; needs window | AggregateFunction over TUMBLE/HOP
cross-stream | Pairwise across streams; needs interval-overlap join | CoProcessFunction / IntervalJoin
io-meta | I/O, catalog, lifecycle helpers | Helper / format clause
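
As a rough illustration of the shape described above, one emitted method might look like the sketch below. Everything named here is a hypothetical stand-in so the block compiles without JMEOS: `GeneratedFunctionsStub` replaces `functions.GeneratedFunctions`, `tfloat_example` is an invented function, and the `sketch.meos.available` property is not part of the PR. The description does not say what a facade does when the probe fails, so throwing `IllegalStateException` is an assumption, as is the exact wording of the Javadoc tier marker.

```java
// Stand-in for functions.GeneratedFunctions (hypothetical; the real class ships in JMEOS.jar).
final class GeneratedFunctionsStub {
    private GeneratedFunctionsStub() {}

    // Invented function for illustration; it simply doubles its input.
    static double tfloat_example(double value) {
        return value * 2.0;
    }
}

// Stand-in for the shared runtime: one availability flag, initialised once per JVM.
final class MeosOpsRuntimeSketch {
    // Assumption: the real flag comes from probing libmeos; here a system property drives it.
    static final boolean MEOS_AVAILABLE =
            Boolean.parseBoolean(System.getProperty("sketch.meos.available", "true"));

    private MeosOpsRuntimeSketch() {}

    // Assumed failure mode: refuse the call when MEOS is not available.
    static void requireMeos() {
        if (!MEOS_AVAILABLE) {
            throw new IllegalStateException("MEOS native library is not available");
        }
    }
}

// Shape of one generated facade class.
final class MeosOpsTFloatSketch {
    private MeosOpsTFloatSketch() {}

    /**
     * Forwards verbatim to the underlying generated function.
     *
     * <p>Tier: stateless
     */
    static double tfloat_example(double value) {
        MeosOpsRuntimeSketch.requireMeos();
        return GeneratedFunctionsStub.tfloat_example(value);
    }
}
```

Because the guard lives in one shared class, every facade reads the same flag and the probe runs once rather than once per facade.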

Tier breakdown of the 2,097 emitted methods: 804 stateless · 797 bounded-state · 161 windowed · 140 cross-stream · 195 io-meta.
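
The totals quoted above can be cross-checked mechanically. The snippet below is only a sanity check on the numbers in this description (class and method names are made up for the example); it is not part of the generators or their manifests.

```java
import java.util.Locale;

final class EmitTotalsCheck {
    private EmitTotalsCheck() {}

    // Methods emitted by the two generated layers: OO-classified + free functions.
    static int facadeTotal() {
        return 751 + 1346;
    }

    // stateless + bounded-state + windowed + cross-stream + io-meta.
    static int tierTotal() {
        return 804 + 797 + 161 + 140 + 195;
    }

    // Share of the 2,699-method JMEOS surface that is emitted, in percent.
    static double coveragePercent() {
        return 100.0 * facadeTotal() / 2699;
    }

    public static void main(String[] args) {
        System.out.println("facade total = " + facadeTotal());
        System.out.println("tier total   = " + tierTotal());
        System.out.println(String.format(Locale.ROOT, "coverage     = %.1f%%", coveragePercent()));
    }
}
```

Both sums come to 2,097, and 2,097 / 2,699 rounds to 77.7%, matching the table.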

What's not emitted (honest gap)

  • 50 baseline-target methods absent from JMEOS — type-catalog helpers JMEOS deliberately omits (*_basetype, *_type, *_spantype, …)
  • 14 functions in the streaming-relevance baseline's sequence-only tier — inherently non-streamable, marked as honest "cannot satisfy" pending an emission-shape decision
  • 59 ambiguous functions where the mechanical classifier couldn't decide between two defensible tiers — resolution drafted separately as the streamingSemantics facet RFC for MEOS-API

Coexistence with berlinmod.MEOSBridge

MEOSBridge.java (hand-written, BerlinMOD-scoped, introduced on this branch's parent feat/jmeos-bridge-swap) and the generated MeosOps* facades coexist by design:

  • MEOSBridge keeps the per-BerlinMOD-query intent (Haversine fallback, dwithinSegmentMetres, etc.) — high-level, query-shaped.
  • MeosOps* exposes the raw MEOS surface tier-by-tier — low-level, catalog-shaped.

Both share the same MEOS_AVAILABLE discipline (via MeosOpsRuntime) and the same functions.GeneratedFunctions delegation.

How to regenerate

# 1. Regenerate the MEOS-API catalog from MobilityDB headers
git clone --branch feat/object-model https://github.com/MobilityDB/MEOS-API.git
cd MEOS-API && pip install -r requirements.txt
python run.py /path/to/MobilityDB/meos/include /path/to/MobilityDB/mobilitydb/src
# → output/meos-idl.json

# 2. Produce the streaming-relevance baseline (v4 classifier)
# → streaming-relevance-baseline.json

# 3. Extract JMEOS method signatures
jar xf flink-processor/jar/JMEOS.jar functions/GeneratedFunctions.class
javap -p functions.GeneratedFunctions > jmeos_signatures.txt

# 4. Run both generators
python flink-processor/tools/codegen/codegen-oo.py
python flink-processor/tools/codegen/codegen-free.py

Both generators are ~250 LOC, deterministic, audit-by-regeneration. Manifests under tools/codegen/ record per-class / per-header / per-tier breakdowns + absent-from-JMEOS audit.
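
To show what the generators consume from step 3, here is a small sketch that pulls method signatures out of `javap -p` output. The real generators are Python and their parsing logic is not shown in this PR, so this is an independent illustration written in Java; the class name and the sample lines used with it are hypothetical, and generic return types containing spaces are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JavapSignatureSketch {
    // Matches lines such as "  public static double some_fn(jnr.ffi.Pointer, int);".
    private static final Pattern PUBLIC_STATIC_METHOD =
            Pattern.compile("^\\s*public static (?:final )?(\\S+) (\\w+)\\(([^)]*)\\)");

    private JavapSignatureSketch() {}

    /** One parsed method: return type, name, and parameter types in declaration order. */
    static final class Signature {
        final String returnType;
        final String name;
        final List<String> parameterTypes;

        Signature(String returnType, String name, List<String> parameterTypes) {
            this.returnType = returnType;
            this.name = name;
            this.parameterTypes = Collections.unmodifiableList(parameterTypes);
        }
    }

    /** Keeps only public static methods; constructors, fields, and braces are skipped. */
    static List<Signature> parse(List<String> javapLines) {
        List<Signature> result = new ArrayList<>();
        for (String line : javapLines) {
            Matcher m = PUBLIC_STATIC_METHOD.matcher(line);
            if (!m.find()) {
                continue;
            }
            List<String> params = new ArrayList<>();
            String rawParams = m.group(3).trim();
            if (!rawParams.isEmpty()) {
                for (String p : rawParams.split(",")) {
                    params.add(p.trim());
                }
            }
            result.add(new Signature(m.group(1), m.group(2), params));
        }
        return result;
    }
}
```

Feeding it the lines of jmeos_signatures.txt would yield one `Signature` per public static method, which is the kind of input a classifier needs to decide which facade a method belongs to.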

Stacking

This PR stacks on feat/jmeos-bridge-swap. Additive-only: 57 new Java files + 5 files under tools/codegen/. No existing file is touched (no diff to MEOSBridge.java, Main.java, TrajectoryWindowFunction.java, pom.xml, or jar/JMEOS.jar).

Note on the base branch's current compile state

feat/jmeos-bridge-swap's MEOSBridge.java:116 imports utils.spatial.PointToSegment from JMEOS PR #18's feat/spatial-haversine branch. The recent bundled-jar refresh on this branch (commit 0a57c07, JMEOS PR #19's jmeos-core jar) brought in the 2,699-method functions.GeneratedFunctions surface but did not include PR #18's utils.spatial.* wrappers. As a result, the base-branch mvn compile currently fails on MEOSBridge.java.

This PR's own diff is green in isolation (javac of just org.mobilitydb.flink.meos.* succeeds against the refreshed jar) and green in the full module when the bundled jar is the union of JMEOS PR #19's jmeos-core + PR #18's utils.spatial.* (locally verified: 123 .class files compile clean, including all 57 new MeosOps*).

Recipe to produce the union jar (~2 minutes):

git clone --depth 1 --branch feat/spatial-haversine https://github.com/estebanzimanyi/JMEOS.git /tmp/pr18
javac -cp flink-processor/jar/JMEOS.jar:$(find ~/.m2 -name 'jnr-ffi-2.2.14.jar' | head -1) \
      -d /tmp/pr18-classes \
      /tmp/pr18/jmeos-core/src/main/java/utils/spatial/Haversine.java \
      /tmp/pr18/jmeos-core/src/main/java/utils/spatial/PointToSegment.java
mkdir /tmp/union && cd /tmp/union
jar xf /path/to/refreshed/JMEOS.jar
cp -r /tmp/pr18-classes/utils .
jar cf /path/to/flink-processor/jar/JMEOS.jar .

Once the bundled jar is refreshed with the union, the base branch + this PR compile together cleanly.

…matrix on MobilityFlink

All nine BerlinMOD reference queries × three streaming forms each
(continuous, windowed, snapshot) on MobilityFlink — the complete 27-cell
stream-layers parity-matrix row, locally verified end-to-end with no
external dependencies (no Kafka, no Docker, no MEOS native lib, no
JMEOS call).

Queries:

  Q1  which vehicles have appeared in the stream
  Q2  where is vehicle X at time T
  Q3  which vehicles within d of P at time T
  Q4  which vehicles entered region R, and when
  Q5  pairs of vehicles meeting near point P
  Q6  cumulative distance per vehicle
  Q7  first passage of vehicles through POIs
  Q8  vehicles close to a road segment
  Q9  distance between vehicles X and Y at time T

Each query has three form classes (Q<N>{Continuous,Windowed,Snapshot}Function)
and a companion BerlinMODQ<N>LocalTest driver running the three forms
through a Flink mini-cluster against a hardcoded synthetic corpus.

Spatial predicates today are pure Java — Haversine distance for
point-to-point (Q3, Q5, Q6, Q9), point-in-box for region containment
(Q4), and a planar-projection point-to-line-segment distance (Q8). Each
spatial call site is marked TODO(meos) for migration to the JMEOS
bridge of the corresponding MEOS operator once the in-flight MEOS 1.4
bump has settled (Q3 edwithin_tgeo_geo; Q4 STBox eintersects; Q5
NAD / edwithin_tgeo_tgeo; Q6 trajectory length; Q7 edwithin_tgeo_geo;
Q8 distance(tgeompoint, geometry(LINESTRING)); Q9 tdistance). Q1 and
Q2 have no spatial predicate.
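
For orientation, the three pure-Java predicates named above could be sketched roughly as follows. This is not the code from the commit: the class name, the mean Earth radius constant, and the choice of a local equirectangular projection centred on the query point are all assumptions made for the example. Argument order follows the lon-then-lat convention used elsewhere in this PR.

```java
final class PureJavaPredicatesSketch {
    // Mean Earth radius in metres (assumed; the commit does not state which radius it uses).
    static final double EARTH_RADIUS_M = 6_371_008.8;

    private PureJavaPredicatesSketch() {}

    /** Great-circle (Haversine) distance in metres between two lon/lat points in degrees. */
    static double haversineMetres(double lon1, double lat1, double lon2, double lat2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        // Clamp guards against rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Inclusive point-in-box test on raw lon/lat degrees (no antimeridian handling). */
    static boolean inBox(double lon, double lat,
                         double minLon, double minLat, double maxLon, double maxLat) {
        return lon >= minLon && lon <= maxLon && lat >= minLat && lat <= maxLat;
    }

    /**
     * Distance in metres from point p to segment s1-s2, computed in a local planar
     * projection centred on p (reasonable only for short distances).
     */
    static double pointToSegmentMetres(double pLon, double pLat,
                                       double s1Lon, double s1Lat,
                                       double s2Lon, double s2Lat) {
        double cosLat = Math.cos(Math.toRadians(pLat));
        // Segment endpoints in metres relative to p, which sits at the origin.
        double ax = EARTH_RADIUS_M * Math.toRadians(s1Lon - pLon) * cosLat;
        double ay = EARTH_RADIUS_M * Math.toRadians(s1Lat - pLat);
        double bx = EARTH_RADIUS_M * Math.toRadians(s2Lon - pLon) * cosLat;
        double by = EARTH_RADIUS_M * Math.toRadians(s2Lat - pLat);
        double dx = bx - ax;
        double dy = by - ay;
        double lengthSquared = dx * dx + dy * dy;
        // Parameter of the closest point on the segment, clamped to [0, 1].
        double t = lengthSquared == 0.0
                ? 0.0
                : Math.max(0.0, Math.min(1.0, -(ax * dx + ay * dy) / lengthSquared));
        return Math.hypot(ax + t * dx, ay + t * dy);
    }
}
```

One degree of latitude comes out at roughly 111 km with this radius, which is a quick way to sanity-check the Haversine implementation.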

State patterns exercised:
  - keyed simple flag (Q1)
  - keyed last-known position (Q2, Q8)
  - keyed transition + entry log (Q4)
  - keyed accumulator (Q6)
  - keyed first-passage map (Q7)
  - shared key-by-constant state (Q9 pair-wise, Q5 multi-pair MapState)
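
To make one of these patterns concrete, here is a plain-Java stand-in for the Q4 "keyed transition + entry log" shape. It does not use Flink: a `HashMap` plays the role that per-key state would play in a keyed function, and the class name, method names, and log format are invented for the example. It also assumes that a vehicle whose first observed position is already inside the region counts as an entry, which the commit message does not specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RegionEntrySketch {
    // Per-vehicle "was inside on the previous event" flag; in Flink this would be keyed state.
    private final Map<Integer, Boolean> wasInside = new HashMap<>();
    // Entries recorded so far, formatted as "<vehicleId>@<timestamp>".
    private final List<String> entryLog = new ArrayList<>();

    private final double minLon;
    private final double minLat;
    private final double maxLon;
    private final double maxLat;

    RegionEntrySketch(double minLon, double minLat, double maxLon, double maxLat) {
        this.minLon = minLon;
        this.minLat = minLat;
        this.maxLon = maxLon;
        this.maxLat = maxLat;
    }

    /** Processes one position event and logs an entry on an outside-to-inside transition. */
    void onEvent(int vehicleId, long timestamp, double lon, double lat) {
        boolean inside = lon >= minLon && lon <= maxLon && lat >= minLat && lat <= maxLat;
        boolean previously = wasInside.getOrDefault(vehicleId, false);
        if (inside && !previously) {
            entryLog.add(vehicleId + "@" + timestamp);
        }
        wasInside.put(vehicleId, inside);
    }

    /** Returns a copy of the entry log in arrival order. */
    List<String> entries() {
        return new ArrayList<>(entryLog);
    }
}
```

A vehicle that leaves and re-enters the region produces two log lines, while consecutive events inside the region produce none after the first.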

Verified output counts (see PR description for the exact-line excerpts):

  Q  | continuous | windowed | snapshot
  ---|------------|----------|---------
  Q1 |          3 |        2 |        9
  Q2 |          7 |        2 |        3
  Q3 |         21 |        2 |        6
  Q4 |          4 |        5 |        9
  Q5 |         14 |        2 |        3   (only pair (100,200) qualifies for our P + radii)
  Q6 |         21 |        6 |        9   (drift corpus; v100=601m, v200=300m, v300=1205m)
  Q7 |          3 |        6 |        9   (3 (vehicle, POI) first-passages; intra-window scope)
  Q8 |         21 |        2 |        6   (same shape as Q3 with segment-distance)
  Q9 |          7 |        2 |        3   (X=100, Y=200; distance 4124m = ~4.1km)

Build verification: mvn clean package green; all nine LocalTests run to
completion (Flink mini-cluster, parallelism=1) producing exactly the
expected output shapes.
… 1.4 MEOSBridge

Introduce MEOSBridge as the runtime spatial-predicate surface for all
BerlinMOD-9 × 3-form streaming cells. The bridge calls into MEOS via
JMEOS 1.4 (geog_dwithin over WGS84 geographies) when libmeos is loadable
and falls back to the pure-Java Haversine / SegmentDistance utilities
when it is not — the fallback path is what the BerlinMODQ*LocalTest
mini-cluster drivers exercise (system property mobilityflink.meos.enabled=false).

- New berlinmod/MEOSBridge.java with the dwithinMetres /
  dwithinSegmentMetres / distanceMetres surface and a fail-soft
  static init that flips MEOS_AVAILABLE to false on UnsatisfiedLinkError.
- All BerlinMOD-9 × 3-form spatial predicates rewritten to call
  MEOSBridge instead of Haversine / SegmentDistance directly. 27 cells,
  one bridge call surface, identical predicate semantics.
- JMEOS.jar updated to the JMEOS#15 regen branch artefact (478,305
  bytes); this is the JMEOS 1.4 regen build that exposes geog_dwithin /
  geom_in / geom_to_geog / edwithin_tgeo_geo / nad_tgeo_geo / tpoint_length.
- aisdata/Main.java and aisdata/TrajectoryWindowFunction.java adapted
  to the JMEOS 1.4 meos_initialize() / meos_initialize_timezone()
  split (the old two-arg meos_initialize(String, error_handler_fn)
  signature is gone in JMEOS#15).
- All nine BerlinMODQ*LocalTest mini-cluster drivers set
  mobilityflink.meos.enabled=false at main() entry so they remain
  green-CI without libmeos.so on the runtime path.
- target/ build artefacts gitignored.
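
The fail-soft static init described in the bullets above can be sketched like this. The class name and the `call` helper are hypothetical, and two details are assumptions rather than facts from the commit: that the property defaults to enabled when unset, and that the probe is a direct `System.loadLibrary("meos")` (the real bridge may probe through JMEOS instead).

```java
import java.util.function.Supplier;

final class FailSoftBridgeSketch {
    // Evaluated once, when the class is first initialised.
    static final boolean MEOS_AVAILABLE = probe();

    private FailSoftBridgeSketch() {}

    private static boolean probe() {
        // Opt-out switch named in the commit message; the default value is an assumption.
        if (!Boolean.parseBoolean(System.getProperty("mobilityflink.meos.enabled", "true"))) {
            return false;
        }
        try {
            // Assumption: the native library is loaded by its short name.
            System.loadLibrary("meos");
            return true;
        } catch (UnsatisfiedLinkError e) {
            // Fail soft: a missing libmeos disables the MEOS path instead of crashing the job.
            return false;
        }
    }

    /** Runs the MEOS-backed path when available, otherwise the pure-Java fallback. */
    static <T> T call(Supplier<T> meosPath, Supplier<T> fallback) {
        return MEOS_AVAILABLE ? meosPath.get() : fallback.get();
    }
}
```

Setting the property before the class is first touched, as the LocalTest drivers do at `main()` entry, is what makes the flag deterministic on machines without libmeos.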

The README's spatial-predicate paragraph is updated to describe the
MEOSBridge route as the production path; the TODO(meos) markers across
the BerlinMOD cells are gone.

Build: mvn clean package -DskipTests green.
Verify: BerlinMODQ{1,3,5,8}LocalTest all finish with FINISHED state
on the mini-cluster fallback path.
…s + extended types + utils.spatial)

Updates the bundled `flink-processor/jar/JMEOS.jar` to a combined build
of JMEOS PR #19 (regen against MEOS-API meos-idl.json, 2,699 methods
including extended types) AND PR #18 (utils.spatial.Haversine +
utils.spatial.PointToSegment wrappers that MEOSBridge.java imports).

Surface delta vs the previous bundled jar:
  - public static methods: 2,699 (was 1,685)
  - utils.spatial.Haversine.distance(lon1, lat1, lon2, lat2) → double
  - utils.spatial.PointToSegment.distance(pLon, pLat, s1Lon, s1Lat, s2Lon, s2Lat) → double
  - tnpoint_ methods: 50
  - tcbuffer / tpose / trgeo: now exposed
  - sha: a5895c9b94…  size: 1,210,863 B

Unblocks the MEOSBridge.java import path (line 116) — previously the
jar shipped PR #19's GeneratedFunctions but not PR #18's utils.spatial,
so base-branch mvn compile was RED.  Both PRs now coalesced into a
single jar built by:

    mvn -pl codegen,jmeos-core compile -Dmaven.test.skip=true
    cd jmeos-core/target/classes && jar cf JMEOS.jar .

Unblocks codegen/flink-meos-ops wedge stacked on this branch.
… surface

Add a generated, tier-aware Java facade over the MEOS public API,
organized as one Java class per MEOS object-model class plus one per
public-MEOS-header for free functions:

- 50 `MeosOps<Class>` classes (751 methods): one per MEOS object-model
  class (TFloat, TInt, TBool, TText, TGeomPoint, TGeogPoint, TCbuffer,
  TNpoint, TPose, TRGeometry, TBox, STBox, Set, Span, SpanSet, …).
- 6 `MeosOpsFree<Header>` classes (1,346 methods): one per public MEOS
  header for functions not assigned to any object-model class
  (MeosOpsFreeCore, MeosOpsFreeGeo, MeosOpsFreeCbuffer, MeosOpsFreeNpoint,
  MeosOpsFreePose, MeosOpsFreeRgeo).
- 1 shared `MeosOpsRuntime` (single `MEOS_AVAILABLE` static-init across
  all 56 facades).

Each emitted method forwards to `functions.GeneratedFunctions.<name>(...)`
after probing the shared `MeosOpsRuntime.MEOS_AVAILABLE` flag. Each
method carries a Javadoc tier marker (stateless / bounded-state /
windowed / cross-stream / io-meta) so consumers know the per-method
wiring shape.

Total emit: 2,097 of JMEOS PR #19's 2,699-method surface (77.7%);
remainder is the JMEOS-deliberately-omitted type-catalog helpers plus
the streaming-relevance-baseline ambiguous (59) and sequence-only (14)
buckets, both surfaced separately for design decisions before emit.

Two generators under flink-processor/tools/codegen/:
- codegen-oo.py: reads JMEOS jar signatures via javap -p +
  streaming-relevance baseline + MEOS object model → emits per-OO-class
  facades.
- codegen-free.py: same shape, but for functions not in the OO model →
  emits per-header facades.

Both are ~250 LOC, deterministic, audit-by-regeneration. Manifests
record provenance (JMEOS method total, baseline target count, emit
count, per-tier breakdown, per-class/per-header method count, sample
of functions absent from JMEOS).

Coexists with the existing berlinmod.MEOSBridge hand-written
BerlinMOD-scoped bridge (high-level, query-shaped); the generated
MeosOps* facades expose the raw MEOS surface tier-by-tier
(low-level, catalog-shaped). Both share the same MEOS_AVAILABLE
discipline and `functions.GeneratedFunctions` delegation.

Stacks on feat/jmeos-bridge-swap; additive-only; touches no existing
file. Locally compile-verified against the union of JMEOS PR #19's
jmeos-core + PR #18's utils.spatial (the latter needed by MEOSBridge,
separately tracked).
@estebanzimanyi force-pushed the codegen/flink-meos-ops branch from fad7fed to e5707ac on May 21, 2026 11:01
@estebanzimanyi
Member Author

Coordination confirmation: rebased onto the post-union-jar refresh (6676bbb on Flink / fa70867 on Kafka — the JMEOS PR #19 + PR #18 union jar with utils.spatial.{Haversine,PointToSegment}).

Local verification:

$ mvn -q -DskipTests compile
# Flink: 123 .class files total, of which 57 MeosOps*
# Kafka:  94 .class files total, of which 57 MeosOps*

Full module now compiles green — the codegen wedge sits on top of MEOSBridge.java's utils.spatial.* imports without any inherited compile-red. mergeStateStatus = CLEAN. Ready for review/merge whenever.

Coordination item resolved. Thanks for the union-jar refresh.
