Skip to content

feat(berlinmod): scaffold the full BerlinMOD-9 × 3-form parity matrix on NebulaStream (33 YAMLs, 27/27 cells)#15

Draft
estebanzimanyi wants to merge 1 commit into
MobilityDB:mainfrom
estebanzimanyi:feat/berlinmod-streaming-forms
Draft

feat(berlinmod): scaffold the full BerlinMOD-9 × 3-form parity matrix on NebulaStream (33 YAMLs, 27/27 cells)#15
estebanzimanyi wants to merge 1 commit into
MobilityDB:mainfrom
estebanzimanyi:feat/berlinmod-streaming-forms

Conversation

@estebanzimanyi
Copy link
Copy Markdown
Member

@estebanzimanyi estebanzimanyi commented May 21, 2026

Draft. Additive. BerlinMOD-9 × 3-form streaming-parity scaffold on MobilityNebula, sibling to the existing SNCB Q-series. 27 of 27 cells (Q1..Q9 × 3 forms) — 18 cells full (the BerlinMOD-Q semantic computed entirely inside NebulaStream) and 9 cells partial (NebulaStream emits the per-window inputs; consumer post-processes for the final answer).

No touch to existing SNCB queries, no MEOS C++ PhysicalFunction changes — strictly new YAMLs in a Queries/berlinmod/ subdirectory plus sample CSV + docs.

Coverage

Q Topic Continuous Windowed Snapshot Form
Q1 "which vehicles have appeared?" full
Q2 "where is vehicle X (= 200) at time T?" full
Q3 "vehicles within 5 km of P?" full
Q4 "vehicles inside region R (polygon)?" full
Q5 "pairs of vehicles meeting near P" partial
Q6 "cumulative distance per vehicle" partial
Q7 "first passage through POIs" full (per-POI fan-out)
Q8 "vehicles within d of road segment (LINESTRING)" full
Q9 "X-Y distance at time T" partial

27 / 27 cells covered. The 9 partial cells are honest about the current NebulaStream surface — they need either a stream-self-join (Q5, Q9) or a temporal_length scalar function (Q6) to graduate to full, both of which are documented as one-PR additions in docs/berlinmod-streaming-forms.md.

Form mapping to NebulaStream windows

Form NebulaStream pattern
continuous SLIDING(time_utc, SIZE 1 SEC, ADVANCE BY 1 SEC)
windowed TUMBLING(time_utc, SIZE 10 SEC)
snapshot TUMBLING(time_utc, SIZE 5 SEC)

What "partial" means concretely

Q What NebulaStream emits today What the consumer computes
Q5 per-(window, vehicle) TEMPORAL_SEQUENCE trajectory for each vehicle near P per-pair distance + meeting predicate
Q6 per-(window, vehicle) TEMPORAL_SEQUENCE trajectory length of trajectory per vehicle
Q9 per-(window, vehicle) TEMPORAL_SEQUENCE filtered to vehicle_id ∈ {100, 200} join the two trajectories, compute X-Y distance

Each partial cell IS a valid runnable NebulaStream query that emits useful BerlinMOD-shaped output; the BerlinMOD-Q final answer is one consumer-side reduction step beyond what NebulaStream returns.

What this PR adds

Path Files Role
Queries/berlinmod/q1..q9_{continuous,windowed,snapshot}.yaml 24 one YAML per (Q, form) for Q1..Q6 and Q8..Q9
Queries/berlinmod/q7_poi{1,2,3}_{continuous,windowed,snapshot}.yaml 9 per-POI fan-out for Q7 (3 POIs × 3 forms)
Input/input_berlinmod.csv 1 Sample data: 3 vehicles × 21 events
docs/berlinmod-streaming-forms.md 1 Per-form definitions, MEOS-operator surface, full-vs-partial annotation, path-to-full per Q

Total: 35 new files, 33 YAMLs validated via python3 yaml.safe_load.

MEOS-side surface consumed (no additions requested)

All predicates use operators already exposed by PR #14 and follow-ups:

Operator YAMLs
edwithin_tgeo_geo(lon, lat, t, geom, d) Q3 × 3 (POINT), Q4 × 3 (POLYGON, d=0.0), Q5 × 3 (POINT), Q7 × 9 (per-POI POINT), Q8 × 3 (LINESTRING)
TEMPORAL_SEQUENCE(lon, lat, t) Q2 × 3, Q5 × 3, Q6 × 3, Q9 × 3

No new PhysicalFunction classes; no C++ changes.

Sibling parity work in the ecosystem

No-touch boundary

  • Existing SNCB Q-series (Queries/Query0..Query5.yaml + Queries/sncb_brake_monitoring.yaml) — untouched
  • nes-physical-operators/src/Functions/Meos/ — untouched
  • nes-physical-operators/src/Aggregation/Function/Meos/ — untouched
  • Build files — untouched
  • Logical source name berlinmod_stream and TCP port 32325 chosen distinct from SNCB's sncb_stream / port 32324 to avoid coexistence conflicts

@estebanzimanyi estebanzimanyi force-pushed the feat/berlinmod-streaming-forms branch from a5ff0f0 to 6d3f885 Compare May 21, 2026 06:07
@estebanzimanyi estebanzimanyi changed the title feat(berlinmod): scaffold BerlinMOD-Q1..Q4 × 3 forms (12 YAMLs, 12/27 cells) feat(berlinmod): scaffold BerlinMOD-Q1..Q4+Q7 × 3 forms (21 YAMLs, 15/27 cells) on NebulaStream May 21, 2026
… on NebulaStream (33 YAMLs, 27/27 cells)

Additive scaffold for the BerlinMOD-9 × 3 streaming-form parity contract
on MobilityNebula, sibling to the existing SNCB Q-series and matching
the MobilityFlink MobilityDB#3 / MobilityKafka MobilityDB#1 streaming-form definitions.

All 27 cells covered:

  Q1 'which vehicles have appeared'      — full (continuous + windowed + snapshot)
  Q2 'where is vehicle X at time T'      — full
  Q3 'vehicles within 5 km of P'         — full
  Q4 'vehicles inside region R (polygon)'— full
  Q5 'pairs of vehicles meeting near P'  — partial (emit per-vehicle trajectories near P; consumer joins)
  Q6 'cumulative distance per vehicle'   — partial (emit TEMPORAL_SEQUENCE; consumer computes length)
  Q7 'first passage of vehicle through POI' × {POI1, POI2, POI3}
                                          — full (per-POI fan-out)
  Q8 'vehicles within d of LINESTRING'   — full (edwithin_tgeo_geo with LINESTRING geometry)
  Q9 'distance between X and Y at time T'— partial (emit X and Y trajectories; consumer joins)

18 of 27 cells are FULL (the BerlinMOD-Q semantic is computed entirely
inside NebulaStream). 9 cells are PARTIAL — NebulaStream emits the
per-window inputs (trajectory, candidate vehicles) and a consumer
post-processes for the final BerlinMOD-Q answer. The partial pattern
is the natural expression of these queries in NebulaStream's current
SQL surface; the path to FULL is documented per-Q in
docs/berlinmod-streaming-forms.md (a stream-self-join for Q5/Q9, a
temporal_length scalar function for Q6).

Form mapping to NebulaStream windows:

  continuous: SLIDING(time_utc, SIZE 1 SEC, ADVANCE BY 1 SEC)
  windowed:   TUMBLING(time_utc, SIZE 10 SEC)
  snapshot:   TUMBLING(time_utc, SIZE 5 SEC)

MEOS-side surface consumed (already exposed by PR MobilityDB#14 + follow-ups):

  edwithin_tgeo_geo — Q3 (POINT predicate), Q4 (POLYGON, d=0.0),
                      Q5 (POINT predicate), Q7 (per-POI POINT),
                      Q8 (LINESTRING predicate)
  TEMPORAL_SEQUENCE — Q2 / Q5 / Q6 / Q9 (per-window per-vehicle trajectory)

No new MEOS PhysicalFunction classes added; no C++ changes; no SNCB
Q-series modifications. All 33 YAMLs are additive in a new
Queries/berlinmod/ subdirectory.

Add (additions):
  Queries/berlinmod/q1_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q2_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q3_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q4_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q5_{continuous,windowed,snapshot}.yaml          (3, partial)
  Queries/berlinmod/q6_{continuous,windowed,snapshot}.yaml          (3, partial)
  Queries/berlinmod/q7_poi{1,2,3}_{continuous,windowed,snapshot}.yaml (9, full via fan-out)
  Queries/berlinmod/q8_{continuous,windowed,snapshot}.yaml          (3, LINESTRING predicate)
  Queries/berlinmod/q9_{continuous,windowed,snapshot}.yaml          (3, partial)
  Input/input_berlinmod.csv  (sample data: 3 vehicles × 21 events, 14 simulated seconds)
  docs/berlinmod-streaming-forms.md

Validation: every YAML parses cleanly via python3 yaml.safe_load.
Runtime verification gated on the NebulaStream test harness.

Coverage: 27 of 27 cells (100 %), with 18 FULL and 9 PARTIAL annotated
explicitly per Q. Path to FULL for the 9 PARTIAL cells is one
MobilityNebula C++ PhysicalFunction class each (or a NebulaStream
upstream stream-self-join), documented in
docs/berlinmod-streaming-forms.md.
@estebanzimanyi estebanzimanyi force-pushed the feat/berlinmod-streaming-forms branch from 6d3f885 to 395e364 Compare May 21, 2026 06:40
@estebanzimanyi estebanzimanyi changed the title feat(berlinmod): scaffold BerlinMOD-Q1..Q4+Q7 × 3 forms (21 YAMLs, 15/27 cells) on NebulaStream feat(berlinmod): scaffold the full BerlinMOD-9 × 3-form parity matrix on NebulaStream (33 YAMLs, 27/27 cells) May 21, 2026
@marianaGarcez
Copy link
Copy Markdown
Contributor

These are great additions, happy to review when you're ready!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants