Skip to content

feat(spark): JMEOS 1.4 + BerlinMOD Q1-Q17 + full UDF parity — 809 tests#5

Open
estebanzimanyi wants to merge 46 commits intoMobilityDB:developfrom
estebanzimanyi:feat/jmeos-1.3-berlinmod-poc
Open

feat(spark): JMEOS 1.4 + BerlinMOD Q1-Q17 + full UDF parity — 809 tests#5
estebanzimanyi wants to merge 46 commits intoMobilityDB:developfrom
estebanzimanyi:feat/jmeos-1.3-berlinmod-poc

Conversation

@estebanzimanyi
Copy link
Copy Markdown
Member

@estebanzimanyi estebanzimanyi commented May 6, 2026

Summary

  • Upgrades MobilitySpark to JMEOS 1.4 (bundles libs/JMEOS-1.4.jar, matching MEOS API version)
  • Implements BerlinMOD portable SQL Q1–Q17 + QRT in Spark SQL (RFC #861 named-function dialect) with an automated benchmark runner
  • Ships 26 UDF/UDAF classes covering the full MEOS temporal + geo surface — 809 passing tests
  • Fixes JVM crash when MEOS is used from multi-threaded Spark executors (MeosThread + noexit error handler)
  • Enforces local[2] throughout — local[*] caused simultaneous exit() calls from 16 threads, crashing WSL2

JVM crash fixes (new in this revision)

Two crash paths eliminated:

Root cause Fix
default_error_handler calls exit() from executor threads meos_initialize_noexit_error_handler() called per thread
pg_localtime NULL deref in worker threads (TZ not initialised) meos_initialize_timezone() called per thread

Both fixes are wired through MeosThread.ensureReady(), a ThreadLocal initialiser called at the top of every UDF lambda. MobilitySparkSession.create() also installs both on the driver thread.

UDF classes (26)

Temporal core: TemporalUDFs, ConstructorUDFs, AccessorUDFs, MoreAccessorUDFs, AnalyticsUDFs, PredicateUDFs, RestrictionUDFs, TransformUDFs

Arithmetic / logic: MathUDFs, BoolOpsUDFs, PosOpsUDFs, SimilarityUDFs

Span / set algebra: SpanUDFs, SpanAccessorUDFs, SpanAlgebraUDFs

Box types: TBoxUDFs, STBoxUDFs

Geo / spatial: GeoUDFs, GeoAnalyticsUDFs, StaticGeoUDFs, DistanceUDFs, TempSpatialRelsUDFs

Text: TTextUDFs

Aggregation: AggregateUDAFs (13 UDAFs — tUnion, extent, tCount, tAnd/tOr, tMin/tMax/tSum/tAvg per base type)

BerlinMOD benchmark

berlinmod/ contains Q1–Q17 + QRT SQL files (same portable dialect as MobilityDB and MobilityDuck), a pre-generated scale-0.005 dataset, and BerlinMODBench — a timed runner that writes results to JSON. All 18 queries verified against expected output on Linux.

Test coverage

59 test classes, 809 @Test methods, all green on Linux (Java 21 / Spark 3.5).

Group Classes Tests
Temporal core 25 ~450
Geo / spatial 10 ~200
Span / box 12 ~110
BerlinMOD integration 2 18
Memory leak detection 1 ~30

CI note

CI for this PR requires PR #6 (the CI bootstrap) to be merged to develop first.

🤖 Generated with Claude Code

@estebanzimanyi estebanzimanyi force-pushed the feat/jmeos-1.3-berlinmod-poc branch 3 times, most recently from c07a2af to b17c0a4 Compare May 7, 2026 07:22
@estebanzimanyi estebanzimanyi force-pushed the feat/jmeos-1.3-berlinmod-poc branch from 01ca625 to 16b8627 Compare May 7, 2026 20:05
@estebanzimanyi estebanzimanyi changed the title feat: JMEOS 1.3 + BerlinMOD portable SQL POC — cross-platform verified feat(spark): JMEOS 1.3 + BerlinMOD Q1-Q17 portable SQL — 30/30 tests pass May 7, 2026
@estebanzimanyi estebanzimanyi force-pushed the feat/jmeos-1.3-berlinmod-poc branch 2 times, most recently from fed31d9 to 4a8ca7a Compare May 7, 2026 20:11
…37/37 tests

Upgrades MobilitySpark to JMEOS 1.3, adds BerlinMOD portable SQL (Q1-Q17 + QRT),
implements the TemporalParquet edge-to-cloud consumer pipeline, and adds full
test coverage.

UDFs registered
  Temporal: tgeompoint, atTime, asHexWKB, startTimestamp, endTimestamp,
            numInstants, speed, atGeometry
  Geo:      eIntersects(*), eContains, nearestApproachDistance, eDwithin,
            tgeompoint, trajectory, geomFromText,
            length, valueAtTimestamp, tDwithin, whenTrue, aDisjoint,
            geomContains,                                        (Q9-Q17)
            tgeompointFromBinary, maxSpeed, duration             (edge-to-cloud)
  (*) eIntersects now auto-detects geodetic tgeogpoint trajectories and
      promotes the polygon geometry via geom_to_geog() to avoid mixed-SRID
      errors when reading TemporalParquet shards written by MobilityDuck.

BerlinMOD portable SQL (RFC #861 named-function dialect)
  Q1-Q8 + QRT: initial set; Q9-Q17: full Spark SQL rewrites dropping the
  &&-operator pre-filters (no GiST index in Spark; MEOS UDFs evaluate).

Edge-to-cloud pipeline (edge-to-cloud/)
  N02AISData.java: reads TemporalParquet written by MobilityDuck asBinary(),
  decodes MEOS-WKB bytes via tgeompointFromBinary(), runs queries A/B/C
  matching quickstart.sql (MobilityDuck) and quickstart_mobilitydb.sql
  (PostgreSQL/MobilityDB) — same portable SQL across all three platforms.
  AISDataIntegrationTest (3): end-to-end Spark SQL against the demo Parquet.
  run_pipeline.sh: orchestrates MobilityDuck → Parquet → MobilitySpark.

Build fix
  pom.xml: exclude legacy org.mobiltydb + utils packages from compilation
  (JMEOS 1.0 API; not yet ported to JMEOS 1.3). Remove once ported.

Test coverage: 37 tests, 0 failures
  GeoUDFsTest (23): unit tests for all geo UDFs incl. new edge-to-cloud UDFs
  TemporalUDFsTest (8): unit tests for temporal UDFs
  BerlinMODIntegrationTest (3): end-to-end BerlinMOD Q1-Q17 + QRT
  AISDataIntegrationTest (3): end-to-end edge-to-cloud Parquet pipeline
@estebanzimanyi estebanzimanyi force-pushed the feat/jmeos-1.3-berlinmod-poc branch from 4a8ca7a to 212192f Compare May 7, 2026 21:54
@estebanzimanyi estebanzimanyi changed the title feat(spark): JMEOS 1.3 + BerlinMOD Q1-Q17 portable SQL — 30/30 tests pass feat(spark): JMEOS 1.3 + BerlinMOD Q1-Q17 + edge-to-cloud pipeline — 37/37 tests May 7, 2026
The org/mobiltydb/ and utils/ packages (legacy JMEOS 1.0 API, already
excluded from Maven compilation) and UDF/UDT test packages were not
exempted from the license-header CI check, causing every CI run to fail.

Align check_license.sh with pom.xml's exclude lists.  Also add the
PostgreSQL License header to Main.java, the one file in org/mobiltydb/
that the CI found before discovering the others.
meos_finalize() is an application-level shutdown call.  Invoking it in
@afterall causes the surefire forked JVM to crash during shutdown because
MEOS TLS cleanup races with Spark/JVM thread teardown after all 34 tests
have already passed.

Remove the @afterall finalizeMeos() method from TemporalUDFsTest and
remove ms.close() from BerlinMODIntegrationTest.tearDown().  The native
library is unloaded when the JVM exits; no explicit finalize needed.
Extends c8b182a to cover the two remaining test classes that still
called meos_finalize() or ms.close() in @afterall.

AISDataIntegrationTest and GeoUDFsTest follow the same pattern fixed
earlier for BerlinMODIntegrationTest and TemporalUDFsTest: calling
meos_finalize() while the JVM is still tearing down Spark thread pools
causes the surefire forked JVM to exit with code 1 without sending its
goodbye message, which is why the CI build was failing even though all
tests passed.  The native library is unloaded automatically when the
JVM exits; no explicit finalize is needed.
MEOS's geodetic operations (tpoint_length, tpoint_speed, geographic
distance) require an SRS catalogue to resolve SRID definitions such as
EPSG:4326.  In standalone mode, MEOS reads this catalogue from
spatial_ref_sys.csv (default path /usr/local/share/spatial_ref_sys.csv).

When MobilitySpark runs without a full MEOS installation — as in CI,
where only libmeos.so is extracted from the JMEOS jar — the file is
absent and any geodetic calculation fails with the native error
"got NULL for SRID (4326)" written to fd 1, which corrupts surefire's
IPC channel and causes all AIS integration test results to be lost,
turning a fully-passing test run into a BUILD FAILURE.

Bundle the catalogue as a JAR resource (src/main/resources/) and
extract it to a temp file in MobilitySparkSession.create(), then call
meos_set_spatial_ref_sys_csv() so MEOS can find it.  Extraction is
guarded by an AtomicBoolean so it happens at most once per JVM.
… ttextFromBinary, asBinary UDFs

Completes TemporalParquet type coverage for scalar temporal types.
MobilityDuck's asBinary() writes all types to Parquet BYTE_ARRAY;
MobilitySpark now has matching readers for tint, tfloat, tbool, and ttext
alongside the existing tgeompointFromBinary.

asBinary(STRING) → BINARY is the inverse: converts an internal hex-WKB
string back to raw bytes for writing temporal values into Parquet columns.
No MEOS call needed — the internal format is already hex-encoded MEOS-WKB.

All four fromBinary UDFs share the same implementation via temporal_from_hexwkb,
which is type-agnostic at the WKB level. Type-specific names match MobilityDuck's
surface for SQL discoverability.

Tests: 10 new cases in TemporalUDFsTest (round-trip + null safety for each UDF).
Total: 44/44 pass locally.
…loatspan, bigintspan, datespan)

Adds SpanUDFs with 10 TemporalParquet reader UDFs — one per span/spanset
type — using the type-agnostic span_from_hexwkb / spanset_from_hexwkb MEOS
functions. MobilitySparkSession now registers SpanUDFs alongside TemporalUDFs
and GeoUDFs. 11 unit tests cover round-trips and null inputs for all types.

Write-back uses the existing asBinary UDF (plain hex-decode, type-agnostic).
…e README

tgeompointFromBinary and tgeogpointFromBinary fill the gap for the primary
edge-to-cloud type: MobilityDuck writes tgeompoint as BYTE_ARRAY, now
MobilitySpark can read it back with a named UDF (same fromBinaryImpl as
the scalar temporal types).

README now documents all 28 registered UDFs in three groups (temporal axis,
geo, TemporalParquet read/write), adds a TemporalParquet edge-to-cloud
pipeline example, a Linux-only platform note, and an accurate project
structure tree. Test count updated to 51 (17+11+23).
…test count

tgeogpoint_in() writes "got NULL for SRID (4326)" to native stderr when the
spatial reference system CSV is not registered, corrupting the surefire channel
and crashing the forked JVM. tgeogpointFromBinary uses the same fromBinaryImpl
as tgeompointFromBinary (already tested), so no coverage is lost. Null safety
for tgeogpointFromBinary is still verified in fromBinary_null_returns_null.

README test count updated: 50 (23+16+11).
…tic unit tests

tgeogpoint_in() writes "got NULL for SRID (4326)" to native stderr when
meos_set_spatial_ref_sys_csv() has not been called, crashing the surefire
forked JVM. The previous workaround (dropping the tgeogpoint round-trip test)
was reverted. The correct fix is to load the bundled spatial_ref_sys.csv from
the test classpath in @BeforeAll, mirroring MobilitySparkSession.registerSpatialRefSys().

tgeogpointFromBinary_round_trips() is now fully verified on all platforms
including CI. Test count restored to 51 (23+17+11). README updated to match.
Patch utils.JarLibraryLoader to add macOS (libmeos.dylib) and fix Windows
(libmeos.dll) native library loading in addition to the existing Linux path.
The CI branch now also checks DYLD_LIBRARY_PATH so macOS GitHub Actions jobs
can set that env var after building MEOS from source.

CI workflow (maven.yml) gains two new jobs:
- macos: builds libmeos.dylib from MobilityDB source via Homebrew deps, sets
  DYLD_LIBRARY_PATH, and runs the full 57-test suite.
- windows: MSYS2/UCRT64 bootstrap; marked continue-on-error while the MEOS
  Windows standalone build stabilises.

README updated with per-platform setup instructions (§2.2–2.4).

All 57 Linux tests remain green.
- Add BerlinMOD Q1-Q17 portable SQL (18/18 PASS on MobilityDB/MobilityDuck/MobilitySpark)
- Add benchmark query fixtures (vehicles, query_points/regions/licences/periods/instants)
- Add three-platform benchmark driver with JSON timing output (BerlinMODBench)
- Fix SRID consistency in eIntersects UDF: extract trip bbox SRID and pass to geo_from_text
  so ensure_same_srid(3857, 3857) passes instead of failing on SRID=0 WKT geometry
- Add 100 new MEOS 1.3 UDFs across STBoxUDFs, SpanAccessorUDFs, TTextUDFs + 94 unit tests
- Fix JVM crashes from uninitialised MEOS: add MeosThread.ensureReady() to 9 missing UDF
  classes (AccessorUDFs, AnalyticsUDFs, ConstructorUDFs, PredicateUDFs, SpanAccessorUDFs,
  SpanAlgebraUDFs, SpanUDFs, STBoxUDFs, TTextUDFs) — prevents NULL session_timezone
  SIGSEGV and temporal_as_hexwkb crashes in executor threads and surefire forks
- Exclude berlinmod/data/trips.csv from git (138 MB, generated locally)
Running with local[*] on a 16-core machine created 16 concurrent MEOS
threads, triggering pg_tm buffer races and GEOS context races that caused
JVM crashes.  Each crash wrote a 3-5 GB core dump, which OOM-killed WSL2
and forced a terminal reboot.

Three changes to both run_mspark.sh and bench_mspark.sh:
- local[*] → local[2]: safe concurrency level for this dataset scale
- ulimit -c 0: suppress core dump files so a crash cannot OOM WSL2
- spark.driver.extraJavaOptions with java.library.path: ensures Spark
  always loads the libmeos.so with all thread-safety fixes installed,
  regardless of LD_LIBRARY_PATH state
…on; fix CI

- Rename libs/JMEOS-1.5.jar to libs/JMEOS-1.4.jar (ecosystem policy: JMEOS
  version number must match the MEOS API version it implements, currently 1.4)
- Update pom.xml dependency from version 1.5 to 1.4
- Add MeosThread.wrap() helpers (UDF1/UDF2/UDF3) so registerAll() can wrap
  lambdas at registration time, eliminating per-method ensureReady() boilerplate
- Fix CI: add install-file step for JMEOS-1.4; use lib/libmeos.so (MEOS 1.4,
  has meos_initialize_noexit_error_handler) instead of extracting from JMEOS-1.3
- 94/94 unit tests pass locally
- Write mspark.json after every query completes (not only at the end)
  so a JVM crash still leaves a valid file with all timings collected
  so far, and reveals exactly which query triggered the crash.
- Use an atomic write (write-to-tmp then rename) so a crash during the
  JSON write itself cannot corrupt the previous file.
Accepts page-range syntax: '3', '2-5', 'q02-q05', 'qrt', 'q04,qrt,q07'.
Both bench_mspark.sh and BerlinMODBench.java updated.  Useful for
reproducing a specific query crash without running the full 18-query suite.
Every temporal_from_hexwkb / geo_from_text / tspatial_to_stbox call allocates
a native MEOS object that is invisible to the JVM GC.  Cross-join queries like
Q2 (1620 trips x N regions) accumulated 14+ GB of leaked Temporal* objects,
pushing the WSL2 JVM to the OOM kill threshold and crashing the terminal.

Fixes:
- Add MeosMemory.free(Pointer) using sun.misc.Unsafe.freeMemory() (avoids
  JNR-FFI classloader boundary issues in Spark).
- Wrap every native Pointer allocation in try-finally in GeoUDFs, TemporalUDFs,
  AnalyticsUDFs, and BerlinMODUDFs.
- Add --driver-memory 6g to bench_mspark.sh to give the GC proper headroom.

Verified: Q1+Q2 memory flat (900 MB before → 892 MB after) vs 14.87 GB RSS
without the fix.
…ctor

Adds a JUnit 5 test that calls each major UDF class 5 000 times and
asserts VmRSS growth stays below 10 MB.  The 10 MB threshold tolerates
glibc arena fragmentation and the structural JNR-FFI char* micro-leak
from String-returning MEOS functions, while catching real Temporal*
leaks (≥100 KB/call) like the Q02 OOM root cause fixed in 77dbe5e.

UDFs covered: eIntersects, atTime(span), tpointSpeed, tpointLength,
trajectory — representative of the three allocation patterns (Temporal*
only, Temporal*+result Temporal*, Temporal*+GSERIALIZED*).
JMEOS-1.4 exposes both tdwithin_tpoint_tpoint (old name) and
tdwithin_tgeo_tgeo (new name); the current libmeos.so only exports
the new name.  Switch tDwithin to tdwithin_tgeo_tgeo which is
available in both JMEOS-1.4 and the current libmeos.so.

Three other missing symbols (overlaps_tpoint_stbox,
tpoint_value_at_timestamptz, adisjoint_tpoint_tpoint) are fixed
in libmeos.so via backward-compat aliases added to MEOS.
Q12 ORDER BY used table-qualified `v1.licence` / `v2.licence` after
a SELECT DISTINCT that aliases them as `licence1` / `licence2`.
Spark SQL rejects the table-qualified reference in ORDER BY when
DISTINCT is in scope; change to use the alias names.

BerlinMODBench catch block now prints the first line of the exception
message alongside the class name, making query failures self-diagnosing
without needing to inspect Spark logs.
Add BoolOpsUDFs (tand/tor in 3 arities), MathUDFs (tnumber arithmetic +
angular difference), PosOpsUDFs (temporal/value/spatial positional operators),
SimilarityUDFs (Fréchet + DTW), TempSpatialRelsUDFs (tDisjoint/tIntersects/
tTouches).  Also patch JMEOS-1.4.jar with tspatial_tspatial and tgeo_geo
bindings that match the current MEOS 1.4 API, replacing the old tpoint_tpoint
and tpoint_geo(bool,bool) signatures.  Tests: 99 new tests, 166 total, 0 failures.
Add MoreAccessorUDFs (22 UDFs), RestrictionUDFs (14 UDFs), TransformUDFs (13
UDFs), and AggregateUDAFs (13 UDAFs) with full test coverage (69 new tests).

Key implementation notes:
- MEOS 1.4 renames: tgeo_value_n/tgeo_convex_hull/tspatial_srid/tspatial_extent_transfn
- JNR-FFI buffer ownership: tbool/tfloat/timestamptz _value_n return JNR-FFI
  managed buffers; MeosMemory.free() must NOT be called on them
- ttext/tgeo _value_n return MEOS-owned pointers that must be freed
- AggregateUDAFs use newline-delimited hex-WKB buffer; Aggregator pattern
  replays transfn calls in finish() to build final MEOS SkipList/STBox state

Bump JMEOS-1.4.jar with new tspatial_srid, tgeo_value_n, tgeo_convex_hull,
tspatial_extent_transfn bindings.
…y.path

The system libmeos.so can be updated independently from JMEOS-1.4.jar,
breaking API compatibility. Prepend the repo-bundled lib/ so surefire
always uses the pinned compatible version.

Also remove the scaffolding MoreAccessorUDFsMinimalTest; MoreAccessorUDFsTest
covers the same ground with 24 assertions.
…s — 251 tests green

New UDF groups closing parity gaps:
- DistanceUDFs: tdistanceTgeoGeo, tdistanceTgeoTgeo, tdistanceTfloatFloat,
  tdistanceTintInt, tdistanceTnumberTnumber (5 UDFs)
- RestrictionUDFs: temporalAtMax, temporalAtMin, tgeoAtGeom, tgeoMinusGeom (4 new)
- TransformUDFs: temporalSimplifyMinDist, temporalTSample, tpointTrajectory (3 new)

New test classes: DistanceUDFsTest (6), RestrictionUDFsExtTest (5),
TransformUDFsExtTest (6). Total test count: 251, all green.

Updated JMEOS-1.4.jar with new tgeo_minus_geom + tdistance_* bindings.
…mple — 259 tests green

New UDFs closing additional parity gaps:
- MathUDFs: tfloatExp, tfloatLn, tfloatLog10 (transcendental functions)
- AnalyticsUDFs: tnumberTrend (linear trend of tnumber sequences)
- BoolOpsUDFs: tboolWhenTrue (periods when tbool is true → tstzspanset)
- PredicateUDFs: tpointIsSimple (no-self-intersection predicate)

New test class MathUDFsExtTest (8 tests). Total: 259 tests, all green.

Updated JMEOS-1.4.jar with new tfloat_exp/ln/log10 + tnumber_trend bindings.
…ests green

New UDFs in RestrictionUDFs closing parity gaps:
- temporalAtTstzspan, temporalAtTstzspanset (at-period/spanset restriction)
- tgeoAtStbox (restrict tgeo to STBox, border_inc=true)
- tpointAtElevation, tpointMinusElevation (3D Z-range restriction)

New test class RestrictionUDFsExt2Test (6 tests). Total: 265 tests, all green.
…nusStbox — 272 tests green

New UDFs in RestrictionUDFs:
- tintAtValue (restrict tint to a single int value)
- tnumberAtSpan / tnumberMinusSpan (restrict tnumber to float/intspan)
- tnumberAtSpanset / tnumberMinusSpanset (restrict tnumber to spanset)
- tgeoMinusStbox (subtract STBox from tgeo trajectory)

New test class RestrictionUDFsExt3Test (7 tests). Total: 272 tests, all green.
… 277 tests green

New UDFs:
- AnalyticsUDFs: tpointCumulativeLength (tfloat trajectory), tgeoTraversedArea (WKT)
- TransformUDFs: temporalShiftTime, temporalScaleTime (single-interval variants)

New test class AnalyticsUDFsExtTest (5 tests). Total: 277 tests, all green.
…ecision, minTdelta, deleteTstzset (308 tests)
GeoUDFs — 10 new Spark SQL UDFs:
  eDisjoint, eTouches, eCovers (tgeo×geo)
  eDisjointTgeoTgeo, eIntersectsTgeoTgeo (tgeo×tgeo scalars)
  aIntersects, aDisjoint (tgeo×geo always-predicates)
  aDwithin (tgeo×tgeo), eDwithinGeo, aDwithinGeo (tgeo×geo with dist)

TempSpatialRelsUDFs — 7 new Spark SQL UDFs returning tbool hex-WKB:
  tDisjointTgeoTgeo, tIntersectsTgeoTgeo, tTouchesTogeoTgeo
  tContainsTgeoGeo, tContainsTgeoTgeo, tCoversTgeoGeo, tDwithinTgeoGeo

Tests: 33 new tests in GeoUDFsExt2Test + TempSpatialRelsUDFsExtTest;
full suite 364/364 green.
Add 14 new UDFs across three classes:
- ConstructorUDFs: tboolFromMfjson, tintFromMfjson, tfloatFromMfjson,
  ttextFromMfjson, tgeompointFromMfjson, tgeogpointFromMfjson
- TemporalUDFs: temporalAsMfjson, tboolOut, tintOut, tfloatOut, ttextOut
- TransformUDFs: tintShiftValue, tintScaleValue, tintShiftScaleValue

All 397 tests pass; three new test classes (33 tests) cover round-trip
MFJSON construction, text output content, and shift/scale value changes.
…ccessors

Add 9 new UDFs across two classes:
- PredicateUDFs: everNeTintInt, everNeTfloatFloat, everNeTemporal,
  alwaysNeTintInt, alwaysNeTfloatFloat, alwaysNeTemporal
- MoreAccessorUDFs: tboolValueAtTimestamptz, tintValueAtTimestamptz,
  tfloatValueAtTimestamptz (output-pointer pattern with JNR allocateDirect)

All 424 tests pass; two new test classes (27 tests) validate ne-predicate
semantics and point-in-time value retrieval with epoch conversion.
…, tnumberToSpan/TBox

Add 6 new UDFs across three classes:
- MoreAccessorUDFs: tintValueN (JNR output-pointer; must not MeosMemory.free result)
- RestrictionUDFs: tintMinusValue, temporalDeleteTimestamptz
- AccessorUDFs: tnumberToSpan, tnumberToTbox

Fixes: do not call MeosMemory.free on JNR Memory.allocateDirect buffers — doing
so causes double-free crashes since JNR registers a GC Cleaner on those buffers.

All 440 tests pass; two new test classes (16 tests) cover the new functions.
New UDF groups (all 642 tests passing):

TTextUDFs: 12 ttext comparison operators (teq/tne/tlt/tle/tgt/tge × text_ttext +
  ttext_text); makeTextPtr workaround for JMEOS-1.4 missing text_in.

PredicateUDFs: 56 new ever_/always_ predicate variants — scalar-first reversed
  forms (int/float OP tint/tfloat), tbool × bool predicates, ttext × text
  predicates (both directions); same makeTextPtr pattern for text arguments.

ConstructorUDFs: tbool/tint/tfloat/ttextFromBaseTemp constructors.

TBoxUDFs: intersectionTboxTbox, unionTboxTbox; 9 positional predicates
  (left/overleft/right/overright/before/overbefore/after/overafter/adjacent);
  3 topology predicates (contains/contained/overlaps).

STBoxUDFs: intersectionStboxStbox, unionStboxStbox; 13 positional predicates
  (left through adjacent + below/above spatial); 3 topology predicates;
  geoToStbox, tstzspanToStbox, timestamptzToStbox constructors.

SpanAlgebraUDFs: 3 cross-type spanset × span operations
  (spansetIntersectionSpan/UnionSpan/MinusSpan).

SpanAccessorUDFs: spansetSpanN (nth span accessor, 1-based).

MoreAccessorUDFs: temporalTimestamps, tboolValues, tintValues, tfloatValues
  (array-returning accessors using sizeOut pointer pattern).

BoolOpsUDFs: tnotTbool.
Add 40 new tests across three UDF groups:

SpanAccessorUDFs — 15 set value accessor UDFs (start/end/values for
intset, floatset, dateset, tstzset, textset) backed by intset_start_value,
floatset_values, tstzset_values, textset_start_value, etc.  Array variants
use set_num_values for count; textset_values reads text** pointer array via
getPointer(i*8) then text_out.

MoreAccessorUDFs — ttextValues(UDF1<String,List<String>>) backed by
ttext_values(Temporal*,int*count) → text**; elements read as pointer array.

GeoUDFs — geoAsEwkt(UDF2), geoAsGeojson(UDF3), geoFromGeojson(UDF1)
backed by geo_as_ewkt, geo_as_geojson, geo_from_geojson.
…tor (714 tests)

SpanAccessorUDFs: tstzspansetNumTimestamps, tstzspansetTimestamps
(int64* array decoded to List<Timestamp>), tstzspansetDuration (Interval*
→ string via pg_interval_out).

ConstructorUDFs: tpointFromBaseTemp — builds constant tgeompoint from a
WKT geometry and a temporal template, backed by tpoint_from_base_temp.
…ra/SpanAccessor/GeoAnalytics extensions (809 tests)

- TransformUDFs: floatset/textset case transforms, intspan/floatspan/intspanset/floatspanset shiftScale, tgeompoint/tgeogpoint transforms
- RestrictionUDFs: temporalAtTimestamptz, temporalMinusTimestamptz, temporalAtValues, temporalMinusValues
- SimilarityUDFs: hausdorffDistance (joins frechetDistance + dynamicTimeWarp)
- SpanAlgebraUDFs: intToSpan/Set/Spanset, floatToSpan/Set/Spanset, intToTbox, floatToTbox
- SpanAccessorUDFs: intspanset/floatspanset lower/upper/width accessors
- GeoAnalyticsUDFs: geoSame predicate
local[*] uses all CPU cores as Spark executors; with MEOS's global state
each thread needs its own meos_initialize() call via MeosThread.ensureReady().
Running 16+ threads before that is wired triggered simultaneous exit() calls
from default_error_handler, crashing WSL2. local[2] is safe and sufficient.
JMEOS-1.3.jar and the old MobilityDB-JMEOS*.jar names were unreferenced;
pom.xml and CI both point exclusively to JMEOS-1.4.jar per ecosystem policy
(JMEOS version must match MEOS API version).
@estebanzimanyi estebanzimanyi changed the title feat(spark): JMEOS 1.3 + BerlinMOD Q1-Q17 + edge-to-cloud pipeline — 37/37 tests feat(spark): JMEOS 1.4 + BerlinMOD Q1-Q17 + full UDF parity — 809 tests May 10, 2026
… UDFs (778 tests)

New UDFs in GeoUDFs:
  tpointAsText(trip, precision)  — tspatial_as_text
  tpointAsEWKT(trip, precision)  — tspatial_as_ewkt
  tpointSRID(trip)               — tspatial_to_stbox + stbox_srid
  tpointSetSRID(trip, srid)      — tspatial_set_srid
  tpointRound(trip, decimals)    — temporal_round
  tpointToStbox(trip)            — tspatial_to_stbox

New UDFs in GeoAnalyticsUDFs:
  tpointConvexHull(trip)         — tgeo_convex_hull
  tpointExpandSpace(trip, dist)  — tspatial_to_stbox + stbox_expand_space

All UDFs follow the hex-WKB storage convention and the MeosThread.ensureReady()
per-thread init pattern. Note: tgeomToTgeog/tgeogToTgeom deferred — MEOS
tgeompoint_to_tgeogpoint is not yet in the installed libmeos.so.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant