Closed
11 changes: 9 additions & 2 deletions .github/workflows/maven.yml
@@ -45,17 +45,24 @@ jobs:
- name: Print current working directory
run: pwd

- name: Refresh apt package index
run: sudo apt-get update

# Install GEOS library
- name: Install GEOS
run: sudo apt-get install -y libgeos-dev

# Install PROJ library
- name: Install PROJ
run: sudo apt-get install proj-bin libproj-dev proj-data
run: sudo apt-get install -y proj-bin libproj-dev proj-data

# Install JSON-C library
- name: Install JSON-C
run: sudo apt install libjson-c-dev
run: sudo apt-get install -y libjson-c-dev

# Install GSL (required by MobilityDB CMake's find_package(GSL))
- name: Install GSL
run: sudo apt-get install -y libgsl-dev

# Fetch and install MEOS library
- name: Fetch MEOS sources
Binary file modified jar/JMEOS.jar
Binary file not shown.
134 changes: 134 additions & 0 deletions scripts/README.md
@@ -0,0 +1,134 @@
# JMEOS regeneration pipeline

These scripts let a maintainer rebuild `src/main/java/functions/functions.java`
from scratch against a MobilityDB checkout. Most JMEOS users never run them;
they consume the pre-built `jar/JMEOS.jar`. Run them when:

- Bumping JMEOS to a newer MEOS release.
- Adding a new public MEOS function and wanting it bound automatically.
- Investigating a wrapper bug surfaced in a downstream consumer
(MobilitySpark, future Java consumers).

## One-liner

```bash
scripts/regenerate.sh /path/to/MobilityDB
mvn package -Dmaven.test.skip=true
```

That re-derives the bindings end-to-end and writes a fresh `jar/JMEOS.jar`.

## What each script does

### `amalgamate_meos_h.sh`

The JMEOS extractor reads exactly **one** file at
`src/main/java/builder/resources/meos.h`. MEOS 1.4 split its public surface
across many headers, so the script concatenates them into one in this order:

```
postgres_ext_defs.in.h (typedefs: Datum, TimestampTz, int64, …)
postgres_int_defs.h
meos.h (core temporal API)
meos_geo.h (spatial + tspatial)
meos_cbuffer.h
meos_npoint.h
meos_pose.h
meos_rgeo.h
```

Then it appends extern declarations for symbols that are exported from
libmeos.so but either have no prototype in any header or are declared only in
private headers the amalgam excludes on purpose
(`meos_internal*.h`, `temporal/temporal.h`, `temporal/meos_catalog.h`):

| Symbol | Why it's appended |
|---|---|
| `acovers_geo_tgeo`, `acovers_tgeo_geo`, `acovers_tgeo_tgeo` | `acovers_*` family lives only in `meos/src/geo/tgeo_spatialrels.c` |
| `mobilitydb_version`, `mobilitydb_full_version` | `temporal/temporal.h` (private) |
| `temporal_mem_size`, `temptype_basetype` | `temporal/temporal.h`, `temporal/meos_catalog.h` |
| `temporal_values_p`, `set_make_free` | `meos_internal.h` — Datum-typed |
| `tnumber_value_split`, `tnumber_value_time_split`, `tnumber_value_time_boxes`, `tbox_get_value_time_tile` | `meos_internal.h` — Datum + MeosType-typed |

The `Datum → long` and `MeosType → int` lowerings live in
`builder/FunctionsGenerator.java` (the `equivalentTypes` map).
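
As a rough illustration of the concatenate-then-append step, here is a minimal Python sketch. It is not the shell script itself: the function name `amalgamate` and its parameters are hypothetical, and only the header order is taken from the list above.

```python
from pathlib import Path

# Header order copied from the list above; earlier files define the
# typedefs that later ones reference.
HEADER_ORDER = [
    "postgres_ext_defs.in.h",
    "postgres_int_defs.h",
    "meos.h",
    "meos_geo.h",
    "meos_cbuffer.h",
    "meos_npoint.h",
    "meos_pose.h",
    "meos_rgeo.h",
]


def amalgamate(include_dir: Path, out: Path, extra_decls: list[str]) -> int:
    """Concatenate the headers in order, append extra extern lines, and
    return the number of lines in the result that start with 'extern'."""
    parts = [(include_dir / name).read_text() for name in HEADER_ORDER]
    # Put every appended declaration on its own line.
    tail = "".join(decl.rstrip("\n") + "\n" for decl in extra_decls)
    text = "".join(parts) + tail
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text)
    return sum(1 for line in text.splitlines() if line.startswith("extern"))
```

Returning the `extern` count mirrors the summary the shell script prints, which makes it easy to notice when an appended declaration silently goes missing.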

### `post_regen_patch.py`

Idempotent post-process for the auto-generated `functions.java`. Fixes two
things the generator gets wrong:

1. **`rtree_search` / `rtree_search_temporal`** — the C signature is
`int foo(in, in, void *query, MeosArray *result)` but the generator's
bool-out heuristic mis-compiles them. Rewritten to a straight delegation
that takes a caller-supplied `Pointer result`.

2. **`bool foo(args, T *result)` vs `bool foo(args, T **result)`** —
the generator emits the same wrapper for both, with a spurious
`getPointer(0)` indirection. For `T **result` (pointer-out, INDIR, 10
cases) this is correct. For `T *result` (value-out, DIRECT, 18 cases)
it misreads the stored value as an address, so a caller's `getDouble(0)`
dereferences a bogus pointer and crashes in `Unsafe_GetDouble` with
SIGSEGV.

The DIRECT/INDIR classification is hand-derived from MEOS C signatures
(see the `DIRECT = {…}` set near the top of the script). Names that
match the broken pattern but are not in `DIRECT` are left untouched
(i.e. correctly INDIR-shaped). The script reports the count of each
class on every run; if a future MEOS bump adds a new bool-out function,
it'll show up in the INDIR count — review it and add to `DIRECT` if
the C signature is `T *result`.
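
To see why the value-out case goes wrong, the following Python sketch reinterprets the bytes of a `double` as an unsigned 64-bit integer, which is roughly what a pointer read at offset 0 does to a value buffer. It is an illustration only, assuming a little-endian 64-bit platform; the helper names are hypothetical and no JNR-FFI code is involved.

```python
import struct


def double_bits_as_address(value: float) -> int:
    """Reinterpret the 8 bytes of a C double as an unsigned 64-bit integer,
    i.e. what a pointer-sized read of a value buffer would yield."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def read_value_out(buffer: bytes) -> float:
    """Correct handling of a value-out buffer: decode the bytes in place."""
    return struct.unpack("<d", buffer)[0]


buf = struct.pack("<d", 1.5)  # stands in for what the native call wrote
print(hex(double_bits_as_address(1.5)))  # 0x3ff8000000000000, not a usable address
print(read_value_out(buf))               # 1.5
```

Dereferencing `0x3ff8000000000000` as a pointer is what produces the crash; returning the buffer itself lets the caller decode the value directly.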

### `regenerate.sh`

End-to-end orchestrator. Invokes, in order:

1. `amalgamate_meos_h.sh <MobilityDB-path>`
2. `mvn compile -q` (so the extractor + generator can run from `target/classes`)
3. `java -cp target/classes builder.FunctionsExtractor`
4. `java -cp target/classes builder.FunctionsGenerator`
5. `python3 scripts/post_regen_patch.py src/main/java/functions/functions.java`

After it finishes, run `mvn package -Dmaven.test.skip=true` to produce the jar.
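
The step order above can be sketched in Python as a list of commands. This is a hypothetical stand-in, not the contents of `regenerate.sh` (which this README does not reproduce); the function names and the `dry_run` flag are assumptions made for illustration.

```python
import subprocess


def regen_commands(mobilitydb_path: str) -> list[list[str]]:
    """Return the five pipeline steps as argv lists, in execution order."""
    functions_java = "src/main/java/functions/functions.java"
    return [
        ["scripts/amalgamate_meos_h.sh", mobilitydb_path],
        ["mvn", "compile", "-q"],
        ["java", "-cp", "target/classes", "builder.FunctionsExtractor"],
        ["java", "-cp", "target/classes", "builder.FunctionsGenerator"],
        ["python3", "scripts/post_regen_patch.py", functions_java],
    ]


def run_pipeline(mobilitydb_path: str, dry_run: bool = True) -> None:
    """Print each step and, unless dry_run is set, execute it.
    check=True aborts at the first failing step."""
    for argv in regen_commands(mobilitydb_path):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Stopping at the first failure matters here because each step consumes the previous step's output.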

## Why not wire this into `mvn package` directly

Most JMEOS consumers don't have a MobilityDB source checkout sitting next to
JMEOS — they grab the published artefact and use it. Folding the regen
pipeline into the default Maven lifecycle would force every consumer to
clone MobilityDB, install Python 3, and run a sed-style patcher just to
build a jar that already exists in `jar/JMEOS.jar`. Maintainers who actually
need to regen run `regenerate.sh` explicitly; everyone else's `mvn package`
stays pure-Maven.

## Smoke test

`src/test/java/regen/RegenWrapperSanityTest.java` exercises one DIRECT
(`stbox_xmin → 1.5`) and one INDIR (`ttext_value_n → text_out → "hello"`)
wrapper end-to-end. If a regen breaks either wrapper shape, one of these two
cases fails immediately instead of silently shipping a wrapper that crashes
the JVM at the call site. A newly added function is only covered once it has
its own case (see the checklist below).

The test is skipped by default (matches the existing JMEOS test-suite
convention which keeps surefire skipped because the tests need a runtime
libmeos.so). To run after a regen:

```bash
# Temporarily flip <skipTests>true</skipTests> to false in pom.xml's surefire config
mvn test -Dtest='regen.RegenWrapperSanityTest'
```

## Adding a new MEOS function — checklist

1. The function exists in a public MEOS header (`meos.h`, `meos_geo.h`, …)
and is exported from libmeos.so. Verify the second part with
`nm -D /usr/local/lib/libmeos.so | grep ' T <name>$'`.
- If the symbol is missing because the C definition uses `inline TYPE`
(without `static`), the compiler may not have emitted an external
definition under C99 inline semantics. The fix is on the MEOS side:
drop `inline`. See MobilityDB PR #939.
2. Run `scripts/regenerate.sh <MobilityDB-path>`.
3. If the new function has a `T *result` out-param wrapper that the
patcher classified as INDIR, add it to `DIRECT` in
`post_regen_patch.py` and re-run the patcher.
4. `mvn package -Dmaven.test.skip=true`.
5. Add a smoke-test case to `RegenWrapperSanityTest.java` if the function
is consumer-critical.
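
The export check in step 1 can also be scripted. The sketch below parses text in the usual `nm -D` layout (address, type letter, symbol name); the helper name `is_exported` is hypothetical, and the sample addresses are made up for illustration.

```python
def is_exported(nm_output: str, name: str) -> bool:
    """Return True if nm output lists `name` as a defined text ('T') symbol."""
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols have three fields: address, type letter, name.
        # Undefined ('U') entries have no address and are skipped.
        if len(fields) == 3 and fields[1] == "T" and fields[2] == name:
            return True
    return False


# Made-up sample in nm -D format, for illustration only.
SAMPLE = """\
0000000000012340 T stbox_xmin
                 U malloc
0000000000012400 T tbox_xmin
"""

print(is_exported(SAMPLE, "stbox_xmin"))  # True
print(is_exported(SAMPLE, "malloc"))      # False
```

In practice the input would come from running `nm -D` on the installed `libmeos.so`, as in the command shown in step 1.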
66 changes: 66 additions & 0 deletions scripts/amalgamate_meos_h.sh
@@ -0,0 +1,66 @@
#!/usr/bin/env bash
# Build the single-file MEOS header that the JMEOS FunctionsExtractor
# expects at src/main/java/builder/resources/meos.h.
#
# JMEOS's extractor reads exactly one file. MEOS 1.4 split its public
# surface across many headers, so we concatenate them into one. We also
# append a few extern declarations for symbols that MEOS exports from
# libmeos.so but does not declare in any public header (or declares only
# in private headers that we deliberately do not include because their
# Datum / MeosType density would balloon the binding surface and pull in
# half-stable internals).
#
# Usage:
# scripts/amalgamate_meos_h.sh /path/to/MobilityDB
# (writes src/main/java/builder/resources/meos.h)

set -euo pipefail

if [[ $# -lt 1 || ! -f "$1/meos/include/meos.h" ]]; then
  echo "usage: $0 <path-to-MobilityDB-checkout>" >&2
  echo "  expected: <path>/meos/include/meos.h to exist" >&2
  exit 2
fi

MEOS_INCLUDE="$1/meos/include"
OUT="$(dirname "$0")/../src/main/java/builder/resources/meos.h"
mkdir -p "$(dirname "$OUT")"

# Order matters: postgres_ext_defs.in.h carries the typedefs (Datum,
# TimestampTz, int64, …) that the rest reference, so the postgres
# preamble comes first; meos.h follows it, and the per-type headers
# come last.
cat \
"$MEOS_INCLUDE/postgres_ext_defs.in.h" \
"$MEOS_INCLUDE/postgres_int_defs.h" \
"$MEOS_INCLUDE/meos.h" \
"$MEOS_INCLUDE/meos_geo.h" \
"$MEOS_INCLUDE/meos_cbuffer.h" \
"$MEOS_INCLUDE/meos_npoint.h" \
"$MEOS_INCLUDE/meos_pose.h" \
"$MEOS_INCLUDE/meos_rgeo.h" \
> "$OUT"

# Appended declarations: these symbols are exported from libmeos.so but
# either have no prototype at all (acovers_*) or are declared only in MEOS
# private headers we exclude on purpose (meos_internal*.h,
# temporal/temporal.h, temporal/meos_catalog.h). Without these lines the
# JMEOS regen would omit them and downstream consumers would have to
# re-bind them via JNR-FFI.
cat >> "$OUT" <<'EOF'
extern int acovers_geo_tgeo(const GSERIALIZED *gs, const Temporal *temp);
extern int acovers_tgeo_geo(const Temporal *temp, const GSERIALIZED *gs);
extern int acovers_tgeo_tgeo(const Temporal *temp1, const Temporal *temp2);
extern char *mobilitydb_version(void);
extern char *mobilitydb_full_version(void);
extern int temporal_mem_size(const Temporal *temp);
extern MeosType temptype_basetype(MeosType type);
extern Datum *temporal_values_p(const Temporal *temp, int *count);
extern Set *set_make_free(Datum *values, int count, MeosType basetype, bool order);
extern Temporal **tnumber_value_split(const Temporal *temp, Datum vsize, Datum vorigin, Datum **bins, int *count);
extern Temporal **tnumber_value_time_split(const Temporal *temp, Datum size, const Interval *duration, Datum vorigin, TimestampTz torigin, Datum **value_bins, TimestampTz **time_bins, int *count);
extern TBox *tnumber_value_time_boxes(const Temporal *temp, Datum vsize, const Interval *duration, Datum vorigin, TimestampTz torigin, int *count);
extern TBox *tbox_get_value_time_tile(Datum value, TimestampTz t, Datum vsize, const Interval *duration, Datum vorigin, TimestampTz torigin, MeosType basetype, MeosType spantype);
EOF

extern_count=$(grep -c '^extern' "$OUT")
echo "wrote $OUT ($(wc -l < "$OUT") lines, ${extern_count} extern decls)"
151 changes: 151 additions & 0 deletions scripts/post_regen_patch.py
@@ -0,0 +1,151 @@
#!/usr/bin/env python3
"""Post-process JMEOS's auto-generated functions.java.

The FunctionsGenerator emits two patterns that are wrong for some
specific signatures and we patch them here. The script is idempotent:
running it twice is a no-op.

1. rtree_search / rtree_search_temporal — the C signature is
int foo(in, in, void *query, MeosArray *result)
but the generator's heuristic miscompiles this as if it were a
bool-out-param wrapper, producing a malformed Java method that does
not compile. We rewrite both wrappers to a straight delegation that
takes a caller-supplied Pointer for the result.

2. bool foo(in, ..., T *result) versus bool foo(in, ..., T **result)
The generator emits the same wrapper for both:
Pointer result = Memory.allocateDirect(runtime, Long.BYTES);
bool out = MeosLibrary.meos.foo(args, result);
Pointer new_result = result.getPointer(0);
return out ? new_result : null;
For pointer-out (T **result, INDIR) this is correct: the native
call writes the pointer value into the buffer and getPointer(0)
reads it back. For value-out (T *result, DIRECT — e.g.
double *result, int *result, TimestampTz *result, bool *result) it
is wrong: getPointer(0) reads the value as if it were an address.
Callers that did r.getDouble(0) on the returned pointer would crash
in Unsafe_GetDouble with SIGSEGV.

We rewrite the DIRECT cases to return the buffer directly. The
classification below is hand-derived from MEOS C signatures (see
the DIRECT set below and scripts/README.md); there is no reliable
way to recover it from the Java signature alone.

Run after FunctionsGenerator:
python3 scripts/post_regen_patch.py src/main/java/functions/functions.java
"""

import re
import sys
from pathlib import Path

# Out-param signatures of shape bool foo(args, T *result) — the wrapper
# must return the buffer directly so the caller can read the value with
# r.getDouble(0) / getInt(0) / getLong(0) / getByte(0).
DIRECT = {
"bearing_point_point", # double *result
"bigintset_value_n", # int64 *
"dateset_value_n", # DateADT *
"datespanset_date_n", # DateADT *
"floatset_value_n", # double *
"geom_azimuth", # double *
"intset_value_n", # int *
"tbool_value_n", # bool *
"temporal_timestamptz_n", # TimestampTz *
"tfloat_value_n", # double *
"tint_value_n", # int *
"tpoint_direction", # double *
"tstzset_value_n", # TimestampTz *
"tstzspanset_timestamptz_n", # TimestampTz *
# bbox accessors — also single-value out-params
"stbox_xmin", "stbox_xmax", "stbox_ymin", "stbox_ymax",
"stbox_zmin", "stbox_zmax", "stbox_tmin", "stbox_tmax",
"stbox_tmin_inc", "stbox_tmax_inc",
"tbox_xmin", "tbox_xmax", "tbox_tmin", "tbox_tmax",
"tbox_tmin_inc", "tbox_tmax_inc",
"tbox_xmin_inc", "tbox_xmax_inc",
"tboxfloat_xmin", "tboxfloat_xmax",
"tboxint_xmin", "tboxint_xmax",
}
# Everything else matching the broken pattern is INDIR (T **result —
# Pose **, GSERIALIZED **, text **, …) and is left untouched.

RTREE_PATTERN = re.compile(
r'@SuppressWarnings\("unused"\)\s*'
r'public static int (rtree_search(?:_temporal)?)\(Pointer (\w+), int op, Pointer (\w+)\)\s*\{'
r'[^\}]*\}',
re.DOTALL,
)

OUT_PARAM_PATTERN = re.compile(
r'public static Pointer (\w+)\(([^)]*)\) \{\s*'
r'boolean out;\s*'
r'Runtime runtime = Runtime\.getSystemRuntime\(\);\s*'
r'Pointer result = Memory\.allocateDirect\(runtime, Long\.BYTES\);\s*'
r'out = MeosLibrary\.meos\.\w+\(([^)]*)\);\s*'
r'Pointer new_result = result\.getPointer\(0\);\s*'
r'return out \? new_result : null ;\s*\}',
re.DOTALL,
)


def patch_rtree(content: str) -> tuple[str, int]:
    def repl(m):
        name = m.group(1)
        first_param = m.group(2)
        third_param = m.group(3)
        return (
            f'@SuppressWarnings("unused")\n'
            f'\tpublic static int {name}(Pointer {first_param}, int op, '
            f'Pointer {third_param}, Pointer result) {{\n'
            f'\t\treturn MeosLibrary.meos.{name}({first_param}, op, '
            f'{third_param}, result);\n'
            f'\t}}'
        )
    new, n = RTREE_PATTERN.subn(repl, content)
    return new, n


def patch_out_params(content: str) -> tuple[str, int, int]:
    direct_count = 0
    indirect_count = 0

    def repl(m):
        nonlocal direct_count, indirect_count
        name = m.group(1)
        params = m.group(2)
        call_args = m.group(3)
        if name in DIRECT:
            direct_count += 1
            return (
                f'public static Pointer {name}({params}) {{\n'
                f'\t\tRuntime runtime = Runtime.getSystemRuntime();\n'
                f'\t\tPointer result = Memory.allocateDirect(runtime, 8);\n'
                f'\t\tboolean out = MeosLibrary.meos.{name}({call_args});\n'
                f'\t\treturn out ? result : null;\n'
                f'\t}}'
            )
        indirect_count += 1
        return m.group(0)

    new = OUT_PARAM_PATTERN.sub(repl, content)
    return new, direct_count, indirect_count


def main() -> int:
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} <path-to-functions.java>", file=sys.stderr)
        return 2
    path = Path(sys.argv[1])
    content = path.read_text()
    content, rtree_n = patch_rtree(content)
    content, direct, indirect = patch_out_params(content)
    path.write_text(content)
    print(f"rtree wrappers patched: {rtree_n}")
    print(f"out-param wrappers DIRECT: {direct}")
    print(f"out-param wrappers INDIR kept: {indirect}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())