
Port eval error detail and typed Datalog body literals #201

Merged
singaraiona merged 3 commits into RayforceDB:master from theaspirational:fix/eval-error-detail
May 15, 2026

@theaspirational
Contributor

Summary

  • Preserve detailed eval error messages across worker/thread boundaries instead of reducing failures to a generic "query: evaluation failed" message.
  • Expose FFI-safe runtime object/vector introspection helpers for host frontends.
  • Add body-side Datalog constant type tracking so typed string/symbol literals can match DATOM-tagged RAY_I64 columns without changing the existing direct compare path.

This ports the fix/eval-error-detail work that previously lived on the archived rayforce2 fork.

Rationale for upstream adoption

  1. Symmetric with existing rule heads. dl_rule_t already has head_const_types[] for the same purpose at head positions (src/ops/datalog.h). The new const_types[] on dl_body_t is the body-side mirror, closing a parallel gap rather than introducing a new abstraction.
  2. Direct compare runs first. The typed comparison is a fallback: existing comparisons against plain RAY_I64 columns and RAY_SYM IDB columns hit the direct equality path unchanged. The tag-stripped compare only fires when the column entry carries a DATOM tag in bits 62-63 and the body literal type matches that tag.
  3. Engine-agnostic, frontend-useful. The engine does not need to know about ray-exomem. The change just says RAY_I64 columns may carry tagged sym IDs in bits 62-63; if a body literal is a string or symbol, also try the tag-stripped compare. This is generally useful for any frontend that disambiguates types in shared I64 columns, a common EAV / RDF / triple-store layout pattern.
  4. Backward-compatible. dl_body_t is zero-initialized via memset, so callers that do not migrate to dl_body_set_const_typed get const_types[]=0; dispatch falls through to the existing direct compare behavior. Existing callers can keep using dl_body_set_const.
  5. Diagnostically clean. Before the fix, every (?e ?a "value") body atom against a DATOM-encoded value column silently returned zero rows: no error, just an empty result. Users could not tell whether their query was broken or their data was empty. The typed fallback removes that class of silent correctness bugs in EAV frontends.
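
The backward-compatibility argument in point 4 can be illustrated with a minimal sketch. The struct layout, the setter signatures, the `const_vals` field name, and the numeric type tags below are assumptions made for illustration; the real definitions live in src/ops/datalog.h and may differ.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative placeholders only; not the real values from the engine. */
enum { DL_MAX_ARITY = 8 };
enum { RAY_I64 = 5, RAY_STR = 12 };

typedef struct {
    int64_t const_vals[DL_MAX_ARITY];  /* hypothetical field name */
    int8_t  const_types[DL_MAX_ARITY]; /* source type of each body literal */
} dl_body_t;

/* Typed setter: records the interned value together with its source type. */
static void dl_body_set_const_typed(dl_body_t *b, int pos, int64_t val, int8_t type) {
    b->const_vals[pos]  = val;
    b->const_types[pos] = type;
}

/* Legacy setter kept as a thin wrapper that defaults the type to RAY_I64. */
static void dl_body_set_const(dl_body_t *b, int pos, int64_t val) {
    dl_body_set_const_typed(b, pos, val, RAY_I64);
}
```

A body that is zeroed with memset and never passed through the typed setter keeps const_types[] at 0, and a body filled through the legacy setter records RAY_I64. In both cases the literal type matches neither the string tag nor the symbol tag, so only the direct compare applies.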

Verification

  • make lib
  • ./rayforce.test -f err
  • ./rayforce.test -f datalog
  • ./rayforce.test -f runtime
  • make clean && make test

Full ASAN suite result:

=== 2404 of 2405 passed (1 skipped, 0 failed) ===

theaspirational and others added 3 commits May 15, 2026 14:17
Three compounding bugs left tokio-worker callers staring at empty error
messages and bare "evaluation failed":

1. eval.c had a file-static `__VM` that shadowed runtime.c's
   `extern _Thread_local ray_vm_t *__VM` from runtime.h. Eval-side writes
   never reached the symbol ray_error_msg() reads. Replace the static
   with a matching extern so both translation units share storage.

2. eval.c sets `__VM = NULL` after `ray_free(vm_block)` at the end of
   every eval. Even with the shadow fixed, ray_error_msg() reads
   `__VM->err.msg` which points into freed memory by the time the FFI
   caller looks. Add a thread-local `ray_last_err_msg[256]` in
   runtime.c that ray_verror writes to in parallel; ray_error_msg
   reads from that buffer, which survives VM teardown.

3. dl_eval swallows inner errors via `prog->eval_err = true;
   ray_error_free(...)`. The outer wrapper at datalog.c always returned
   bare "query: evaluation failed" even when an inner op had stashed a
   useful detail (e.g. ray_error("nyi", ...) from a 9-key distinct).
   Capture ray_error_msg() before dl_program_free runs and weave it
   into the wrapper's error if non-empty.
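
A minimal sketch of fixes 2 and 3, under stated assumptions: `vm_t`, `report_error`, `error_msg`, `eval_once`, and `wrap_error` are hypothetical stand-ins for ray_vm_t, ray_verror, ray_error_msg, the eval entry point, and the datalog.c wrapper, and the exact format of the combined message is a guess.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the VM; the real ray_vm_t differs. */
typedef struct { char err_msg[256]; } vm_t;

static _Thread_local vm_t *current_vm;        /* plays the role of __VM */
static _Thread_local char  last_err_msg[256]; /* outlives VM teardown */

/* Record an error in the thread-local buffer and, in parallel, in the VM. */
static void report_error(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(last_err_msg, sizeof last_err_msg, fmt, ap);
    va_end(ap);
    if (current_vm)
        snprintf(current_vm->err_msg, sizeof current_vm->err_msg, "%s", last_err_msg);
}

/* Reads only the thread-local copy, never the (possibly freed) VM. */
static const char *error_msg(void) { return last_err_msg; }

/* One failing eval: allocate a VM, report an error, tear the VM down. */
static int eval_once(void) {
    current_vm = calloc(1, sizeof *current_vm);
    if (!current_vm) return -1;
    report_error("nyi: distinct over %d keys", 9);
    free(current_vm);
    current_vm = NULL;
    return -1;
}

/* Outer wrapper: append the inner detail to the generic message if present. */
static void wrap_error(char *out, size_t n) {
    const char *detail = error_msg();
    if (detail[0])
        snprintf(out, n, "query: evaluation failed: %s", detail);
    else
        snprintf(out, n, "query: evaluation failed");
}
```

Because the buffer is thread-local rather than owned by the VM, a caller on the same thread can still read the message after the VM block has been freed and the VM pointer reset to NULL.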

Verified end-to-end: ray-exomem against the production data dir went
from `rayforce2 err domain: (no msg)` on every query to actually
returning rows. The 9-column fact-with-tx case (which triggered bug 3 above)
still hits the n_keys > 8 cap in exec_group; the cap is the real cause
and is best fixed by widening rayforce2's group/distinct path, but the
caller can now at least see the underlying reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 16aba188a4f42552800af394a899a583d3a52adf)

Add five accessor functions for foreign consumers (Rust FFI, etc.) that
need to read object metadata and vector elements without having to
duplicate the ray_data + struct-offset + adaptive-width logic in their
own crate:

* ray_obj_type(v)              — int8_t type tag
* ray_obj_attrs(v)             — uint8_t attrs (RAY_SYM width, etc.)
* ray_vec_get_i64(vec, idx)    — read int64; widens RAY_I32/I16/U8/BOOL
                                 and returns the raw int64 for RAY_I64
                                 / DATE / TIME / TIMESTAMP
* ray_vec_get_f64(vec, idx)    — read double; widens RAY_F32
* ray_vec_get_sym_id(vec, idx) — read sym ID via ray_read_sym so
                                 narrow-SYM columns decode correctly

Bounds-checked against vec->len; return zero on type mismatch or out-of-
range so callers can probe a column without first inspecting metadata.
Lets ray-exomem's typed-facts e2e tests inspect query result columns by
type and decode sym/datom values without re-implementing ray_read_sym's
adaptive-width logic in Rust.
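
A rough sketch of the bounds-checked, widening read described above. The `vec_t` layout, the `T_*` type tags, and the name `vec_get_i64` are placeholders for illustration, not the real ray_t definition or the actual ray_vec_get_i64 implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder type tags; the real RAY_* values differ. */
enum { T_BOOL = 1, T_U8 = 2, T_I16 = 3, T_I32 = 4, T_I64 = 5, T_F64 = 7 };

/* Hypothetical vector header: element type, length, and raw data pointer. */
typedef struct {
    int8_t      type;
    int64_t     len;
    const void *data;
} vec_t;

/* Read element idx as int64, widening narrow integer types.
 * Returns 0 on NULL, an out-of-range index, or a non-integer type. */
static int64_t vec_get_i64(const vec_t *v, int64_t idx) {
    if (!v || idx < 0 || idx >= v->len) return 0;
    switch (v->type) {
    case T_BOOL:
    case T_U8:  return ((const uint8_t *)v->data)[idx];
    case T_I16: return ((const int16_t *)v->data)[idx];
    case T_I32: return ((const int32_t *)v->data)[idx];
    case T_I64: return ((const int64_t *)v->data)[idx];
    default:    return 0;
    }
}
```

One trade-off of returning zero on mismatch is that it cannot be told apart from a stored zero, so a caller that needs the distinction would check the type tag (here via ray_obj_type) before reading.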

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 3e45f5c01a791f4478553c69de254a708338e18c)

The body literal interner in dl_set_body_pos collapsed every quoted-
symbol and string-literal into a plain sym ID, throwing away the source
ray type. dl_filter_eq then compared the constant against column
entries via raw `==`. When a frontend stores DATOM-tagged sym IDs in
RAY_I64 columns — i.e. (0x4000... | sym_id) for STR values and
(0x2000... | sym_id) for SYM values, a common pattern in EAV / triple-
store layouts that disambiguate types in a single shared V column —
the body literal "foo" interns as a plain sym N and never matches any
cell 0x4000... | N. Every such body atom silently returned zero rows:
no error, no diagnostic, just empty results.

Approach: thread the source ray type through to the row-equality
helper.

* dl_body_t gains an int8_t const_types[DL_MAX_ARITY] field, parallel
  to the existing dl_rule_t::head_const_types[]. Zero-initialized via
  memset, so callers that don't migrate get no behavior change.
* dl_body_set_const becomes a wrapper calling new
  dl_body_set_const_typed(..., RAY_I64). dl_set_body_pos records each
  literal's type at intern time (-RAY_I64 -> RAY_I64, non-wildcard
  -RAY_SYM -> RAY_SYM, -RAY_STR -> RAY_STR, evaluated (quote ...)
  results carry their val->type).
* dl_col_eq_row takes the body literal's const_type. For RAY_I64
  columns: try direct compare first (preserves existing behavior for
  plain-int columns and IDB columns built from rule heads with
  constants). On miss, if the cell carries a DATOM tag in bits 62-63
  AND the body literal type matches the tag (STR <-> 0x4000...,
  SYM <-> 0x2000...), compare on the tag-stripped payload. RAY_SYM
  columns and untagged RAY_I64 cells stay on the direct-compare path.
* dl_filter_eq threads const_type through; both call sites (positive
  and negated body atoms) pass body->const_types[c].

The fix is additive: the direct compare runs first, the tag-aware
fallback fires only when both signals — column-side tag bits AND
body-literal-side type — line up. 942 / 943 in-tree tests still pass.

Closes a class of silent-correctness failures for any frontend that
wants to disambiguate types in shared RAY_I64 columns without
preprocessing every body atom.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 07363bf00c1ee22264b8feb87bccf437f01acc31)
@singaraiona singaraiona merged commit c580adb into RayforceDB:master May 15, 2026
4 checks passed