evaluate_handoff: path-walk failures should report ERROR, not CAUGHT

## Summary

When `_get_by_json_path` (or whatever resolves the claim's `json_path` against the indexed payload) cannot walk the path, the per-claim verdict comes back as `result: "CAUGHT"` with the parser error in `detail`, e.g.:

```
detail: json_path: 'expected dict at segment "['value'][0]['subject']", got list'
```

A consumer can't tell that apart from a real value mismatch — both look like `result: "CAUGHT"` to anyone reading `per_claim`. That has two downstream costs:

1. **KPI / metrics** — counting "hallucinations caught" inflates the number with false positives every time the LLM picks a path the parser can't handle. Demos and customer dashboards over-report drift.
2. **Heal-loop classifiers** — substitute / reprompt / fail decisions are made on `result == "CAUGHT"`. An evaluator-can't-walk-this verdict triggers the wrong tier, retries that can never converge, and noisy traces.
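
To illustrate the ambiguity, here is a minimal sketch that tallies a hand-written `per_claim` list by `result`. The `result` and `detail` keys come from the verdicts described above; the `tally` helper and the wording of the first (value mismatch) `detail` are made up for illustration and are not SDK output.

```python
from collections import Counter

# Hand-written verdicts, not real SDK output.
per_claim = [
    # A genuine mismatch. The detail wording here is hypothetical.
    {"result": "CAUGHT", "detail": "claimed 'Billing', indexed 'Refund'"},
    # A path-walk failure, using the detail quoted in this issue.
    {
        "result": "CAUGHT",
        "detail": "json_path: 'expected dict at segment \"['value'][0]['subject']\", got list'",
    },
]


def tally(verdicts):
    """Count verdicts by their `result` value, as a naive KPI would."""
    return Counter(v["result"] for v in verdicts)


counts = tally(per_claim)
print(counts["CAUGHT"])
```

Both entries land in the same bucket, so a dashboard built on this count reports two caught hallucinations where only one claim actually disagreed with the indexed value.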

## Proposal

Emit `result: "ERROR"` (or a new `"EVALUATOR_ERROR"`, whichever fits the existing taxonomy) for path-walk failures, with the same `detail` so debug info is preserved. The two cases that consumers want to disambiguate:

- **CAUGHT** — path resolved, claimed value disagrees with indexed value (real drift)
- **ERROR** — path could not be resolved (evaluator limitation, missing field, malformed path, etc.)
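
The split above could look roughly like the following sketch. It is not the SDK's implementation: `PathWalkError`, `walk`, and `evaluate_claim` are hypothetical names, the path is assumed to be pre-split into segments, and the `PASS` value is assumed from the consumer-side workaround described below.

```python
from typing import Any, Dict, List, Union

Segment = Union[str, int]


class PathWalkError(Exception):
    """Hypothetical error: the path could not be resolved against the payload."""


def walk(payload: Any, segments: List[Segment]) -> Any:
    """Follow string keys into dicts and integer indexes into lists."""
    current = payload
    for seg in segments:
        kind = type(current).__name__
        if isinstance(seg, int):
            if not isinstance(current, list):
                raise PathWalkError(f"expected list at segment {seg!r}, got {kind}")
            if not 0 <= seg < len(current):
                raise PathWalkError(f"index {seg} out of range")
        else:
            if not isinstance(current, dict):
                raise PathWalkError(f"expected dict at segment {seg!r}, got {kind}")
            if seg not in current:
                raise PathWalkError(f"missing key {seg!r}")
        current = current[seg]
    return current


def evaluate_claim(payload: Any, segments: List[Segment], claimed: Any) -> Dict[str, str]:
    """Return a verdict that separates evaluator failures from real drift."""
    try:
        actual = walk(payload, segments)
    except PathWalkError as exc:
        # Path could not be resolved: evaluator-side problem, keep the detail.
        return {"result": "ERROR", "detail": f"json_path: {exc}"}
    if actual != claimed:
        # Path resolved and the values disagree: real drift.
        return {"result": "CAUGHT", "detail": f"claimed {claimed!r}, indexed {actual!r}"}
    return {"result": "PASS", "detail": ""}


payload = {"value": [{"subject": "Refund"}]}
print(evaluate_claim(payload, ["value", 0, "subject"], "Billing")["result"])
print(evaluate_claim(payload, ["value", "subject"], "Refund")["result"])
```

The only structural change is that the walk failure is caught separately from the comparison, so the `detail` string survives while `result` carries the distinction.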

## Reproduction

Use any tool whose response is a JSON array at the root, and have the LLM emit Python-bracket-key paths (`['value'][0]['subject']`) or bracketed numeric indexes (`[0].subject`). The SDK's parser bails before reaching the leaf, and the failure currently surfaces as `CAUGHT` instead of `ERROR`.

## Workaround on the consumer side

We're patching this locally in `customer-support-sdk-demo` ([evaluate_node.py](https://github.com/ProvablyAI/customer-support-sdk-demo/blob/main/demo/src/demo/agent/evaluate_node.py) — `_patch_array_path_verdicts`): if the SDK's verdict has an "expected … got list/dict" detail, we re-walk with a more permissive parser, mark verdicts as PASS when our local walk verifies the claim, and as ERROR when it can't. That belongs in the SDK so every consumer doesn't reinvent it.
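
For readers who have not opened the demo, the idea can be sketched as follows. This is not the code in `_patch_array_path_verdicts`: `parse_path`, `walk`, and `repatch` are hypothetical names, the detail check is a simple substring match, and leaving a verdict as `CAUGHT` when the local walk succeeds but the values differ is an assumption of this sketch.

```python
import re
from typing import Any, Dict, List, Union

Segment = Union[str, int]

# Accepts ['key'], ["key"], [0], .name and a bare leading name.
_TOKEN = re.compile(
    r"""\[\s*(?:'([^']*)'|"([^"]*)"|(\d+))\s*\]|\.?([A-Za-z_][A-Za-z0-9_]*)"""
)


def parse_path(path: str) -> List[Segment]:
    """Split a bracket/dot path into string keys and integer indexes."""
    segments: List[Segment] = []
    pos = 1 if path.startswith("$") else 0  # tolerate a leading "$"
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError(f"cannot parse path at offset {pos}: {path!r}")
        single, double, index, name = m.groups()
        if index is not None:
            segments.append(int(index))
        else:
            segments.append(next(s for s in (single, double, name) if s is not None))
        pos = m.end()
    return segments


def walk(payload: Any, segments: List[Segment]) -> Any:
    """Strictly follow keys into dicts and indexes into lists."""
    current = payload
    for seg in segments:
        if isinstance(seg, int):
            if not isinstance(current, list) or seg >= len(current):
                raise LookupError(f"cannot index {seg} here")
        elif not isinstance(current, dict) or seg not in current:
            raise LookupError(f"cannot find key {seg!r} here")
        current = current[seg]
    return current


def repatch(verdict: Dict[str, str], payload: Any, path: str, claimed: Any) -> Dict[str, str]:
    """Re-check a CAUGHT verdict whose detail looks like a path-walk failure."""
    detail = verdict.get("detail", "")
    looks_like_walk_failure = "expected" in detail and (
        "got list" in detail or "got dict" in detail
    )
    if verdict.get("result") != "CAUGHT" or not looks_like_walk_failure:
        return verdict
    try:
        actual = walk(payload, parse_path(path))
    except (ValueError, LookupError):
        return {**verdict, "result": "ERROR"}  # local walk failed too
    if actual == claimed:
        return {**verdict, "result": "PASS"}  # local walk verifies the claim
    return verdict  # assumption: a resolved mismatch stays CAUGHT


sdk_verdict = {"result": "CAUGHT", "detail": "json_path: 'expected dict at segment ..., got list'"}
payload = {"value": [{"subject": "Refund"}]}
print(repatch(sdk_verdict, payload, "['value'][0]['subject']", "Refund")["result"])
```

Matching on the `detail` string is the fragile part: it breaks whenever the SDK rewords its error, which is another reason to carry the distinction in `result` itself.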

## Related

Tracks alongside the existing array-indexing limitation in `_get_by_json_path` (which the consumer-side workaround was originally created to bridge). Resolving this report-classification issue is independent of fixing the underlying parser — even a path the SDK genuinely can't walk would be more honestly classified as ERROR than CAUGHT.
