- result = rag.flag_hallucinated_content(None, docs, context, backend)
for r, e in zip(result, expected, strict=True): # type: ignore
> assert r == e
E assert {'explanation...end': 31, ...} == {'explanation...end': 31, ...}
E
E Omitting 4 identical items, use -vv to show
E Differing items:
E {'explanation': "This sentence makes a factual claim about the color of purple bumble fish. The provided context states: 'The only type of fish that is yellow is the purple bumble fish.' This directly supports the claim in the sentence."} != {'explanation': "This sentence makes a factual claim about the color of purple bumble fish. The document states 'The only type of fish that is yellow is the purple bumble fish.' This directly supports the claim in the sentence."}
E Use -v to get more diff
test/stdlib/components/intrinsic/test_rag.py:266: AssertionError
Our hallucination detection tests in test_rag.py are failing. I believe this is because the checks in those tests have fallen out of sync with those in the formatter tests. We should fix them. This might also require loosening the expectations for the output, as we've done with citations. An example failure is shown above.
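
One possible way to loosen the expectations, sketched here under assumptions: compare every field exactly except the free-form `explanation` text, which only has to be a non-empty string. The helper name `assert_records_match` and the `FREE_TEXT_KEYS` constant are hypothetical and not part of the existing test suite; the only field names used are the two visible in the traceback (`explanation` and `end`).

```python
from typing import Any, Iterable, Mapping, Sequence

# Hypothetical: keys holding free-form model text that varies between runs.
FREE_TEXT_KEYS = frozenset({"explanation"})


def assert_records_match(
    result: Sequence[Mapping[str, Any]],
    expected: Sequence[Mapping[str, Any]],
    free_text_keys: Iterable[str] = FREE_TEXT_KEYS,
) -> None:
    """Compare records exactly, except for free-text fields.

    Free-text fields must be present and be non-empty strings,
    but their wording is not compared.
    """
    free_text = set(free_text_keys)
    assert len(result) == len(expected), "record counts differ"
    for index, (r, e) in enumerate(zip(result, expected)):
        # Both records must expose the same set of fields.
        assert set(r) == set(e), f"record {index}: keys differ"
        for key, expected_value in e.items():
            if key in free_text:
                # Only check that some explanation text was produced.
                assert isinstance(r[key], str) and r[key].strip(), (
                    f"record {index}: {key!r} is empty or not a string"
                )
            else:
                assert r[key] == expected_value, (
                    f"record {index}: {key!r} is {r[key]!r}, "
                    f"expected {expected_value!r}"
                )
```

In the failing test, the loop over `zip(result, expected, strict=True)` could then be replaced by a single call such as `assert_records_match(result, expected)`. This keeps the structured fields strict while tolerating rewording such as "The provided context states" versus "The document states".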