@rcholic rcholic commented Jan 12, 2026

Summary

This PR demonstrates the Sentience SDK's AgentRuntime verification system, a declarative approach to agent observability that makes checks on AI agent behavior deterministic, debuggable, and machine-verifiable.

Why This Matters

Traditional AI agents are black boxes: you prompt them, they act, and you hope for the best. When things go wrong, debugging is painful - you're left parsing logs and screenshots trying to understand what the agent saw vs what it did.

Sentience SDK flips this model:

| Traditional Agent | Sentience-Verified Agent |
| --- | --- |
| "Did it work?" → Check screenshots | "Did it work?" → assert_done() returns true/false |
| Debug via logs | Debug via structured trace with per-step assertions |
| Non-deterministic validation | Machine-verifiable predicates |
| Manual QA | Automated regression testing |

Key Features Showcased

1. Per-Step Assertions - Guardrails During Execution

step_assertions = [
    {"predicate": url_contains("news.ycombinator.com"), "label": "on_hackernews", "required": True},
    {"predicate": exists("role=link text~'Show HN'"), "label": "show_hn_posts_visible"},
    {"predicate": not_exists("text~'Error'"), "label": "no_error_message"},
]
  • required=True: Agent fails fast if this doesn't pass (prevents drift)
  • Semantic selectors: Query by role + text, not brittle CSS selectors
  • Per-step validation: Catch issues immediately, not at the end
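To make the fail-fast behavior of `required` concrete, here is a minimal sketch in plain Python. It does not use the Sentience SDK: `PageState`, `url_has`, `text_visible`, `text_absent`, `check_step`, and `RequiredAssertionFailed` are hypothetical stand-ins invented for illustration, and the SDK's real evaluation logic may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PageState:
    """Hypothetical snapshot of what the agent sees after one step."""
    url: str
    visible_text: List[str]


Predicate = Callable[[PageState], bool]


def url_has(fragment: str) -> Predicate:
    # Toy stand-in for a URL predicate (not the SDK's url_contains).
    return lambda state: fragment in state.url


def text_visible(fragment: str) -> Predicate:
    # Toy stand-in for an "element exists" predicate.
    return lambda state: any(fragment in t for t in state.visible_text)


def text_absent(fragment: str) -> Predicate:
    # Toy stand-in for a "no such element" predicate.
    return lambda state: not any(fragment in t for t in state.visible_text)


class RequiredAssertionFailed(Exception):
    """Raised as soon as a required assertion fails."""


def check_step(state: PageState, assertions: List[Dict]) -> List[Dict]:
    """Evaluate assertions in order; stop at the first failed required one."""
    results = []
    for a in assertions:
        passed = a["predicate"](state)
        required = a.get("required", False)
        results.append({"label": a["label"], "passed": passed, "required": required})
        if required and not passed:
            raise RequiredAssertionFailed(a["label"])
    return results


assertions = [
    {"predicate": url_has("news.ycombinator.com"), "label": "on_hackernews", "required": True},
    {"predicate": text_visible("Show HN"), "label": "show_hn_posts_visible"},
    {"predicate": text_absent("Error"), "label": "no_error_message"},
]
state = PageState(url="https://news.ycombinator.com/show", visible_text=["Show HN: A demo"])
results = check_step(state, assertions)
```

In this sketch a failed optional assertion is only recorded, while a failed required one aborts the step immediately, which is the drift-prevention behavior the bullets describe.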

2. Declarative Task Completion - Know When You're Done

done_assertion = all_of(
    url_contains("news.ycombinator.com/show"),
    exists("role=link text~'Show HN'"),
)
  • Combinators: all_of(), any_of() for complex conditions
  • Machine-verifiable: No LLM judgment needed - pure predicate evaluation
  • Deterministic: Same state → same result, every time
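The combinator idea can be sketched without the SDK as plain functions over a state snapshot. The names below (`all_of_sketch`, `any_of_sketch`, `url_contains_sketch`, `link_text_contains_sketch`) and the dictionary shape of the state are assumptions made for illustration, not the SDK's implementation.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Predicate = Callable[[State], bool]


def url_contains_sketch(fragment: str) -> Predicate:
    # Hypothetical predicate: true when the URL contains the fragment.
    return lambda state: fragment in state["url"]


def link_text_contains_sketch(fragment: str) -> Predicate:
    # Hypothetical predicate: true when any link text contains the fragment.
    return lambda state: any(fragment in text for text in state["links"])


def all_of_sketch(*predicates: Predicate) -> Predicate:
    # Passes only if every child predicate passes.
    return lambda state: all(p(state) for p in predicates)


def any_of_sketch(*predicates: Predicate) -> Predicate:
    # Passes if at least one child predicate passes.
    return lambda state: any(p(state) for p in predicates)


done = all_of_sketch(
    url_contains_sketch("news.ycombinator.com/show"),
    link_text_contains_sketch("Show HN"),
)
either = any_of_sketch(
    url_contains_sketch("news.ycombinator.com/show"),
    link_text_contains_sketch("Show HN"),
)

on_show_page = {"url": "https://news.ycombinator.com/show", "links": ["Show HN: A demo project"]}
on_front_page = {"url": "https://news.ycombinator.com/", "links": ["Show HN: A demo project"]}
```

Because each predicate is a pure function of the snapshot, evaluating `done` twice on the same state gives the same answer, which is the sense in which completion checks are deterministic here.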

3. Structured Trace Output - Studio Timeline Integration

trace_dir="traces" # Outputs JSON traces for Sentience Studio

  • Every assertion result is logged with timestamp, label, and outcome
  • Visualize agent execution in Studio timeline
  • Debug failures by inspecting exactly which predicate failed and why
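As a rough illustration of what a structured trace buys you, the sketch below writes one JSON record per assertion and then finds the first failing one. The JSON Lines layout, the field names, and the helpers `make_record`, `write_trace`, and `first_failure` are all assumptions for this example; the actual trace format consumed by Sentience Studio is not specified in this PR.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Dict, List, Optional


def make_record(step: int, label: str, passed: bool, required: bool = False) -> Dict:
    # One assertion outcome with a timestamp, label, and result (hypothetical schema).
    return {
        "timestamp": time.time(),
        "step": step,
        "label": label,
        "passed": passed,
        "required": required,
    }


def write_trace(trace_dir: str, run_id: str, records: List[Dict]) -> Path:
    # Write records as JSON Lines under trace_dir, creating the directory if needed.
    out_dir = Path(trace_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{run_id}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


def first_failure(path: Path) -> Optional[Dict]:
    # Return the first record whose assertion failed, or None if all passed.
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if not record["passed"]:
                return record
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        trace_path = write_trace(
            str(Path(tmp) / "traces"),
            "demo_run",
            [
                make_record(1, "on_hackernews", True, required=True),
                make_record(1, "no_error_message", False),
            ],
        )
        print(first_failure(trace_path))
```

With records like these, locating the failing predicate is a lookup over labeled data rather than a read through free-form logs.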

Verification Output

```
🔍 Verification Summary:
All assertions passed: True
Required assertions passed: True
Task verified complete: True

Assertion Details (4 total):
✅ on_hackernews (required)
✅ show_hn_posts_visible
✅ no_error_message
✅ task_complete (required)
```
