feat(core): add GuardrailProvider interface for pluggable guardrail implementations#1171

Open
The-Nexus-Guard wants to merge 3 commits into VoltAgent:main from The-Nexus-Guard:feature/guardrail-provider-interface

Conversation

@The-Nexus-Guard The-Nexus-Guard commented Mar 21, 2026

Summary

Adds a provider-agnostic GuardrailProvider interface that external packages can implement to provide guardrail logic — identity verification, content filtering, trust scoring, or any custom check — without coupling to VoltAgent internals.

This addresses #1166 by enabling a clean plugin model where:

  • Users can implement their own guardrail provider
  • External implementations (AIP, APort, etc.) plug in via the same interface
  • Skipping guardrails entirely works exactly as before — zero breaking changes

What's Added

GuardrailProvider interface

interface GuardrailProvider {
  readonly name: string;
  readonly description?: string;
  readonly severity?: GuardrailSeverity;
  readonly tags?: string[];

  evaluateInput?(content: string, context: GuardrailProviderContext): Promise<GuardrailProviderDecision> | GuardrailProviderDecision;
  evaluateOutput?(content: string, context: GuardrailProviderContext): Promise<GuardrailProviderDecision> | GuardrailProviderDecision;
}

Providers implement evaluateInput, evaluateOutput, or both. Each returns a decision: allow, modify (with replacement content), or block.

createGuardrailsFromProvider() factory

Converts any GuardrailProvider into standard VoltAgent InputGuardrail[] / OutputGuardrail[] arrays:

import { Agent, createGuardrailsFromProvider } from "@voltagent/core";

const provider = new MyGuardrailProvider();
const { inputGuardrails, outputGuardrails } = createGuardrailsFromProvider(provider);

const agent = new Agent({
  name: "guarded-agent",
  inputGuardrails,
  outputGuardrails,
});

Multiple providers can be combined by spreading their guardrail arrays together.
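
Combining providers might look like the minimal sketch below. The `GuardrailSet` type and the `combineGuardrailSets` helper are hypothetical stand-ins and are not part of @voltagent/core; only the idea of spreading the two arrays comes from the description above.

```typescript
// Hypothetical stand-in for the object returned by createGuardrailsFromProvider().
// The real InputGuardrail / OutputGuardrail types live in @voltagent/core.
interface GuardrailSet<I, O> {
  inputGuardrails: I[];
  outputGuardrails: O[];
}

// Merge any number of guardrail sets by spreading their arrays in call order.
function combineGuardrailSets<I, O>(...sets: GuardrailSet<I, O>[]): GuardrailSet<I, O> {
  const inputGuardrails: I[] = [];
  const outputGuardrails: O[] = [];
  for (const set of sets) {
    inputGuardrails.push(...set.inputGuardrails);
    outputGuardrails.push(...set.outputGuardrails);
  }
  return { inputGuardrails, outputGuardrails };
}

// Placeholder guardrail objects identified only by an id (assumption for the demo).
type DemoGuardrail = { id: string };
const pii: GuardrailSet<DemoGuardrail, DemoGuardrail> = {
  inputGuardrails: [{ id: "pii" }],
  outputGuardrails: [{ id: "pii" }],
};
const trust: GuardrailSet<DemoGuardrail, DemoGuardrail> = {
  inputGuardrails: [{ id: "trust" }],
  outputGuardrails: [],
};

const combined = combineGuardrailSets(pii, trust);
```

The combined arrays would then be passed to the Agent constructor as `inputGuardrails` and `outputGuardrails`, as in the example above.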

Custom provider example

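import type { GuardrailProvider } from "@voltagent/core";

// containsPII and redactSensitiveData are application-defined helpers,
// not part of @voltagent/core.
declare function containsPII(text: string): boolean;
declare function redactSensitiveData(text: string): string;

class ContentFilterProvider implements GuardrailProvider {
  readonly name = "Content Filter";
  readonly severity = "critical" as const;

  // Block any input that contains PII.
  async evaluateInput(content: string) {
    if (containsPII(content)) {
      return { pass: false, message: "Input contains PII" };
    }
    return { pass: true };
  }

  // Always let output through, but replace it with a redacted copy.
  async evaluateOutput(content: string) {
    return {
      pass: true,
      action: "modify" as const,
      modifiedContent: redactSensitiveData(content),
    };
  }
}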

Design Decisions

  1. Adapter pattern, not new pipeline. Providers produce standard InputGuardrail/OutputGuardrail arrays via the factory. No changes to the existing guardrail execution pipeline — resolveGuardrailSets, runInputGuardrails, and runOutputGuardrails work unchanged.

  2. Text-based evaluation surface. Providers receive plain text (extracted from whatever message format is in use). This keeps the provider interface simple and framework-agnostic while covering the common case.

  3. Sync or async. Provider methods can return decisions synchronously or as promises — the factory handles both.

  4. Observable by default. Provider name, description, severity, and tags flow through to OpenTelemetry spans. Decision metadata is attached to guardrail result metadata.
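
As a rough illustration of the "sync or async" point, one common way to accept both is to `await` whatever the provider returns, since awaiting a plain value yields that value. The names `Decision`, `MaybePromise`, `Evaluator`, and `runEvaluator` below are hypothetical; this is a sketch of the technique, not the factory's actual code.

```typescript
type Decision = { pass: boolean; message?: string };
type MaybePromise<T> = T | Promise<T>;
type Evaluator = (content: string) => MaybePromise<Decision | undefined>;

// `await` on a non-promise resolves to the value itself,
// so a single code path covers sync and async providers.
async function runEvaluator(evaluate: Evaluator, content: string): Promise<Decision> {
  const decision = await evaluate(content);
  // Assumption: a missing decision is treated as "allow".
  return decision ?? { pass: true };
}

// A synchronous evaluator that blocks content containing "secret".
const syncEvaluator: Evaluator = (content) =>
  content.includes("secret") ? { pass: false, message: "blocked" } : undefined;

// An asynchronous evaluator that passes content shorter than 100 characters.
const asyncEvaluator: Evaluator = async (content) => ({ pass: content.length < 100 });
```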

Tests

19 tests covering:

  • Empty provider (no methods implemented)
  • Input-only, output-only, and both-direction providers
  • Allow, block, and modify decisions
  • Context passing (agent name, operation, direction)
  • Metadata preservation
  • Synchronous providers
  • Combining multiple providers
  • Options overrides (custom id, severity, additional tags)

Files Changed

| File | Change |
| --- | --- |
| packages/core/src/agent/guardrail-provider.ts | New: interface + factory |
| packages/core/src/agent/guardrail-provider.spec.ts | New: 19 tests |
| packages/core/src/agent/index.ts | Export new types + factory |
| packages/core/src/index.ts | Re-export from package root |

Breaking Changes

None. This is purely additive. Existing guardrail configuration continues to work exactly as before.


Summary by cubic

Adds a provider-agnostic GuardrailProvider and a createGuardrailsFromProvider() factory to plug external guardrail logic into the existing pipeline. Adds safety fixes (authoritative action, fail-closed modify, collision-safe IDs); fully opt-in and non-breaking. Addresses #1166.

  • New Features

    • GuardrailProvider with optional evaluateInput/evaluateOutput; decisions support allow/modify/block with message and metadata; methods may return undefined.
    • createGuardrailsFromProvider() returns standard Input/Output guardrail arrays; supports id, severity, tags; providers can be combined; works sync or async.
    • Exported types and factory from @voltagent/core; 28 tests cover directions, actions, context/metadata, overrides, and edge cases.
  • Bug Fixes

    • Explicit action is authoritative (block overrides pass:true; allow overrides pass:false).
    • Modify decisions without modifiedContent now block (fail-closed) with a clear default message.
    • Collision-safe IDs when provider name slug is empty; handles empty outputText; align return types to include undefined.

Written for commit 54cb6d1. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • Added a pluggable guardrail provider system that generates input and output guardrails, supports allow/block/modify decisions (with fail-closed on malformed modifies), preserves provider metadata, and lets options override/merge severity, tags, and IDs.
  • Tests

    • Added comprehensive tests covering guardrail generation, handler behavior (allow/block/modify and fail-closed cases), metadata merging, ID fallbacks, and multi-provider composition.

…mplementations

Adds a provider-agnostic interface that external packages can implement
to provide guardrail logic (identity verification, content filtering,
trust scoring, etc.) without coupling to VoltAgent internals.

Key additions:
- GuardrailProvider interface with evaluateInput/evaluateOutput methods
- createGuardrailsFromProvider() factory that converts providers into
  VoltAgent-native InputGuardrail/OutputGuardrail arrays
- Full TypeScript types exported from @voltagent/core
- 19 tests covering all provider scenarios

Design principles:
- Optional by default: no provider = existing behavior unchanged
- Provider-agnostic: AIP, APort, or custom implementations all work
- Zero breaking changes: providers produce standard guardrail arrays
- Supports allow/modify/block actions with metadata for observability

Closes VoltAgent#1166

changeset-bot bot commented Mar 21, 2026

⚠️ No Changeset found

Latest commit: 54cb6d1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

coderabbitai bot commented Mar 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 69f339c0-898e-4629-b4a0-2f7f00d1fa14

📥 Commits

Reviewing files that changed from the base of the PR and between 5602589 and 54cb6d1.

📒 Files selected for processing (2)
  • packages/core/src/agent/guardrail-provider.spec.ts
  • packages/core/src/agent/guardrail-provider.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/core/src/agent/guardrail-provider.ts

📝 Walkthrough

Walkthrough

Adds a pluggable GuardrailProvider API and a factory that generates VoltAgent input/output guardrails with handlers that invoke provider evaluations and translate provider decisions into allow/block/modify results, including id/severity/tags composition and fail-closed modify behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Guardrail Provider Implementation (packages/core/src/agent/guardrail-provider.ts) | New module: exports provider-facing types (GuardrailProvider, GuardrailProviderContext, GuardrailProviderDecision, CreateGuardrailsFromProviderOptions) and createGuardrailsFromProvider() which emits inputGuardrails/outputGuardrails, composes id/severity/tags, invokes evaluateInput/evaluateOutput with context, maps decisions to allow/block/modify, and fail-closes malformed modify results. |
| Guardrail Provider Tests (packages/core/src/agent/guardrail-provider.spec.ts) | New Vitest suite covering guardrail generation and handler behavior: presence/absence of evaluate methods, metadata derivation and overrides (id/severity/tags), decision-to-action mapping, modify validation and error cases, sync/async evaluation, context/argument passing, metadata passthrough, id fallback/collision behavior, and multi-provider composition. |
| Public API Exports (packages/core/src/agent/index.ts, packages/core/src/index.ts) | Re-exports added: createGuardrailsFromProvider and types GuardrailProvider, GuardrailProviderContext, GuardrailProviderDecision, CreateGuardrailsFromProviderOptions from the new module. |

sequenceDiagram
  rect rgba(200,200,255,0.5)
    actor Agent
  end
  rect rgba(200,255,200,0.5)
    participant Handler as Guardrail Handler
  end
  rect rgba(255,200,200,0.5)
    participant Provider as GuardrailProvider
  end

  Agent->>Handler: evaluate(args: { inputText / outputText, operation, ... })
  Handler->>Provider: evaluateInput/evaluateOutput(content, { agentName, operation, direction })
  Provider-->>Handler: GuardrailProviderDecision (pass?, action?, message?, modifiedContent?, metadata?)
  Handler->>Agent: mapped result -> allow / block / modify (+ metadata, modifiedContent if applicable)

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit hops through lines of code, 🐇
Stitches guardrails on the safety road.
Providers whisper pass or block,
Or tweak the text—a gentle shock.
Tests hop in to keep the mode.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main feature: introducing a GuardrailProvider interface for pluggable guardrail implementations, which matches the core purpose of the changeset. |
| Description check | ✅ Passed | The description is comprehensive and well-structured, covering all key template sections: related issue, new behavior with examples, design decisions, comprehensive test coverage, and explicitly stating no breaking changes. |

@cubic-dev-ai cubic-dev-ai bot left a comment

2 issues found across 4 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/src/agent/guardrail-provider.ts">

<violation number="1" location="packages/core/src/agent/guardrail-provider.ts:171">
P2: Default provider-derived guardrail IDs are not collision-safe and may be empty-derived, which can cause key collisions when multiple providers are combined.</violation>

<violation number="2" location="packages/core/src/agent/guardrail-provider.ts:198">
P1: Malformed provider decisions with `action:"modify"` but missing `modifiedContent` are silently downgraded to `allow`, creating a fail-open path that can bypass intended guardrail transformations.</violation>
</file>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/src/agent/guardrail-provider.ts`:
- Around line 107-110: The evaluateInput and evaluateOutput method signatures in
GuardrailProvider should allow undefined returns to match runtime behavior and
docs; update both signatures (evaluateInput and evaluateOutput in
guardrail-provider.ts) to return GuardrailProviderDecision | undefined or
Promise<GuardrailProviderDecision | undefined> (i.e., include undefined in the
union for both synchronous and Promise return variants) so provider
implementations can legally return undefined at compile time.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 037625ac-3e9e-41b1-a2bb-01d8c0b62390

📥 Commits

Reviewing files that changed from the base of the PR and between 7bd1cca and 0bcf14a.

📒 Files selected for processing (4)
  • packages/core/src/agent/guardrail-provider.spec.ts
  • packages/core/src/agent/guardrail-provider.ts
  • packages/core/src/agent/index.ts
  • packages/core/src/index.ts

…, undefined return types

- P1: Malformed modify decisions (action:'modify' without modifiedContent) now
  fail closed with block instead of silently downgrading to allow
- P2: Empty slugs from provider names get a fallback ID via counter to prevent
  key collisions when combining multiple providers
- Return types for evaluateInput/evaluateOutput now include undefined to match
  the documented contract and runtime behavior
- 5 new tests (24 total), all passing

@The-Nexus-Guard (Author)

Thanks for the thorough review, @cubic-dev-ai and @coderabbitai. All three issues were valid — pushed fixes:

P1 — Fail-closed modify (high confidence, agreed): Malformed action: "modify" without modifiedContent now returns pass: false, action: "block" instead of silently downgrading to allow. This prevents a provider implementation bug from creating a fail-open path. A descriptive error message is included.

P2 — Collision-safe IDs: Empty slugs (e.g. from names like "---") now get a fallback counter-based ID (provider-1, provider-2, etc.) so multiple providers can be combined without key collisions.

Return type alignment: evaluateInput and evaluateOutput signatures now include | undefined to match the documented contract ("Return undefined or { pass: true }") and the runtime handling at lines 196/246.

5 new tests covering all three fixes, 24 total, all passing.

coderabbitai bot commented Mar 21, 2026

🧩 Analysis chain

Let me verify the pushed fixes quickly.


🏁 Script executed:

#!/bin/bash
# Check the guardrail-provider.ts for the three fixes
fd guardrail-provider.ts --exec cat {}

Length of output: 10383


@The-Nexus-Guard: All three fixes are verified in the pushed code — they're clean and correct:

P1 — Fail-closed modify ✅
Both the input and output handlers now gate on decision?.modifiedContent !== undefined before returning a modify result. The malformed path correctly falls through to { pass: false, action: "block" } with a provider-named error message, which is good for observability.

P2 — Collision-safe IDs ✅

const id = options?.id ?? (baseSlug || `provider-${nextProviderId()}`);

The empty-slug fallback is idiomatic and the slugify helper already strips leading/trailing dashes before the empty check, so edge cases like "---" or " " are handled cleanly. One minor note worth being aware of: _providerIdCounter is module-level state and is not reset between test runs — this is fine for the fallback's purpose (uniqueness within a session), but test assertions on the exact counter value (e.g. provider-1) may be order-dependent if other tests in the suite also trigger the counter.
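
The fallback discussed here can be sketched roughly as follows. `slugify` and `nextProviderId` are named in the review, but their bodies below are assumptions for illustration, and `deriveGuardrailId` is a hypothetical wrapper, not a function from the PR.

```typescript
// Assumed implementation: lowercase, collapse non-alphanumeric runs to "-",
// then strip leading and trailing dashes.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Module-level counter: unique within a process, never reset here.
let providerIdCounter = 0;
function nextProviderId(): number {
  providerIdCounter += 1;
  return providerIdCounter;
}

// An explicit id wins; otherwise use the slug, and fall back to a
// counter-based id only when the slug is empty.
function deriveGuardrailId(providerName: string, explicitId?: string): string {
  const baseSlug = slugify(providerName);
  return explicitId ?? (baseSlug || `provider-${nextProviderId()}`);
}
```

Note that in this sketch two providers with the same non-empty name still derive the same ID; passing an explicit id is the way around that.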

Return type alignment ✅

evaluateInput?(...): Promise<GuardrailProviderDecision | undefined> | GuardrailProviderDecision | undefined;

This now matches both the JSDoc contract and the runtime !decision guard, closing the type/implementation gap.

The adapter pattern is preserved throughout — no changes to the existing guardrail pipeline. The implementation looks solid. 🎉


@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/src/agent/guardrail-provider.ts">

<violation number="1" location="packages/core/src/agent/guardrail-provider.ts:172">
P2: Default provider-derived guardrail IDs are not unique across providers with the same name, causing collisions when guardrail sets are combined.</violation>
</file>

  outputGuardrails: OutputGuardrail[];
} {
  const baseSlug = slugify(provider.name);
  const id = options?.id ?? (baseSlug || `provider-${nextProviderId()}`);

@cubic-dev-ai cubic-dev-ai bot Mar 21, 2026

P2: Default provider-derived guardrail IDs are not unique across providers with the same name, causing collisions when guardrail sets are combined.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/src/agent/guardrail-provider.ts, line 172:

<comment>Default provider-derived guardrail IDs are not unique across providers with the same name, causing collisions when guardrail sets are combined.</comment>

<file context>
@@ -168,7 +168,8 @@ export function createGuardrailsFromProvider(
 } {
-  const id = options?.id ?? slugify(provider.name);
+  const baseSlug = slugify(provider.name);
+  const id = options?.id ?? (baseSlug || `provider-${nextProviderId()}`);
   const severity = options?.severity ?? provider.severity;
   const tags = mergeTags(provider.tags, options?.tags);
</file context>
Suggested change
- const id = options?.id ?? (baseSlug || `provider-${nextProviderId()}`);
+ const id = options?.id ?? `${baseSlug || "provider"}-${nextProviderId()}`;

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/src/agent/guardrail-provider.ts`:
- Around line 197-223: The current logic treats any decision with pass !== false
as an implicit "allow", causing explicit decisions like { action: "block" } to
be mis-emitted as allow; update the decision handling in the guardrail
evaluation (the block using decision, decision.action, decision.pass,
provider.name and InputGuardrailResult) to treat an explicit decision.action as
authoritative: if decision.action === "modify" follow the existing modify branch
(and require modifiedContent), if decision.action === "block" return pass:
false/action: "block" with message/metadata, and if decision.action === "allow"
return pass: true/action: "allow"; for malformed combinations (e.g., action
present but missing required fields or pass contradicts action) fail closed by
returning block and a clear message referencing provider.name; apply the same
authoritative-action logic to the other analogous branch around the 257-283
region.
- Around line 57-76: Refactor GuardrailProviderDecision into a discriminated
union keyed on the action field instead of using modifiedContent?: unknown;
define three variants: { action?: "allow"; pass: true; message?: string;
metadata?: Record<string,unknown> }, { action: "block"; pass: false; message?:
string; metadata?: Record<string,unknown> }, and { action: "modify"; pass:
boolean; message?: string; metadata?: Record<string,unknown>; modifiedContent:
... } where modifiedContent is typed per use-site (for input guardrails use
string | UIMessage[] | BaseMessage[]; for output guardrails use the generic
TOutput). Update all code that constructs or narrows GuardrailProviderDecision
(including places currently casting modifiedContent) to rely on the discriminant
(action === "modify") so TypeScript enforces the correct modifiedContent type
and remove the unsafe assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d106371f-168d-4f65-9d69-675fbc6c3618

📥 Commits

Reviewing files that changed from the base of the PR and between 0bcf14a and 5602589.

📒 Files selected for processing (2)
  • packages/core/src/agent/guardrail-provider.spec.ts
  • packages/core/src/agent/guardrail-provider.ts
✅ Files skipped from review due to trivial changes (1)
  • packages/core/src/agent/guardrail-provider.spec.ts

Comment on lines +57 to +76
export interface GuardrailProviderDecision {
  /** Whether the content passes the guardrail check. */
  pass: boolean;
  /**
   * The action to take.
   * - `"allow"`: let the content through (default when pass is true)
   * - `"modify"`: replace the content with `modifiedContent`
   * - `"block"`: reject the content (default when pass is false)
   */
  action?: GuardrailAction;
  /** Human-readable reason for the decision. */
  message?: string;
  /**
   * Replacement content when action is `"modify"`.
   * For input guardrails this replaces the user input.
   * For output guardrails this replaces the model output.
   */
  modifiedContent?: unknown;
  /** Arbitrary metadata attached to the guardrail span for observability. */
  metadata?: Record<string, unknown>;

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Expect to see the concrete modifiedInput/modifiedOutput result shapes
# and the current `modifiedContent` declaration/pass-through sites.
rg -n -C2 'export interface InputGuardrailResult|export interface OutputGuardrailResult|modifiedInput|modifiedOutput|modifiedContent' \
  packages/core/src/agent/types.ts \
  packages/core/src/agent/guardrail-provider.ts

Repository: VoltAgent/voltagent

Length of output: 5173


Refactor GuardrailProviderDecision to use a discriminated union instead of unknown type.

modifiedContent?: unknown forces casting at lines 206 and 266, defeating compile-time type safety. Use a discriminated union keyed on action so that when action === "modify", TypeScript enforces the correct payload type (string | UIMessage[] | BaseMessage[] for input, TOutput for output). This prevents contradictory states and removes type assertions.
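
A rough sketch of the suggested shape, for the input side only. The type names (`DecisionBase`, `AllowDecision`, `BlockDecision`, `ModifyDecision`, `ProviderDecision`) and the `applyDecision` helper are hypothetical, and plain `string` stands in for the `string | UIMessage[] | BaseMessage[]` payload mentioned above.

```typescript
interface DecisionBase {
  message?: string;
  metadata?: Record<string, unknown>;
}

interface AllowDecision extends DecisionBase {
  action?: "allow";
  pass: true;
}

interface BlockDecision extends DecisionBase {
  action: "block";
  pass: false;
}

// modifiedContent is required here, so a modify without content cannot type-check.
interface ModifyDecision<T> extends DecisionBase {
  action: "modify";
  pass: boolean;
  modifiedContent: T;
}

type ProviderDecision<T> = AllowDecision | BlockDecision | ModifyDecision<T>;

// Narrowing on `action` exposes modifiedContent with its real type, no cast needed.
function applyDecision(original: string, decision: ProviderDecision<string>): string | null {
  switch (decision.action) {
    case "modify":
      return decision.modifiedContent;
    case "block":
      return null; // null signals "rejected" in this sketch
    default:
      return original; // explicit "allow" or no action at all
  }
}
```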


Address coderabbitai review: explicit action field now overrides pass boolean.
- action: 'block' blocks even when pass: true (prevents silent allow)
- action: 'allow' allows even when pass: false (action authoritative)
- Consistent fail-closed message includes provider name
- 4 new authoritative-action tests (28 total, all passing)

Fixes the case where { pass: true, action: 'block' } was silently
degraded to 'allow' — a risky failure mode for external guardrail plugins.

@The-Nexus-Guard (Author)

Addressed the latest review feedback from @coderabbitai:

Authoritative action handling — The action field now takes priority over pass when both are present:

  • { pass: true, action: "block" } → blocks (was silently degraded to allow)
  • { pass: false, action: "allow" } → allows (action is authoritative)
  • { action: "modify" } without modifiedContent → fail-closed block with clear message

This eliminates the risky failure mode where malformed provider decisions could silently pass content through. 4 new tests covering authoritative-action edge cases (28 total, all passing).
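
The rules listed above could be expressed along these lines. `resolveDecision`, `Decision`, and `Result` are hypothetical simplifications rather than the handler in guardrail-provider.ts, and the choice to report `pass: true` for a successful modify is an assumption of this sketch.

```typescript
type Action = "allow" | "modify" | "block";

interface Decision {
  pass: boolean;
  action?: Action;
  message?: string;
  modifiedContent?: unknown;
}

interface Result {
  pass: boolean;
  action: Action;
  message?: string;
  modifiedContent?: unknown;
}

function resolveDecision(providerName: string, decision: Decision | undefined): Result {
  // No decision at all is treated as allow.
  if (!decision) return { pass: true, action: "allow" };

  // An explicit action is authoritative; `pass` is ignored when it is present.
  if (decision.action === "block") {
    return { pass: false, action: "block", message: decision.message };
  }
  if (decision.action === "modify") {
    if (decision.modifiedContent === undefined) {
      // Fail closed: a modify without replacement content becomes a block.
      return {
        pass: false,
        action: "block",
        message: `Provider "${providerName}" returned modify without modifiedContent`,
      };
    }
    return {
      pass: true,
      action: "modify",
      message: decision.message,
      modifiedContent: decision.modifiedContent,
    };
  }
  if (decision.action === "allow") {
    return { pass: true, action: "allow", message: decision.message };
  }

  // No explicit action: fall back to the pass boolean.
  return decision.pass
    ? { pass: true, action: "allow", message: decision.message }
    : { pass: false, action: "block", message: decision.message };
}
```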

The collision-safe ID concern is a P2 that I believe is already mitigated by the options.id override — providers with identical names can pass a unique id. Happy to add a hash-based suffix if the maintainers prefer deterministic uniqueness.

coderabbitai bot commented Mar 21, 2026

🧩 Analysis chain

Let me verify the latest implementation.


🏁 Script executed:

#!/bin/bash
fd guardrail-provider.ts --exec cat {}

Length of output: 11525


@The-Nexus-Guard: The authoritative action handling implementation is verified and the logic is correct. Here's a quick breakdown of what was confirmed:

Authoritative action handling ✅

The handler now evaluates decision.action !== undefined first, making action fully authoritative when present:

// Explicit action is authoritative when present.
if (decision.action !== undefined) {
  if (decision.action === "block") { ... }        // { pass: true, action: "block" } → blocks ✅
  if (decision.action === "modify") { ... }       // fail-closed if no modifiedContent ✅
  // action === "allow" — trust the provider.     // { pass: false, action: "allow" } → allows ✅
}
// No explicit action — fall back to pass boolean.

This correctly handles all three edge cases you described, and the fallback to pass when no action is present preserves backward compatibility cleanly.


One minor design note worth considering:

Since action is now authoritative when present, the pass: boolean field in GuardrailProviderDecision becomes semantically redundant whenever action is supplied. A provider returning { pass: true, action: "block" } has a misleading pass value that is silently ignored. You might consider:

  1. Making pass optional (pass?: boolean) — since the handler derives the effective pass state from action anyway. This removes the obligation to set a meaningful pass when action is provided.
  2. Or keeping pass required but adding a JSDoc note that action takes precedence when both are present.

Option 1 is a slightly cleaner API surface for provider implementors, but it's a minor point and either approach works.


Collision-safe ID (P2) ✅

Agree that options.id provides an explicit escape hatch for providers with identical names — that's sufficient for the current use case. A hash-based suffix could be a future enhancement if deterministic uniqueness across sessions becomes a requirement.

The implementation is solid overall. 🎉

uchibeke commented Mar 23, 2026

Great implementation. The provider-agnostic interface is exactly the right pattern. Two observations:

  1. This covers input/output content filtering well, but the original issue (#1166, "[FEAT] Add deterministic pre-action authorization at the tool-call hook level") specifically requested pre-tool-call authorization: intercepting tool_name + tool_input before execution. The current evaluateInput(content: string) can't do this because it receives text, not tool call metadata. Could we add an evaluateTool?(toolName: string, toolInput: Record<string, unknown>, context: GuardrailProviderContext) method?

  2. For reference, DeerFlow's implementation of the same pattern splits it into tool-call authorization (middleware with wrap_tool_call) separate from content filtering. The tool-call hook receives the tool name, arguments, and passport reference, which is what enables policy-based decisions like "block rm -rf but allow ls".

The provider interface is the right abstraction - just needs the tool-call evaluation method alongside input/output to make this deterministic and applicable to developer and enterprise use cases.
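
To make the suggestion concrete, here is a minimal sketch of what such a hook might look like. Everything in it is hypothetical: `evaluateTool`, `ToolGuardrailProvider`, `ToolDecision`, `ToolContext`, and `ShellPolicyProvider` do not exist in this PR or in @voltagent/core, and the `"shell"` tool name and `command` argument are assumptions chosen to mirror the "block rm -rf but allow ls" example.

```typescript
interface ToolDecision {
  pass: boolean;
  message?: string;
}

interface ToolContext {
  agentName: string;
}

// Hypothetical provider interface with a tool-call hook next to input/output checks.
interface ToolGuardrailProvider {
  readonly name: string;
  evaluateTool?(
    toolName: string,
    toolInput: Record<string, unknown>,
    context: ToolContext,
  ): ToolDecision | Promise<ToolDecision>;
}

// Example policy: allow read-only commands such as `ls`, block `rm -rf`.
class ShellPolicyProvider implements ToolGuardrailProvider {
  readonly name = "Shell Policy";

  evaluateTool(toolName: string, toolInput: Record<string, unknown>): ToolDecision {
    // Only the (assumed) "shell" tool is subject to this policy.
    if (toolName !== "shell") return { pass: true };

    const raw = toolInput["command"];
    const command = typeof raw === "string" ? raw.trim() : "";
    if (/^rm\s+-rf\b/.test(command)) {
      return { pass: false, message: `Blocked destructive command: ${command}` };
    }
    return { pass: true };
  }
}
```

Because the decision depends only on the tool name and its arguments, the same call always yields the same result, which is the deterministic behavior the comment asks for.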
