feat(core): add GuardrailProvider interface for pluggable guardrail implementations#1171
Conversation
…mplementations

Adds a provider-agnostic interface that external packages can implement to provide guardrail logic (identity verification, content filtering, trust scoring, etc.) without coupling to VoltAgent internals.

Key additions:
- GuardrailProvider interface with evaluateInput/evaluateOutput methods
- createGuardrailsFromProvider() factory that converts providers into VoltAgent-native InputGuardrail/OutputGuardrail arrays
- Full TypeScript types exported from @voltagent/core
- 19 tests covering all provider scenarios

Design principles:
- Optional by default: no provider = existing behavior unchanged
- Provider-agnostic: AIP, APort, or custom implementations all work
- Zero breaking changes: providers produce standard guardrail arrays
- Supports allow/modify/block actions with metadata for observability

Closes VoltAgent#1166
📝 Walkthrough
Adds a pluggable GuardrailProvider API and a factory that generates VoltAgent input/output guardrails with handlers that invoke provider evaluations and translate provider decisions into allow/block/modify results, including id/severity/tags composition and fail-closed modify behavior.
sequenceDiagram
rect rgba(200,200,255,0.5)
actor Agent
end
rect rgba(200,255,200,0.5)
participant Handler as Guardrail Handler
end
rect rgba(255,200,200,0.5)
participant Provider as GuardrailProvider
end
Agent->>Handler: evaluate(args: { inputText / outputText, operation, ... })
Handler->>Provider: evaluateInput/evaluateOutput(content, { agentName, operation, direction })
Provider-->>Handler: GuardrailProviderDecision (pass?, action?, message?, modifiedContent?, metadata?)
Handler->>Agent: mapped result -> allow / block / modify (+ metadata, modifiedContent if applicable)
2 issues found across 4 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/src/agent/guardrail-provider.ts">
<violation number="1" location="packages/core/src/agent/guardrail-provider.ts:171">
P2: Default provider-derived guardrail IDs are not collision-safe and may be empty-derived, which can cause key collisions when multiple providers are combined.</violation>
<violation number="2" location="packages/core/src/agent/guardrail-provider.ts:198">
P1: Malformed provider decisions with `action:"modify"` but missing `modifiedContent` are silently downgraded to `allow`, creating a fail-open path that can bypass intended guardrail transformations.</violation>
</file>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/core/src/agent/guardrail-provider.ts`:
- Around line 107-110: The evaluateInput and evaluateOutput method signatures in
GuardrailProvider should allow undefined returns to match runtime behavior and
docs; update both signatures (evaluateInput and evaluateOutput in
guardrail-provider.ts) to return GuardrailProviderDecision | undefined or
Promise<GuardrailProviderDecision | undefined> (i.e., include undefined in the
union for both synchronous and Promise return variants) so provider
implementations can legally return undefined at compile time.
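The widened return type requested above can be sketched roughly as follows. `Decision`, `MaybePromise`, `Provider`, and `keywordProvider` are simplified, hypothetical stand-ins rather than the PR's actual definitions; the point is only that returning `undefined` type-checks once it is part of the union.

```typescript
// Simplified stand-ins for the PR's types; the real definitions live in
// packages/core/src/agent/guardrail-provider.ts and may differ.
type Decision = { pass: boolean; message?: string };
type MaybePromise<T> = T | Promise<T>;

interface Provider {
  name: string;
  // `undefined` is part of the return union for both sync and Promise variants.
  evaluateInput?(content: string): MaybePromise<Decision | undefined>;
}

// Hypothetical provider that only has an opinion about one keyword.
const keywordProvider: Provider = {
  name: "keyword-check",
  evaluateInput(content) {
    if (content.indexOf("forbidden") !== -1) {
      return { pass: false, message: "Keyword not allowed" };
    }
    // Legal at compile time because the union includes undefined.
    return undefined;
  },
};
```

How the handler interprets an `undefined` result is not shown in this thread, so the sketch makes no assumption about it.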
📒 Files selected for processing (4)
- packages/core/src/agent/guardrail-provider.spec.ts
- packages/core/src/agent/guardrail-provider.ts
- packages/core/src/agent/index.ts
- packages/core/src/index.ts
…, undefined return types

- P1: Malformed modify decisions (action:'modify' without modifiedContent) now fail closed with block instead of silently downgrading to allow
- P2: Empty slugs from provider names get a fallback ID via counter to prevent key collisions when combining multiple providers
- Return types for evaluateInput/evaluateOutput now include undefined to match the documented contract and runtime behavior
- 5 new tests (24 total), all passing
Thanks for the thorough review, @cubic-dev-ai and @coderabbitai. All three issues were valid — pushed fixes: P1 — Fail-closed modify (high confidence, agreed): Malformed P2 — Collision-safe IDs: Empty slugs (e.g. from names like Return type alignment: 5 new tests covering all three fixes, 24 total, all passing. |
Let me verify the pushed fixes quickly.

P1: Fail-closed modify ✅

P2: Collision-safe IDs ✅
const id = options?.id ?? (baseSlug || `provider-${nextProviderId()}`);
The empty-slug fallback is idiomatic and the

Return type alignment ✅
evaluateInput?(...): Promise<GuardrailProviderDecision | undefined> | GuardrailProviderDecision | undefined;
This now matches both the JSDoc contract and the runtime

The adapter pattern is preserved throughout: no changes to the existing guardrail pipeline. The implementation looks solid. 🎉
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/src/agent/guardrail-provider.ts">
<violation number="1" location="packages/core/src/agent/guardrail-provider.ts:172">
P2: Default provider-derived guardrail IDs are not unique across providers with the same name, causing collisions when guardrail sets are combined.</violation>
</file>
  outputGuardrails: OutputGuardrail[];
} {
  const baseSlug = slugify(provider.name);
  const id = options?.id ?? (baseSlug || `provider-${nextProviderId()}`);
P2: Default provider-derived guardrail IDs are not unique across providers with the same name, causing collisions when guardrail sets are combined.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/src/agent/guardrail-provider.ts, line 172:
<comment>Default provider-derived guardrail IDs are not unique across providers with the same name, causing collisions when guardrail sets are combined.</comment>
<file context>
@@ -168,7 +168,8 @@ export function createGuardrailsFromProvider(
} {
- const id = options?.id ?? slugify(provider.name);
+ const baseSlug = slugify(provider.name);
+ const id = options?.id ?? (baseSlug || `provider-${nextProviderId()}`);
const severity = options?.severity ?? provider.severity;
const tags = mergeTags(provider.tags, options?.tags);
</file context>
- const id = options?.id ?? (baseSlug || `provider-${nextProviderId()}`);
+ const id = options?.id ?? `${baseSlug || "provider"}-${nextProviderId()}`;
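To make the difference between the two ID strategies concrete, here is a small sketch. The bodies of `slugify` and `nextProviderId` are not shown in this thread, so the versions below are hypothetical implementations written only for illustration, as are the wrapper names `idFallbackOnly` and `idAlwaysSuffixed`.

```typescript
// Hypothetical helpers: the PR references slugify() and nextProviderId(),
// but their real implementations are not visible here.
let providerCounter = 0;
function nextProviderId(): number {
  providerCounter += 1;
  return providerCounter;
}

function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Variant in the current diff: the counter is used only when the slug is empty.
function idFallbackOnly(name: string, explicitId?: string): string {
  const baseSlug = slugify(name);
  return explicitId ?? (baseSlug || `provider-${nextProviderId()}`);
}

// Variant suggested by the reviewer: the counter is always appended.
function idAlwaysSuffixed(name: string, explicitId?: string): string {
  const baseSlug = slugify(name);
  return explicitId ?? `${baseSlug || "provider"}-${nextProviderId()}`;
}
```

Under these assumptions, two providers that share a name get the same ID from `idFallbackOnly` but distinct IDs from `idAlwaysSuffixed`. The trade-off is that always-suffixed IDs depend on creation order, so they are less stable across runs unless an explicit `id` is passed.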
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/core/src/agent/guardrail-provider.ts`:
- Around line 197-223: The current logic treats any decision with pass !== false
as an implicit "allow", causing explicit decisions like { action: "block" } to
be mis-emitted as allow; update the decision handling in the guardrail
evaluation (the block using decision, decision.action, decision.pass,
provider.name and InputGuardrailResult) to treat an explicit decision.action as
authoritative: if decision.action === "modify" follow the existing modify branch
(and require modifiedContent), if decision.action === "block" return pass:
false/action: "block" with message/metadata, and if decision.action === "allow"
return pass: true/action: "allow"; for malformed combinations (e.g., action
present but missing required fields or pass contradicts action) fail closed by
returning block and a clear message referencing provider.name; apply the same
authoritative-action logic to the other analogous branch around the 257-283
region.
- Around line 57-76: Refactor GuardrailProviderDecision into a discriminated
union keyed on the action field instead of using modifiedContent?: unknown;
define three variants: { action?: "allow"; pass: true; message?: string;
metadata?: Record<string,unknown> }, { action: "block"; pass: false; message?:
string; metadata?: Record<string,unknown> }, and { action: "modify"; pass:
boolean; message?: string; metadata?: Record<string,unknown>; modifiedContent:
... } where modifiedContent is typed per use-site (for input guardrails use
string | UIMessage[] | BaseMessage[]; for output guardrails use the generic
TOutput). Update all code that constructs or narrows GuardrailProviderDecision
(including places currently casting modifiedContent) to rely on the discriminant
(action === "modify") so TypeScript enforces the correct modifiedContent type
and remove the unsafe assertions.
📒 Files selected for processing (2)
- packages/core/src/agent/guardrail-provider.spec.ts
- packages/core/src/agent/guardrail-provider.ts
✅ Files skipped from review due to trivial changes (1)
- packages/core/src/agent/guardrail-provider.spec.ts
export interface GuardrailProviderDecision {
  /** Whether the content passes the guardrail check. */
  pass: boolean;
  /**
   * The action to take.
   * - `"allow"` — let the content through (default when pass is true)
   * - `"modify"` — replace the content with `modifiedContent`
   * - `"block"` — reject the content (default when pass is false)
   */
  action?: GuardrailAction;
  /** Human-readable reason for the decision. */
  message?: string;
  /**
   * Replacement content when action is `"modify"`.
   * For input guardrails this replaces the user input.
   * For output guardrails this replaces the model output.
   */
  modifiedContent?: unknown;
  /** Arbitrary metadata attached to the guardrail span for observability. */
  metadata?: Record<string, unknown>;
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Expect to see the concrete modifiedInput/modifiedOutput result shapes
# and the current `modifiedContent` declaration/pass-through sites.
rg -n -C2 'export interface InputGuardrailResult|export interface OutputGuardrailResult|modifiedInput|modifiedOutput|modifiedContent' \
packages/core/src/agent/types.ts \
  packages/core/src/agent/guardrail-provider.ts
Repository: VoltAgent/voltagent
Length of output: 5173
Refactor GuardrailProviderDecision to use a discriminated union instead of unknown type.
modifiedContent?: unknown forces casting at lines 206 and 266, defeating compile-time type safety. Use a discriminated union keyed on action so that when action === "modify", TypeScript enforces the correct payload type (string | UIMessage[] | BaseMessage[] for input, TOutput for output). This prevents contradictory states and removes type assertions.
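A rough sketch of the discriminated union described in this suggestion is shown below. The type names (`DecisionBase`, `AllowDecision`, `BlockDecision`, `ModifyDecision`, `ProviderDecision`) and the `applyDecision` helper are hypothetical, and `modifiedContent` is narrowed to `string` at the call site for brevity, whereas the review comment proposes a wider input type.

```typescript
// Sketch only: variant shapes follow the review comment, names are invented.
interface DecisionBase {
  message?: string;
  metadata?: Record<string, unknown>;
}
interface AllowDecision extends DecisionBase {
  action?: "allow";
  pass: true;
}
interface BlockDecision extends DecisionBase {
  action: "block";
  pass: false;
}
interface ModifyDecision<T> extends DecisionBase {
  action: "modify";
  pass: boolean;
  modifiedContent: T; // required, so a modify decision cannot omit it
}

type ProviderDecision<T> = AllowDecision | BlockDecision | ModifyDecision<T>;

// Narrowing on `action` yields a typed modifiedContent without a cast.
function applyDecision(
  original: string,
  decision: ProviderDecision<string>,
): string | null {
  if (decision.action === "modify") {
    return decision.modifiedContent;
  }
  if (decision.action === "block") {
    return null; // blocked: nothing is passed on
  }
  return original; // action is "allow" or omitted
}
```

With this shape, contradictory literals such as `{ pass: true, action: "block" }` are rejected at compile time. Providers written in plain JavaScript can still produce them, so a runtime check would remain useful.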
Address coderabbitai review: explicit action field now overrides pass boolean.
- action: 'block' blocks even when pass: true (prevents silent allow)
- action: 'allow' allows even when pass: false (action authoritative)
- Consistent fail-closed message includes provider name
- 4 new authoritative-action tests (28 total, all passing)
Fixes the case where { pass: true, action: 'block' } was silently
degraded to 'allow' — a risky failure mode for external guardrail plugins.
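The rules listed in this commit message can be sketched as a single mapping function. This is an illustration under simplified, hypothetical types (`Decision`, `Result`, `resolveDecision`), not the handler from the PR; the fail-closed message text and the `pass: true` on a successful modify are assumptions.

```typescript
// Simplified stand-ins; not the PR's actual types or handler code.
type Action = "allow" | "modify" | "block";
interface Decision {
  pass: boolean;
  action?: Action;
  message?: string;
  modifiedContent?: unknown;
}
interface Result {
  pass: boolean;
  action: Action;
  message?: string;
  modifiedContent?: unknown;
}

function resolveDecision(providerName: string, decision: Decision): Result {
  // An explicit action is authoritative and overrides the pass boolean.
  if (decision.action === "block") {
    return { pass: false, action: "block", message: decision.message };
  }
  if (decision.action === "modify") {
    if (decision.modifiedContent === undefined) {
      // Fail closed: a malformed modify becomes a block (message is hypothetical).
      return {
        pass: false,
        action: "block",
        message: `Provider "${providerName}" returned "modify" without modifiedContent`,
      };
    }
    return {
      pass: true, // assumption: a successful modify counts as passing
      action: "modify",
      message: decision.message,
      modifiedContent: decision.modifiedContent,
    };
  }
  if (decision.action === "allow") {
    return { pass: true, action: "allow", message: decision.message };
  }
  // No explicit action: fall back to the pass boolean.
  return decision.pass
    ? { pass: true, action: "allow", message: decision.message }
    : { pass: false, action: "block", message: decision.message };
}
```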
Addressed the latest review feedback from @coderabbitai: Authoritative action handling — The
This eliminates the risky failure mode where malformed provider decisions could silently pass content through. 4 new tests covering authoritative-action edge cases (28 total, all passing). The collision-safe ID concern is a P2 that I believe is already mitigated by the |
Let me verify the latest implementation.

Authoritative action handling ✅
The handler now evaluates
// Explicit action is authoritative when present.
if (decision.action !== undefined) {
  if (decision.action === "block") { ... } // { pass: true, action: "block" } → blocks ✅
  if (decision.action === "modify") { ... } // fail-closed if no modifiedContent ✅
  // action === "allow" — trust the provider. // { pass: false, action: "allow" } → allows ✅
}
// No explicit action — fall back to pass boolean.
This correctly handles all three edge cases you described, and the fallback to

One minor design note worth considering: Since

Option 1 is a slightly cleaner API surface for provider implementors, but it's a minor point and either approach works.

Collision-safe ID (P2) ✅
Agree that

The implementation is solid overall. 🎉
Great implementation. The provider-agnostic interface is exactly the right pattern. Two observations:
The provider interface is the right abstraction - just needs the tool-call evaluation method alongside input/output to make this deterministic and applicable to developer and enterprise use cases. |
Summary
Adds a provider-agnostic `GuardrailProvider` interface that external packages can implement to provide guardrail logic (identity verification, content filtering, trust scoring, or any custom check) without coupling to VoltAgent internals.
This addresses #1166 by enabling a clean plugin model where:
What's Added
`GuardrailProvider` interface
Providers implement `evaluateInput`, `evaluateOutput`, or both. Each returns a decision: `allow`, `modify` (with replacement content), or `block`.
`createGuardrailsFromProvider()` factory
Converts any `GuardrailProvider` into standard VoltAgent `InputGuardrail[]` / `OutputGuardrail[]` arrays:
Multiple providers can be combined by spreading their guardrail arrays together.
Custom provider example
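A minimal sketch of what a custom provider might look like, assuming the shapes described in this thread (a `name`, optional `evaluateInput`/`evaluateOutput`, and a context carrying `agentName`, `operation`, and `direction`). The local `Provider`, `Decision`, and `ProviderContext` types are simplified stand-ins for the ones exported from `@voltagent/core`, and `privacyProvider` with its rules is hypothetical.

```typescript
// Local stand-ins for the PR's types, simplified for illustration;
// the real ones are exported from @voltagent/core and may differ.
type Direction = "input" | "output";
interface ProviderContext {
  agentName?: string;
  operation?: string;
  direction: Direction;
}
interface Decision {
  pass: boolean;
  action?: "allow" | "modify" | "block";
  message?: string;
  modifiedContent?: unknown;
  metadata?: Record<string, unknown>;
}
interface Provider {
  name: string;
  evaluateInput?(content: string, context: ProviderContext): Decision | undefined;
  evaluateOutput?(content: string, context: ProviderContext): Decision | undefined;
}

// Hypothetical rules: block inputs that mention a password and
// redact e-mail-like strings from outputs.
const EMAIL_PATTERN = /[\w.+-]+@[\w-]+\.[\w.]+/g;

const privacyProvider: Provider = {
  name: "privacy-check",
  evaluateInput(content) {
    if (/password/i.test(content)) {
      return { pass: false, action: "block", message: "Input mentions a password" };
    }
    return { pass: true };
  },
  evaluateOutput(content, context) {
    const redacted = content.replace(EMAIL_PATTERN, "[redacted]");
    if (redacted === content) {
      return { pass: true };
    }
    return {
      pass: true,
      action: "modify",
      modifiedContent: redacted,
      metadata: { direction: context.direction },
    };
  },
};
```

In the PR, a provider like this would be passed to `createGuardrailsFromProvider()` to obtain the guardrail arrays; that call is omitted here because the sketch does not import the real package.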
Design Decisions
Adapter pattern, not new pipeline. Providers produce standard `InputGuardrail` / `OutputGuardrail` arrays via the factory. No changes to the existing guardrail execution pipeline: `resolveGuardrailSets`, `runInputGuardrails`, and `runOutputGuardrails` work unchanged.
Text-based evaluation surface. Providers receive plain text (extracted from whatever message format is in use). This keeps the provider interface simple and framework-agnostic while covering the common case.
Sync or async. Provider methods can return decisions synchronously or as promises — the factory handles both.
Observable by default. Provider name, description, severity, and tags flow through to OpenTelemetry spans. Decision metadata is attached to guardrail result metadata.
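The "sync or async" decision above relies on a standard language feature: awaiting a non-promise value simply yields that value, so one code path can serve both kinds of provider. A minimal sketch, with hypothetical names (`Evaluate`, `runEvaluation`, `syncCheck`, `asyncCheck`) that do not come from the PR:

```typescript
// Hypothetical types and helpers, for illustration only.
type Decision = { pass: boolean; message?: string };
type Evaluate = (content: string) => Decision | Promise<Decision>;

// `await` accepts plain values as well as promises, so the caller
// does not need to know which style the provider uses.
async function runEvaluation(evaluate: Evaluate, content: string): Promise<Decision> {
  return await evaluate(content);
}

// A synchronous check and an asynchronous one with the same signature.
const syncCheck: Evaluate = (content) => ({ pass: content.length < 100 });
const asyncCheck: Evaluate = async (content) => ({
  pass: content.indexOf("secret") === -1,
});
```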
Tests
19 tests covering:
Files Changed
- packages/core/src/agent/guardrail-provider.ts
- packages/core/src/agent/guardrail-provider.spec.ts
- packages/core/src/agent/index.ts
- packages/core/src/index.ts
Breaking Changes
None. This is purely additive. Existing guardrail configuration continues to work exactly as before.
Summary by cubic
Adds a provider-agnostic `GuardrailProvider` and a `createGuardrailsFromProvider()` factory to plug external guardrail logic into the existing pipeline. Adds safety fixes (authoritative action, fail-closed modify, collision-safe IDs); fully opt-in and non-breaking. Addresses #1166.
New Features
- `GuardrailProvider` with optional `evaluateInput`/`evaluateOutput`; decisions support allow/modify/block with message and metadata; methods may return `undefined`.
- `createGuardrailsFromProvider()` returns standard Input/Output guardrail arrays; supports `id`, `severity`, `tags`; providers can be combined; works sync or async.
- `@voltagent/core`; 28 tests cover directions, actions, context/metadata, overrides, and edge cases.
Bug Fixes
- `action` is authoritative (block overrides pass:true; allow overrides pass:false).
- `modifiedContent` now block (fail-closed) with a clear default message.
- `outputText`; align return types to include `undefined`.
Written for commit 54cb6d1. Summary will update on new commits.
Summary by CodeRabbit
New Features
Tests