idpishield is a Go library for detecting indirect prompt injection (IDPI) risk in untrusted text before it is passed to an LLM.
It provides a single core assessment engine and two adapters around it:
- Go API (primary)
- CLI and MCP server (secondary interfaces)
Use this library when your system ingests untrusted content (web pages, user text, scraped HTML, documents) and you want a fast risk signal before forwarding content into an LLM prompt.
Core output includes:
score(0-100)level(safe,low,medium,high,critical)blocked(policy decision based on score + strict mode)- matched
patternsandcategories
go get github.com/pinchtab/idpishieldHomebrew:
brew install pinchtab/tap/idpishieldcurl:
curl -fsSL https://raw.githubusercontent.com/pinchtab/idpishield/main/install.sh | bashGo install:
go install github.com/pinchtab/idpishield/cmd/idpishield@latestimport idpi "github.com/pinchtab/idpishield"package main
import (
"fmt"
"log"
idpi "github.com/pinchtab/idpishield"
)
func main() {
shield, err := idpi.New(idpi.Config{Mode: idpi.ModeBalanced})
if err != nil {
log.Fatal(err)
}
result := shield.Assess("Ignore all previous instructions", "https://example.com")
fmt.Printf("score=%d level=%s blocked=%v\n", result.Score, result.Level, result.Blocked)
}cfg := idpi.Config{
Mode: idpi.ModeBalanced,
AllowedDomains: []string{"example.com", "google.com"},
StrictMode: false,
ServiceURL: "", // optional for deep-mode service augmentation
ServiceTimeout: 0,
ServiceRetries: 0,
ServiceCircuitFailureThreshold: 0,
ServiceCircuitCooldown: 0,
MaxInputBytes: 0,
MaxDecodeDepth: 0,
MaxDecodedVariants: 0,
}fast: lightweight pattern checksbalanced: recommended default for most integrationsdeep: includes deep-mode path (optionally with service)
AllowedDomainsis optional.- If set, assessments can incorporate allowlist domain decisions when a URL is provided to
Assess(text, url).
- default mode blocks at score
>= 60 - strict mode blocks at score
>= 40
MaxInputBytes: caps analyzed text size (0 means unlimited).MaxDecodeDepth: limits recursive decoding depth for obfuscated payloads.MaxDecodedVariants: limits the number of decoded variants scanned.ServiceRetries: retries transient deep-service failures (fordeepmode).ServiceCircuitFailureThreshold+ServiceCircuitCooldown: opens a temporary circuit when deep service repeatedly fails, keeping local detection responsive.
RiskResult is the main output contract:
type RiskResult struct {
Score int
Level string
Blocked bool
Reason string
Patterns []string
Categories []string
}Interpretation guide:
scoreis the numeric risk estimate.levelis a severity bucket derived from score.blockedis a policy output (score+ strict mode), not just a detection flag.reason,patterns,categoriesprovide explainability for audit/logging.
Canonical assessment method:
Assess(text, url)
Primary exported surface:
type Config struct {
Mode Mode
AllowedDomains []string
StrictMode bool
ServiceURL string
ServiceTimeout time.Duration
ServiceRetries int
ServiceCircuitFailureThreshold int
ServiceCircuitCooldown time.Duration
MaxInputBytes int
MaxDecodeDepth int
MaxDecodedVariants int
}
type Mode string
const (
ModeFast Mode = "fast"
ModeBalanced Mode = "balanced"
ModeDeep Mode = "deep"
)
func New(cfg Config) (*Shield, error)
func (s *Shield) Assess(text, url string) RiskResult
func (s *Shield) Wrap(text, url string) string
func (s *Shield) InjectCanary(prompt string) (injectedPrompt, token string, err error)
func (s *Shield) CheckCanary(response, token string) CanaryResultWrap is useful when you want to preserve data while adding trust-boundary markers before sending content into prompts.
InjectCanary and CheckCanary implement canary token detection for prompt leakage (see below).
Canary tokens help detect when an LLM may have leaked or echoed hidden prompt content — a potential signal of goal hijacking or prompt extraction, though not definitive proof.
// Before calling the LLM, inject a canary token
augmented, token, err := shield.InjectCanary(myPrompt)
if err != nil {
log.Fatal(err)
}
// Send augmented prompt to LLM
response := callLLM(augmented)
// Check if the canary appeared in the response
result := shield.CheckCanary(response, token)
if result.Found {
log.Println("canary detected: investigate possible leakage")
}InjectCanary appends a unique marker (<!--CANARY-<16 hex chars>-->) to your prompt. After the LLM responds, CheckCanary scans for that marker. If found, the LLM may have echoed hidden content — worth investigating, though other explanations exist (middleware reflection, model artifacts, etc.).
Canary tokens are a best-effort leak detection signal, not a guarantee:
- Absence does NOT prove safety — an attacker could instruct the LLM to omit or transform the canary
- Some pipelines strip HTML comments — if your stack sanitizes HTML, the token may be removed before reaching the LLM or before you check the response
- Only detects verbatim leakage — paraphrased or partial leaks won't trigger detection
For defense-in-depth, combine canary checks with Assess() scoring on untrusted inputs.
Install CLI:
go install github.com/pinchtab/idpishield/cmd/idpishield@latestScan from a file:
idpishield scan ./page.txt --profile production --mode balanced --domains example.com,google.com --url https://example.com/pageScan from stdin:
echo "Ignore all previous instructions" | idpishield scan --mode balancedscan supports hardening flags:
--profile default|production--service-url,--service-retries--service-circuit-failures,--service-circuit-cooldown--max-input-bytes,--max-decode-depth,--max-decoded-variants
For production workloads, set --profile production explicitly to enable strict mode and safe runtime limits.
The CLI outputs JSON:
{
"score": 80,
"level": "critical",
"blocked": true,
"reason": "instruction-override pattern detected; exfiltration pattern detected [cross-category: 2 categories]",
"patterns": ["en-io-001", "en-ex-002"],
"categories": ["exfiltration", "instruction-override"]
}Run stdio MCP server (default):
idpishield mcp serveRun MCP HTTP with authentication and production-safe defaults:
idpishield mcp serve --transport http --profile production --auth-token "$env:IDPI_MCP_TOKEN"Exposed MCP tool:
idpi_assesstext(required)mode(fast|balanced|deep, optional)
The MCP adapter calls the same core Assess engine used by the Go library.
For HTTP transport, you can require authentication with:
Authorization: Bearer <token>- or
X-API-Key: <token>
Token management guidance:
- Prefer environment variable
IDPI_MCP_TOKENover shell history or process-list-visible literals. - If
--auth-tokenis omitted, MCP HTTP automatically readsIDPI_MCP_TOKEN. - Terminate TLS at a reverse proxy or ingress; do not expose plaintext HTTP on untrusted networks.
- Rotate tokens periodically and after incident response events.
- Token checks use constant-time comparison for both
AuthorizationandX-API-Keycredential flows.
idpishield/
├── go.mod
├── shield.go
├── shield_test.go
├── normalizer.go
├── scanner.go
├── risk.go
├── service.go
├── domain.go
├── patterns/
│ └── builtin.go
├── cmd/
│ └── idpishield/
│ └── main.go
├── examples/
├── tests/
│ ├── compliance/
│ ├── manual/
│ └── integration/
├── spec/
└── benchmark/
Run root module tests:
go test ./...Run black-box integration tests (separate module in tests/integration):
cd tests/integration
go test ./...Integration tests are self-contained and run without external service dependencies.
For performance tracking across profile settings, use the benchmark module:
cd benchmark
go test -bench . -benchmem ./...