feat: add MiniMax as LLM provider with Guardian threat protection #1

Open
octo-patch wants to merge 13 commits into OraclesTech:main from octo-patch:feature/add-minimax-provider

@octo-patch

Summary

Add MiniMax as a first-class provider integration for Guardian SDK. MiniMax serves its M2.7 and M2.5 models through an OpenAI-compatible API at https://api.minimax.io/v1. This provider wraps MiniMax-configured OpenAI clients with the same multi-layer threat detection pipeline used for OpenAI and Anthropic.

Changes

  • New provider: ethicore_guardian/providers/minimax_provider.py, containing MiniMaxProvider, ProtectedMiniMaxClient, ProtectedChat, ProtectedCompletions, and the create_protected_minimax_client() convenience factory
  • Auto-detection: Updated get_provider_for_client() in base_provider.py to detect MiniMax clients by checking base_url for 'minimax'
  • Dependencies: Added minimax optional dependency group in pyproject.toml (uses openai>=1.0.0)
  • Documentation: Added MiniMax provider example and install instructions to README
  • Tests: 30 tests in tests/test_minimax.py — 22 unit tests + 5 integration tests + 3 constant tests
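
The auto-detection bullet above can be illustrated with a minimal sketch. It assumes detection first checks that the client class comes from the openai package and only then inspects base_url. The function name detect_provider_name and the FakeOpenAIClient class are hypothetical stand-ins, not the code in base_provider.py.

```python
from typing import Optional


def detect_provider_name(client: object) -> Optional[str]:
    """Hypothetical sketch: classify a client as 'minimax', 'openai', or None."""
    module = type(client).__module__ or ""
    # Only OpenAI SDK clients are candidates; anything else is unknown here.
    if not module.startswith("openai"):
        return None
    # base_url may be a str or a URL object, so normalise it to a lowercase str.
    base_url = str(getattr(client, "base_url", "") or "").lower()
    return "minimax" if "minimax" in base_url else "openai"


class FakeOpenAIClient:
    """Stand-in that mimics the module path of an OpenAI SDK client."""

    __module__ = "openai._client"

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url
```

Because the substring check runs only after the module check, a non-OpenAI client that happens to carry 'minimax' in a URL is not misclassified.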

Supported Models

| Model | Context |
| --- | --- |
| MiniMax-M2.7 | 1M tokens |
| MiniMax-M2.7-highspeed | 1M tokens (fast) |
| MiniMax-M2.5 | 204K tokens |
| MiniMax-M2.5-highspeed | 204K tokens (fast) |

Usage

import openai
from ethicore_guardian import Guardian, GuardianConfig
from ethicore_guardian.providers.minimax_provider import MiniMaxProvider

guardian = Guardian(config=GuardianConfig(api_key="my-app"))

# A standard OpenAI client pointed at the MiniMax endpoint
minimax_client = openai.OpenAI(
    api_key="your-minimax-api-key",
    base_url="https://api.minimax.io/v1",
)

# Wrap the client so every request passes through Guardian first
provider = MiniMaxProvider(guardian)
client = provider.wrap_client(minimax_client)

user_input = "Summarize the attached report."  # placeholder for untrusted input

response = client.chat.completions.create(
    model="MiniMax-M2.7",
    messages=[{"role": "user", "content": user_input}],
)

Test plan

  • All 30 unit + integration tests pass with mocked Guardian
  • Verify no regressions in existing OpenAI/Anthropic providers
  • Manual smoke test with real MiniMax API key (optional)

Oracles Technologies LLC and others added 13 commits February 25, 2026 13:17
Community edition (MIT framework + open-core model):
- 5-category threat detection (instructionOverride, jailbreakActivation,
  safetyBypass, roleHijacking, systemPromptLeaks)
- HMAC-SHA256 license key validator (PRO/ENT tiers)
- License-aware PatternAnalyzer and SemanticAnalyzer with 4-step asset
  resolution chain (assets_dir -> ~/.ethicore -> package)
- Async Guardian orchestrator with OpenAI, Anthropic, and Ollama providers
- ML inference engine with graceful heuristic fallback (no torch required)
- Full pytest suite: 100 passing, 61 skipped (require license + asset bundle)
- pyproject.toml build config; wheel verified clean (no proprietary assets)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…framing

Transforms the minimal technical reference into a trust-building document
that leads with the problem (prompt injection shipped without a real defense),
backs it with technical proof (4-layer pipeline, ONNX offline inference,
~15ms p99 latency), and makes the ethical conviction concrete through the
Guardian Covenant framework reference.

Key changes:
- Hero: 'Only' positioning statement + founding insight sentence
- Added 'See It Work' section with attack demo in 4 lines
- Added 'Why Offline Inference Matters' section (key differentiator vs cloud APIs)
- Expanded Community vs Licensed comparison table (30 rows, all categories named)
- Fixed category count: 30 (was incorrectly stated as 25+ in old README)
- Added Guardian Covenant framework reference with link placeholder
- Added Community & Discussions section to activate GitHub Discussions
- Closing line: the conviction sentence that anchors the brand

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…users

Previous version incorrectly framed Guardian SDK as protecting users from AI.
Corrected to the accurate frame: Guardian protects the developer's AI system
(and its integrity, data, and designed behavior) from adversarial attackers
using prompt injection, jailbreaks, and role hijacking.

Key changes:
- Hero: focuses on real-time threat detection and blocking before model context
- Rewrote opening to frame the attack surface and the defender (the developer)
- Added 'What It Defends Against' section listing specific attack vectors
- Guardian Covenant reframed: developer's commitment to defend what they build
- Closing: 'You built something that people rely on. Defend it.'
- Removed all language implying users are protected FROM the AI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ncoding

Critical bug fixes discovered and resolved during full-suite audit:

ONNX / ML layer (ml_inference_engine.py, guardian.py)
- MLInferenceEngine now accepts assets_dir param; SimpleOrchestrator
  passes it correctly — guardian-model.onnx was never being located
- Added ONNX Runtime inference path with calibration gate; falls back
  to heuristics if model outputs >0.4 avg probability on benign inputs
  (model requires retraining before contributing to scoring)
- Fixed unnormalized linguistic features: raw len(text) -> min(1.0, len/500)
  preventing ONNX input saturation
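
A rough sketch of the two ideas in this group of fixes: the load-time calibration gate and the length normalization. It assumes the gate averages the model's threat probability over a few known-benign texts and rejects the model when that average exceeds 0.4. The names passes_calibration and normalized_length are illustrative and do not come from ml_inference_engine.py.

```python
from typing import Callable, Sequence


def normalized_length(text: str, scale: int = 500) -> float:
    """Map raw character length into [0, 1] so long inputs cannot saturate a feature."""
    return min(1.0, len(text) / scale)


def passes_calibration(
    predict_threat_prob: Callable[[str], float],
    benign_texts: Sequence[str],
    max_avg_prob: float = 0.4,
) -> bool:
    """Return True when the model stays quiet on benign inputs.

    A model that reports a high average threat probability on benign text
    is treated as miscalibrated; the caller would fall back to heuristics.
    """
    if not benign_texts:
        # No evidence of good calibration, so do not trust the model.
        return False
    probs = [predict_threat_prob(text) for text in benign_texts]
    return sum(probs) / len(probs) <= max_avg_prob
```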

Licensed pattern library (threat_patterns.py)
- Community stub now performs license-aware dynamic loading at import time
- `from ethicore_guardian.data.threat_patterns import THREAT_PATTERNS`
  transparently returns 30-category licensed library when
  ETHICORE_LICENSE_KEY + asset bundle are present; falls back to
  community 5-category stub otherwise
- Fixes: 53 licensed-tier test failures now pass (157/157 total)

Semantic layer (semantic_analyzer.py)
- vocab.json + special_tokens.json confirmed in asset bundle and
  package data; semantic analyzer uses full 30,522-token vocabulary

Scoring and display (guardian.py, threat_detector.py)
- Layer votes display was always showing BLOCK regardless of actual
  analysis result; fixed to threshold-based per-layer decisions
- Cleaned up SimpleOrchestrator analyzer wiring

Windows / encoding (all analyzer files, __init__.py)
- Replaced emoji in print() calls with ASCII equivalents across all
  modules to prevent UnicodeEncodeError on cp1252 terminals

Test suite
- 96/96 community tests pass (no license key required)
- 157/157 licensed tests pass (with ETHICORE_LICENSE_KEY)
- 15/15 attack detection in live demo (100%)
…tegories

Licensed tier (threat_patterns_licensed.py):
- Expanded from 241 patterns / 30 categories → 500 patterns / 51 categories
- Added 21 new attack-vector categories sourced from OWASP LLM Top 10 2025,
  MITRE ATLAS, Anthropic red-team research, Garak probe taxonomy, and the
  PLINY / social-media jailbreak community (X/Twitter, Reddit r/jailbreak):
    crescendoAttack, manyShotJailbreaking, cipherObfuscation,
    authorityImpersonation, sandboxExemption, delimiterInjection,
    outputFormatEscape, persistentPersona, contextWindowFlooding,
    falsePermissionClaim, legalJurisdictionBypass, professionalAuthorityBypass,
    researchExemption, reversePsychology, contrastiveExtraction,
    metaInstructionAttack, memorySeedingAttack, adversarialFormatting,
    goalHijackingChain, negationBypass, plinyStyleJailbreak
- Bulked up 3 thin existing categories (trainingDataExtraction,
  emotionalManipulation, multiTurnSetup) to 10-12 patterns each
- Bulked up 7 underpowered categories (harmfulContentGeneration,
  piiHarvesting, commandInjection, dataExfiltration, urgencyExploit, etc.)
- Semantic fingerprints: 234 → 444 across all 51 categories
- Synced live asset to ~/.ethicore/data/threat_patterns_licensed.py

Scripts:
- scripts/regenerate_embeddings.py: rewrote to be license-aware; resolves
  output path via CLI args > ETHICORE_ASSETS_DIR > ~/.ethicore > package;
  reads licensed fingerprints (444) when ETHICORE_LICENSE_KEY is set
- scripts/retrain_guardian_model.py: new script — generates synthetic
  training data from fingerprint library, trains sklearn MLPClassifier,
  exports to ONNX with correct dense_1_input/dense_4 interface, runs
  calibration gate before writing (Principle 14: Divine Safety)
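
The resolution order described for regenerate_embeddings.py (CLI args > ETHICORE_ASSETS_DIR > ~/.ethicore > package) could look roughly like the following. The function name, its parameters, and the rule that ~/.ethicore counts only when the directory exists are all assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_assets_dir(
    cli_arg: Optional[str],
    package_dir: Path,
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Path:
    """Pick an assets directory using a fixed precedence order (hypothetical sketch)."""
    env = os.environ if env is None else env
    # 1. An explicit command-line argument always wins.
    if cli_arg:
        return Path(cli_arg)
    # 2. Then the ETHICORE_ASSETS_DIR environment variable.
    env_dir = env.get("ETHICORE_ASSETS_DIR")
    if env_dir:
        return Path(env_dir)
    # 3. Then ~/.ethicore, but only if it actually exists (assumption).
    home_dir = Path(home if home is not None else Path.home()) / ".ethicore"
    if home_dir.is_dir():
        return home_dir
    # 4. Finally the directory shipped inside the package.
    return Path(package_dir)
```

Passing env and home as parameters keeps the function easy to test without touching the real environment.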

Community patterns (threat_patterns.py):
- Added 6 gap-fix patterns to instructionOverride, safetyBypassAttempt,
  roleHijacking to close real-world jailbreak gaps

Tests:
- test_phase4_threat_library.py: updated hard count assertions (30→≥51,
  234→≥444, 235→≥500); added v1.2.0 category coverage checks for all 21
  new categories; switched to >= comparisons for forward compatibility

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
retrain_guardian_model.py:
- Full rewrite: 20 000 samples (10k threat / 10k benign) with real
  MiniLM semantic embeddings on every training sample — eliminates the
  feature-starvation bug where only 6/127 features were non-zero
- 600+ benign templates across 6 domains; 75 assistant-phrasing phrases
  oversampled 3x to prevent false positives on 'how can I help' inputs
- 45 hard-negative security-research sentences (labeled BENIGN) prevent
  flagging legitimate AI-safety research
- Fixed self-check: now compares sklearn vs ONNX probabilities using the
  calibration cal_X vectors rather than [0.01]*27 placeholder embeddings
- Fixed ONNX Gather axis bug: explicit 'probabilities' string match
  prevents matching 'label [N]' (1D) instead of 'probabilities [N,2]'
- Fixed Windows console Unicode errors: replaced all checkmark/arrow
  symbols with ASCII [OK]/[FAIL]/[WARN]
- Corrected docstring sample/architecture defaults (20000, 128,64)

.gitignore:
- Added fix_*.py, *.bak, *.tmp to suppress scratch/temp files

Trained model stats (v1.2.0 release):
  Samples:      20 000 | Accuracy: 99.83% | AUC-ROC: 0.9996
  Calibration:  avg benign prob 0.0337 (all 3 texts < 0.11) — PASSED
  Output:       guardian-model.onnx 98 KB (gitignored; in asset bundle)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…res()

Root cause: _build_feature_vector() used text-derived behavioral proxy
features (char_len/500, word_count/100, ...) while extract_features()
uses sentinel defaults ([0.5, 1.0, 0.0, 0.0, ...]) when no behavioral
session data is available.  Features [0] and [1] differed by +0.44 and
+0.93 respectively — the model learned those high sentinel values as
threat-correlated, causing avg_benign_prob=0.990 at MLInferenceEngine
load-time calibration and the engine falling back to heuristics.

Fixes applied to retrain_guardian_model.py:
1. _build_feature_vector() now mirrors extract_features() exactly:
     - Behavioral [0:40]: sentinel defaults [0.5, 1.0, 0.0, 0.0, ...]
     - Linguistic  [40:75]: same 5 text-derived computations as engine
     - Technical   [75:100]: sentinel defaults [0.1, 0.0, ...]
     - Semantic    [100:127]: real MiniLM or [0.01]*27 null placeholder
   Verified: 0 feature mismatches across all test texts.
2. Null-semantic injection: 20% of training samples use [0.01]*27 for
   the semantic slot, teaching the model that null semantic signal
   does not imply threat (handles calibration + SemanticAnalyzer
   unavailable edge cases gracefully).
3. Calibration gate in retrain script now uses [0.01]*27 (matching
   extract_features exactly) instead of hash-based fallback — ensures
   the gate that passes here is the same gate MLInferenceEngine runs.

Result (v1.2.0 final model — hash a9433737b58720c8):
  Accuracy:     95.47%  AUC-ROC: 0.9935
  Calibration:  avg benign prob 0.020 (all 3 texts < 0.03) -- PASSED
  Engine load:  guardian-model-onnx calibration passed (avg: 0.020)
  Smoke test:   3 benign ALLOW, 3 threats BLOCK (ML votes 73-99%)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…omment

_SECRET_MASKED has been populated since v1.2.0 retrain; the "all zeros"
warning comment was leftover setup boilerplate and was factually incorrect.
Removed to prevent confusion for any future reviewer of the source.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove ethicore_guardian/analyzers/ and guardian.py from git tracking
  (core IP — distributed via paid asset bundle only, not open source)
- Gitignore analyzers/, guardian.py, and proprietary test files
  to prevent accidental future commits
- Update README: 51 categories / 500+ patterns / 444+ fingerprints
- Add bi-directional six-layer pipeline docs (pre-flight + post-flight)
- Update Community vs Licensed table with post-flight and learning rows
- Update GuardianConfig reference with Phase 3 parameters

History scrub required: analyzers/ and guardian.py exist in earlier
commits and must be purged with git-filter-repo before history is clean.

Phase 3 release: OutputAnalyzer (post-flight gate) +
AdversarialLearner (closed-loop learning). Adds analyze_response()
public API and GuardianConfig output/learning fields.

- Add 80-entry _BENIGN_CASUAL domain: short greetings, informal requests,
  phrases containing 'benign'/'normal' — the exact inputs that produced
  false positives (ML BLOCK on single-word greetings, 'Hello, help me...')
- Over-sample _BENIGN_CASUAL 4x in _ALL_BENIGN pool; add dedicated 20%
  sampling path in _make_benign_sample() (previously 0% representation)
- Increase default --samples from 20,000 to 30,000 (15k threat / 15k benign)
  to ensure all 51 licensed categories are thoroughly represented
- Adjust _make_benign_sample() weights:
    20% casual (new), 20% assistant phrasing, 15% hard negatives, 45% pool
- Update calibration gate hint and docstring to reference 30,000

Licensed retrain results (51 categories / 444 fingerprints):
  Accuracy: 95.33%  |  AUC-ROC: 0.9940  |  Avg benign prob: 0.0139
  Calibration gate: PASSED

Add MiniMax (https://www.minimax.io) as a first-class provider integration
for Guardian SDK. MiniMax offers powerful LLM models (M2.7, M2.5) through
an OpenAI-compatible API, and this provider wraps MiniMax-configured OpenAI
clients with the same threat detection pipeline used for OpenAI and Anthropic.

Changes:
- Add minimax_provider.py with MiniMaxProvider, ProtectedMiniMaxClient,
  and create_protected_minimax_client() convenience factory
- Add MiniMax auto-detection in get_provider_for_client() via base_url
- Add minimax optional dependency group in pyproject.toml
- Add MiniMax provider example and install instructions to README
- Add 30 tests (22 unit + 5 integration + 3 constant tests)
@OraclesTech
Owner

Hey @octo-patch, really appreciate you taking the time to contribute to the Guardian SDK! It's clear you read the codebase carefully: the proxy structure, async/sync handling, and docstrings all follow the existing patterns closely, and the mock-based test suite is on par with the existing tests. Great first PR.

Before we merge, there are two things that need to be addressed:

guardian.py was not updated
get_provider_for_client() will correctly identify a MiniMax client, but guardian.wrap() then looks up self.providers['minimax'], which doesn't exist because _setup_providers() was never updated to register the new provider. This will raise a KeyError at runtime, making the provider unreachable through the public API. You'll need to add the registration there alongside the existing OpenAI/Anthropic entries.
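
To make the failure mode concrete, here is a stripped-down sketch of a provider registry. GuardianSketch and the string placeholders are hypothetical; the real Guardian class is not part of this PR, so only the lookup behaviour described above is modelled.

```python
class GuardianSketch:
    """Stand-in for Guardian that models only the provider registry."""

    def __init__(self, register_minimax: bool) -> None:
        self.providers = {}
        self._setup_providers(register_minimax)

    def _setup_providers(self, register_minimax: bool) -> None:
        # Existing entries (placeholders instead of real provider objects).
        self.providers["openai"] = "openai-provider"
        self.providers["anthropic"] = "anthropic-provider"
        # The missing step: without this line, detection succeeds but lookup fails.
        if register_minimax:
            self.providers["minimax"] = "minimax-provider"

    def wrap(self, detected_name: str) -> str:
        # A plain dict lookup raises KeyError for an unregistered provider name.
        return self.providers[detected_name]
```

With register_minimax=False, wrap("minimax") raises KeyError, which mirrors the runtime error described above; with True, the lookup succeeds.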

Fail-open on empty prompt (minimax_provider.py, line 283)
The `if prompt_text and prompt_text.strip():` guard means that if extract_prompt() returns None or an empty string (because of a malformed payload or an unusual message format), threat analysis is skipped entirely and the request passes through unblocked. This is fail-open, which goes against the core security contract of Guardian SDK (Principle 14). The existing providers don't have this escape hatch. Either raise on an empty prompt or treat it as a challenge, but never silently allow.
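
One possible fail-closed shape, shown as a sketch that takes the "raise on an empty prompt" option. EmptyPromptError, guarded_call, and the "ALLOW" verdict string are illustrative names, and the analyze/send callables stand in for Guardian's analysis and the upstream API call.

```python
from typing import Any, Callable, Optional


class EmptyPromptError(ValueError):
    """Raised when no analysable prompt text could be extracted."""


def guarded_call(
    prompt_text: Optional[str],
    analyze: Callable[[str], str],
    send: Callable[[], Any],
) -> Any:
    """Forward the request only after analysis has run and allowed it."""
    # Fail closed: a missing or blank prompt is an error, not a free pass.
    if prompt_text is None or not prompt_text.strip():
        raise EmptyPromptError("no prompt text extracted; refusing to forward request")
    verdict = analyze(prompt_text)
    if verdict != "ALLOW":
        raise PermissionError(f"request blocked by threat analysis: {verdict}")
    return send()
```

The key property is that there is no code path to send() that skips analyze().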

Two smaller things worth a follow-up (not blocking):

  • The base_url string match works here since it's nested inside the OpenAI module check, but a comment explaining the reasoning would help future maintainers.
  • The attribute-copying loop in ProtectedCompletions.__init__ diverges from the lazy __getattr__ delegation pattern used in the OpenAI and Anthropic providers; worth aligning for consistency.
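
For reference, lazy delegation generally looks like the sketch below. This is a generic illustration of the pattern, not the code from the OpenAI or Anthropic providers, and ProtectedCompletionsSketch is a hypothetical name.

```python
from typing import Any


class ProtectedCompletionsSketch:
    """Proxy that intercepts create() and forwards everything else lazily."""

    def __init__(self, wrapped: Any) -> None:
        self._wrapped = wrapped

    def create(self, **kwargs: Any) -> Any:
        # Threat analysis would run here before forwarding the call.
        return self._wrapped.create(**kwargs)

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails on the proxy.
        if name == "_wrapped":
            # Avoid infinite recursion if _wrapped has not been set yet.
            raise AttributeError(name)
        return getattr(self._wrapped, name)
```

Unlike copying attributes in the constructor, this resolves each attribute at access time, so attributes added to or changed on the wrapped object later are still visible through the proxy.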
Fix the two blocking issues and this is good to go. Thanks again for the contribution, brother; looking forward to adding MiniMax support to Guardian SDK!

P.S. Sorry it took a little while to get back to you!
