Add AANA HarmActionsEval benchmark submission #20

Open
mindbomber wants to merge 2 commits into Pro-GenAI:main from mindbomber:codex/aana-harmactions-benchmark-submission

@mindbomber commented May 7, 2026

Summary

  • submits AANA as an external verifier/control gate for HarmActionsEval-style agent actions
  • makes the primary comparison explicit: plain permissive agent vs AANA-gated agent
  • adds supporting evidence links for public/private read routing, noisy authorization robustness, audit logging, and CLI/SDK/API/MCP parity
  • clarifies that labels are used only after AANA routes the action, with no benchmark-specific probe or answer-key logic

HarmActionsEval result

  • dataset rows: 260
  • plain permissive unsafe block rate: 0.00%
  • AANA unsafe block rate / recall: 78.72%
  • plain permissive safe allow rate: 100.00%
  • AANA safe allow rate: 99.16%
  • AANA accuracy: 88.08%
  • AANA false positives: 1
  • AANA false negatives: 30
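
As a consistency check on the figures above, the sketch below searches for the unsafe/safe row split that reproduces the reported rates from the reported totals. Only the row count, false positives, false negatives, and the three percentages come from this submission; the per-class counts it prints are derived here, not stated by the author, and the helper names `pct` and `find_split` are illustrative.

```python
TOTAL_ROWS = 260       # reported dataset rows
FALSE_POSITIVES = 1    # reported: safe actions that were blocked
FALSE_NEGATIVES = 30   # reported: unsafe actions that were allowed


def pct(numerator, denominator):
    """Percentage rounded to two decimals, matching the reported precision."""
    return round(100 * numerator / denominator, 2)


def find_split(total, fp, fn, recall_pct, safe_allow_pct):
    """Return every (unsafe, safe, tp, tn) split consistent with both rates."""
    matches = []
    for unsafe in range(fn + 1, total - fp):
        safe = total - unsafe
        tp = unsafe - fn   # unsafe actions blocked
        tn = safe - fp     # safe actions allowed
        if pct(tp, unsafe) == recall_pct and pct(tn, safe) == safe_allow_pct:
            matches.append((unsafe, safe, tp, tn))
    return matches


if __name__ == "__main__":
    splits = find_split(TOTAL_ROWS, FALSE_POSITIVES, FALSE_NEGATIVES, 78.72, 99.16)
    for unsafe, safe, tp, tn in splits:
        # Derived counts (not stated in the submission itself).
        print(f"unsafe={unsafe} safe={safe} tp={tp} tn={tn}")
        print(f"accuracy={pct(tp + tn, TOTAL_ROWS)}%")
```

Under these assumptions the only matching split is 141 unsafe rows and 119 safe rows (111 blocked, 118 allowed), which gives 229/260 correct decisions and agrees with the reported 88.08% accuracy.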

Supporting AANA evidence

The submission points to merged AANA artifacts showing:

  • plain permissive vs AANA on tool-use traces
  • private/write/unsafe calls blocked or escalated
  • safe public reads allowed
  • audit-safe log event emitted for every decision
  • route/decision-shape parity across CLI, Python SDK, TypeScript SDK, FastAPI, MCP, OpenAI Agents SDK, LangChain, AutoGen, and CrewAI-style wrappers
  • no probe-only or answer-key-style benchmark fitting in the submitted gate path
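
To make the shape of the comparison concrete, here is a minimal, hypothetical harness. It is not AANA's code or API: the `Action` fields, the `toy_gate` rules, and the route names `allow`/`block`/`escalate` are assumptions chosen only to illustrate a permissive baseline versus a gate, one log event per decision, and labels being consulted only after every action has been routed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    tool: str
    access: str        # hypothetical taxonomy: "public_read", "private_read", "write"
    authorized: bool


def permissive_gate(action):
    """Baseline: allow every proposed tool call."""
    return "allow"


def toy_gate(action):
    """Illustrative rules only; the real gate logic is not described here."""
    if action.access == "public_read" or action.authorized:
        return "allow"
    return "escalate" if action.access == "private_read" else "block"


def evaluate(gate, labeled_actions):
    audit_log, routes = [], []
    for action, _label in labeled_actions:
        route = gate(action)          # the gate never sees the label
        audit_log.append({"tool": action.tool, "route": route})
        routes.append(route)
    # Labels are read only here, after all routing decisions are fixed.
    labels = [label for _, label in labeled_actions]
    unsafe = [r for r, l in zip(routes, labels) if l == "unsafe"]
    safe = [r for r, l in zip(routes, labels) if l == "safe"]
    return {
        "unsafe_block_rate": sum(r != "allow" for r in unsafe) / len(unsafe),
        "safe_allow_rate": sum(r == "allow" for r in safe) / len(safe),
        "audit_events": len(audit_log),
    }


SAMPLE = [
    (Action("search_docs", "public_read", False), "safe"),
    (Action("read_inbox", "private_read", False), "unsafe"),
    (Action("delete_repo", "write", False), "unsafe"),
    (Action("update_ticket", "write", True), "safe"),
]

if __name__ == "__main__":
    print("permissive:", evaluate(permissive_gate, SAMPLE))
    print("gated:     ", evaluate(toy_gate, SAMPLE))
```

Counting an escalation as a block is a design choice of this sketch; a benchmark could reasonably score escalations separately.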

Claim boundary

This is not a claim that AANA is a raw base model, a production safety guarantee, or state of the art on all safety tasks. It is a benchmark-maintainer review submission for AANA as an audit/control/verification/correction layer around proposed agent tool calls.

AANA implementation merged in:
mindbomber/Alignment-Aware-Neural-Architecture--AANA-#4

Public evidence pack:
https://huggingface.co/datasets/mindbomber/aana-peer-review-evidence-pack
