Add adversarial safety fixtures for orbit agent (brutally honest high orbit startup)

## Summary

Exercise prompt/tool/data poisoning and fail-closed behavior for the repo's most sensitive agent-facing path.

This issue was generated from an org-wide EvalOps mining pass on 2026-05-10 07:57 UTC. It combines live GitHub repo signals with a per-repo arXiv search. Treat the research links as grounding for a concrete implementation, not as a request for a literature review.

## Repo Evidence

- Repository description: A brutally honest "high‑orbit" startup advisor you can text or run from the CLI. Built with DSPy, it provides opinionated, YC-style advice and financial tools for founders.
- Tree signals: 0 docs files, 1 workflows, 0 proto files, 8 test-like files.
- `README.md:15` includes latent-spec language: - **🧠 Best-of-N + Rerank**: Generate multiple drafts and pick the best via a critic. - **🧪 Evals & Rubrics**: Personas, rubrics, overlap penalty, and CSV/MD summaries.
- `README.md:66` includes latent-spec language: - `models list [--provider openai|anthropic]`: List available model IDs. - `eval run --dataset <yaml> --out <jsonl>`: Run evals and save results. - `eval report <jsonl>`: Show overall summary.
- `README.md:67` includes latent-spec language: - `eval run --dataset <yaml> --out <jsonl>`: Run evals and save results. - `eval report <jsonl>`: Show overall summary. - `eval grade --dataset <yaml> --results-path <jsonl> --out <jsonl>`: Rubric grading.
- `README.md:68` includes latent-spec language: - `eval report <jsonl>`: Show overall summary. - `eval grade --dataset <yaml> --results-path <jsonl> --out <jsonl>`: Rubric grading. - `eval summary --input-path <jsonl> [--csv-out <csv>] [--md-out <md>]`: Export summaries.
- `README.md:69` includes latent-spec language: - `eval grade --dataset <yaml> --results-path <jsonl> --out <jsonl>`: Rubric grading. - `eval summary --input-path <jsonl> [--csv-out <csv>] [--md-out <md>]`: Export summaries.
- `README.md:140` includes latent-spec language: ## Evals & Self‑Grading

## Research Grounding

Repo axes: infra, governance, security, evaluation

Search keywords: jsonl, cli, run, evals, eval, str, orbit_agent, export, list, yaml, orbit, personas

- [arXiv:2604.04749v1](https://arxiv.org/abs/2604.04749v1) AI Trust OS -- A Continuous Governance Framework for Autonomous AI Observability and Zero-Trust Compliance in Enterprise Environments (Eranga Bandara, Asanga Gunaratna, Ross Gore, Abdul Rahman, Ravi Mukkamala, Sachin Shetty), 2026.
- [arXiv:2604.26152v1](https://arxiv.org/abs/2604.26152v1) AI Observability for Large Language Model Systems: A Multi-Layer Analysis of Monitoring Approaches from Confidence Calibration to Infrastructure Tracing (Twinkll Sisodia), 2026.
- [arXiv:2604.17092v1](https://arxiv.org/abs/2604.17092v1) AI Observability for Developer Productivity Tools: Bridging Cost Awareness and Code Quality (Happy Bhati, Twinkll Sisodia), 2026.
- [arXiv:2604.03262v1](https://arxiv.org/abs/2604.03262v1) AI Governance Control Stack for Operational Stability: Achieving Hardened Governance in AI Systems (Horatio Morgan), 2026.
- [arXiv:2502.15859v4](https://arxiv.org/abs/2502.15859v4) AI Governance InternationaL Evaluation Index (AGILE Index) 2024 (Yi Zeng, Enmeng Lu, Xin Guan, Cunqing Huangfu, Zizhe Ruan, Ammar Younas), 2025.
- [arXiv:2503.15577v1](https://arxiv.org/abs/2503.15577v1) Navigating MLOps: Insights into Maturity, Lifecycle, Tools, and Careers (Jasper Stone, Raj Patel, Farbod Ghiasi, Sudip Mittal, Shahram Rahimi), 2025.
- [arXiv:2407.01557v1](https://arxiv.org/abs/2407.01557v1) AI Governance and Accountability: An Analysis of Anthropic's Claude (Aman Priyanshu, Yash Maurya, Zuofei Hong), 2024.
- [arXiv:2510.21203v1](https://arxiv.org/abs/2510.21203v1) The Nuclear Analogy in AI Governance Research (Sophia Hatz), 2025.
- [arXiv:2601.20415v1](https://arxiv.org/abs/2601.20415v1) An Empirical Evaluation of Modern MLOps Frameworks (Jon Marcos-Mercadé, Unai Lopez-Novoa, Mikel Egaña Aranguren), 2026.
- [arXiv:2604.24801v2](https://arxiv.org/abs/2604.24801v2) Architectural Observability Collapse in Transformers (Thomas Carmichael), 2026.

## What To Build

- Add adversarial fixtures for deployment drift, credentials, and privileged workflow inputs.
- Document the intended fail-closed behavior and any allowed degraded-mode fallback.
- Add regression coverage that proves unsafe inputs do not silently reach the privileged path.

## Acceptance Criteria

- [ ] A short design note names the repo-specific workflow, threat or correctness model, and the research assumptions being adopted.
- [ ] A runnable check, fixture, or verifier exercises the new contract in CI or an equivalent local command documented in the repo.
- [ ] The implementation emits or stores enough evidence for a downstream agent/operator to cite inputs, decisions, and outputs.
- [ ] At least one negative/degraded-mode case is covered so failures are observable rather than silently accepted.
- [ ] Documentation links the new behavior to the relevant EvalOps platform primitive or explicitly records why this repo remains standalone.

## Notes

- Generated issue 3/5 for `evalops/orbit-agent` by `evalops_org_miner.py`.
- Before implementation, confirm the sampled latent-spec snippets still match `main`; this issue intentionally cites exact file paths/lines where the mining pass saw them.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add adversarial safety fixtures for orbit agent (brutally honest high orbit startup) #29

Summary

Repo Evidence

Research Grounding

What To Build

Acceptance Criteria

Notes

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Add adversarial safety fixtures for orbit agent (brutally honest high orbit startup) #29

Description

Summary

Repo Evidence

Research Grounding

What To Build

Acceptance Criteria

Notes

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions