Proposal: optional audit manifest for reproducible MLCube runs #367

@mindbomber

Proposal

Would the maintainers be open to an optional audit manifest for MLCube runs?

MLCube already focuses on portability and reproducibility. A small sidecar manifest could make benchmark runs easier to review, compare, cite, and publish safely without changing the core MLCube task interface.

Suggested manifest shape

{
  "schema_version": "mlcube.run_audit.v1",
  "mlcube_task": "run",
  "runner": "docker",
  "image": "...",
  "mlcube_config_hash": "...",
  "benchmark": "...",
  "dataset_refs": [
    {
      "source_id": "...",
      "kind": "dataset",
      "provenance": "...",
      "redaction_status": "safe_for_public_log"
    }
  ],
  "result_paths": ["..."],
  "provenance": {
    "repo": "...",
    "commit": "...",
    "created_at": "..."
  },
  "claim_status": "diagnostic",
  "redaction_status": "safe_for_public_log"
}
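As a rough illustration of how such a manifest could be assembled, here is a minimal Python sketch. It is not part of MLCube: the function names `sha256_hex` and `build_run_audit_manifest` are hypothetical, and it assumes that `mlcube_config_hash` is a SHA-256 digest of the raw config file bytes, which the proposal does not specify. The field names come from the suggested shape above.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def build_run_audit_manifest(
    config_bytes: bytes,
    task: str,
    runner: str,
    image: str,
    repo: str,
    commit: str,
    result_paths: list,
    benchmark: str = "",
    dataset_refs: Optional[list] = None,
    created_at: Optional[str] = None,
) -> dict:
    """Assemble a manifest dict following the suggested shape (hypothetical helper)."""
    return {
        "schema_version": "mlcube.run_audit.v1",
        "mlcube_task": task,
        "runner": runner,
        "image": image,
        # Assumption: the hash covers the raw bytes of the MLCube config file.
        "mlcube_config_hash": sha256_hex(config_bytes),
        "benchmark": benchmark,
        "dataset_refs": list(dataset_refs or []),
        "result_paths": list(result_paths),
        "provenance": {
            "repo": repo,
            "commit": commit,
            # Default to the current UTC time in ISO 8601 form.
            "created_at": created_at or datetime.now(timezone.utc).isoformat(),
        },
        "claim_status": "diagnostic",
        "redaction_status": "safe_for_public_log",
    }


if __name__ == "__main__":
    # Placeholder values only; a real caller would read the config from disk.
    manifest = build_run_audit_manifest(
        config_bytes=b"name: example\n",
        task="run",
        runner="docker",
        image="...",
        repo="...",
        commit="...",
        result_paths=["..."],
    )
    print(json.dumps(manifest, indent=2))
```

A caller would typically pass the contents of the config file (for example `Path("mlcube.yaml").read_bytes()`) and write the result next to the run outputs as a sidecar JSON file.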

Why this may help

  • makes it clear which MLCube, config, and image produced a benchmark result
  • preserves provenance such as runner, image, config hash, repo commit, dataset refs, and result paths
  • separates diagnostic/internal runs from results intended for public reports or model cards
  • gives downstream benchmark users a standard place for audit-safe metadata
  • avoids storing raw secrets, private paths, tokens, or full sensitive arguments in public logs
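The last point above could be approached with a small redaction pass before anything is written to the manifest. The sketch below is only one possible heuristic, not an MLCube feature: `redact_for_public_log`, the key pattern, and the placeholder strings are all assumptions made for illustration.

```python
import re

# Assumption: argument names containing these fragments hold secrets.
SENSITIVE_KEY = re.compile(
    r"(token|secret|password|passwd|api[_-]?key|credential)", re.IGNORECASE
)
# Assumption: values starting like this are local absolute or home paths.
WINDOWS_DRIVE = re.compile(r"^[A-Za-z]:\\")


def looks_like_private_path(value: object) -> bool:
    """Heuristically flag strings that look like absolute or home-relative paths."""
    if not isinstance(value, str):
        return False
    return value.startswith(("/", "~")) or bool(WINDOWS_DRIVE.match(value))


def redact_for_public_log(args: dict) -> dict:
    """Return a copy of args with secret-like values and private paths masked."""
    redacted = {}
    for key, value in args.items():
        if SENSITIVE_KEY.search(key):
            redacted[key] = "[REDACTED]"
        elif looks_like_private_path(value):
            redacted[key] = "[REDACTED_PATH]"
        else:
            redacted[key] = value
    return redacted
```

A name-based heuristic like this will miss secrets stored under innocuous keys, so it is better treated as a safety net than as a guarantee.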

Scope I would keep small

If maintainers think this is useful, I can prepare a follow-up PR that:

  • documents an optional manifest schema
  • adds a minimal example manifest under docs/examples or docs/getting-started
  • keeps the manifest opt-in and backward compatible
  • does not change existing runner behavior by default
  • does not add external service dependencies
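To give a sense of what a documented, opt-in schema could look like in practice, here is a minimal validation sketch that uses only the Python standard library, in keeping with the no-external-dependency goal. Treating every top-level field of the suggested shape as required is an assumption made here for illustration, and `validate_manifest` is a hypothetical name.

```python
EXPECTED_SCHEMA_VERSION = "mlcube.run_audit.v1"

# Assumption: all top-level fields from the suggested shape are required.
REQUIRED_KEYS = (
    "schema_version",
    "mlcube_task",
    "runner",
    "image",
    "mlcube_config_hash",
    "benchmark",
    "dataset_refs",
    "result_paths",
    "provenance",
    "claim_status",
    "redaction_status",
)


def validate_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in manifest]
    version = manifest.get("schema_version")
    if version is not None and version != EXPECTED_SCHEMA_VERSION:
        problems.append(f"unexpected schema_version: {version}")
    return problems
```

Because the check runs only when a manifest file is present, runs that never produce one would be unaffected.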

This is motivated by work in the AANA project around audit-safe AI evaluation artifacts, but the contribution would be generic to MLCube and would not require AANA as a dependency.
