
Land WP1–WP5: pipeline cleanup, ontology discipline, three-remote architecture #10

Merged
mzargham merged 51 commits into main from staging
May 29, 2026

Conversation

@mzargham mzargham commented May 29, 2026

Summary

Merges the full roadmap (51 commits across five work packages plus CI/notebook fixes) from staging to main. The demo moves from "mock-up that narrates a three-remote story" to "running code that exercises four services with a fully audited multi-ASoT provenance graph."

  • WP1 — Pipeline cleanup + rerun surfacing (closes #3, "Surface 'which pipeline stages must re-run' from SHACL closure + hash-mismatch results"). PipelineState dataclass + per-stage typed records; query_named_graph helper; ExecutionMetadata.executor_uri/location_uri consolidation; validation→verification rename with back-compat alias; interrogate.rerun Typer CLI + Stage 6.5 banner extension; Typer migration of pipeline.runner + interrogate.{explain,reproduce}; openCAESAR prose cleanup.
  • WP2 — Ontology discipline (closes #2, "Promote ROBOT/ELK validation to default ontology build path"). ROBOT/ELK promoted to canonical make ontology (fail-fast on missing Java); .github/workflows/ontology.yml with setup-java@v4 + cached robot.jar; pytest live / network markers; triple-count budget gate; openCAESAR code/data cleanup + rtm.ttl regen.
  • WP3 — Docker image as tracked evidence (closes #4, "Track Docker images as first-class evidence content"). rtm:DockerImage class + property set; hash_docker_image(); DockerCompute._emit_image_node(); prov:wasDerivedFrom on Docker evidence; evidence_by_image SPARQL helper; DockerEvidenceShape closure rule.
  • WP4 — Three-remote architecture made real (16 commits). Preflight probes on storage + compute backends (fail-fast on unreachable remotes); rtm:gitRef + rtm:flexoRecord cross-linking image to git + Flexo; rtm:DockerContainer as a first-class materialization entity (image vs container vs host distinction); organizational auspices via prov:Organization + rtm:operatedBy; EARL-wrapped automated outcomes (rtm:ClosureRuleAssertion, rtm:DigestMatchAssertion); compute.reproduce Typer CLI (rebuild image at recorded git ref + digest-compare); TxnLogBackend (CouchDB) as a fourth service with its own URI + auspices; TransactionLogger with secret-redaction allowlist; six typed trust queries in traceability/queries.py; optional FLEXO_BRANCH_PREFIX for multi-run isolation; tools/start-services.sh + docker-compose.yml; new ARCHITECTURE.md.
  • WP5 — Storyboard integration + end-of-roadmap alignment. Audit module surfaces Docker image provenance in the report; notebook Acts 9–10 expanded with multi-ASoT framing + numerical-evidence end-to-end provenance render; RECONCILIATION.md mapping every slide claim to its code receipt.
  • Post-WP CI + notebook fixes. Reproducible build_time (SOURCE_DATE_EPOCH + git ct); actions/checkout with fetch-depth: 0; diff-gate scoped to rtm.ttl (manifest legitimately diverges by build path); CLI --help substring tests hardened against terminal-width wrapping; WP* mentions stripped from notebook prose.

Closed by this PR

Remain open (intentional)

New surfaces a reviewer can read cold

  • ARCHITECTURE.md — three-remote + fourth-service picture, URI scheme, EARL outcomes, six trust queries, preflight gate, reproducibility loop
  • RECONCILIATION.md — slide claims ↔ code receipts for the May-22 OpenMBEE deck
  • .env.example — every FLEXO/ADCS_TXNLOG/ADCS_*_ORG env var with documented defaults
  • tools/start-services.sh — one-liner CouchDB txnlog store provisioning
  • Notebook (renders to Pages on merge) — "Many Authoritative Sources of Truth" cell + "Numerical evidence — full provenance end-to-end" cell

Test plan

🤖 Generated with Claude Code

mzargham and others added 30 commits May 27, 2026 11:18
…yped result records

Split run_pipeline's 285-line body into per-stage free functions
threaded by a PipelineState object. Each stage returns a typed
result record (StructuralResult, SymbolicResult, NumericalResult,
EvidenceBindingResult, AttestationStageResult, ClosureRuleResult,
AuditStageResult, ReportStageResult) assigned to the matching state
field. Downstream stages read prior results via state.<prior>.<field>
instead of free locals.
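
The state-threading pattern described above can be sketched roughly as
follows. This is a minimal illustration, not the repository's code: the
record fields (`conforms`, `proof_hashes`) and the stage function names
are assumptions, and only two of the eight result records are shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuralResult:
    # Hypothetical field; the real record's fields are not listed in the source.
    conforms: bool


@dataclass
class SymbolicResult:
    # Hypothetical field for illustration only.
    proof_hashes: dict[str, str]


@dataclass
class PipelineState:
    # One slot per stage; None means the stage has not run yet.
    structural: Optional[StructuralResult] = None
    symbolic: Optional[SymbolicResult] = None


def run_structural(state: PipelineState) -> StructuralResult:
    # Stand-in body: a real stage would run the structural checks here.
    return StructuralResult(conforms=True)


def run_symbolic(state: PipelineState) -> SymbolicResult:
    # A downstream stage reads the prior result via state.<prior>.<field>
    # rather than through a free local variable.
    if state.structural is None or not state.structural.conforms:
        raise RuntimeError("structural stage must pass before the symbolic stage")
    return SymbolicResult(proof_hashes={"REQ-001": "sha256:abc"})


def run_pipeline() -> PipelineState:
    state = PipelineState()
    state.structural = run_structural(state)
    state.symbolic = run_symbolic(state)
    return state
```

Each stage function stays independently testable because its only input
is the state object.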

The activity_to_stage table maps p-plan step IRI fragments
(STEP_NAMES) to pipeline stage numbers. Kept in sync with
traceability.plan_execution; covered by a new unit test.

Vestigial `stage = LifecycleStage.X` assignments dropped (the
variable was set but never read). LifecycleStage and check_gate are
preserved for external callers and tests.

CLI surface unchanged. Tests: 23 pipeline + named_graphs passing,
83 attestation + audit + traceability + shape_suite + compute +
backends passing.

Part of WP1 (roadmap §4.1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…coped queries

`pipeline.dataset.query_named_graph(ds, layer, sparql, **bindings)`
scopes a SPARQL query to one named graph. Use it when the query is
intentionally layer-specific; keep using `ds.query(...)` when the
query is meant to walk the union view (via Dataset(default_union=True)).

The existing queries in traceability/audit.py, traceability/queries.py,
and traceability/attestation.py are intentionally union-scoped. Added
a section banner in audit.py documenting this and extended the
query_to_dicts docstring in queries.py with the convention so future
contributors don't reach for graph_for() when they really want the
union.

Two new tests cover the helper: layer-scoped count is a strict subset
of the union count, and unknown layers raise KeyError.

The helper is added as a primitive for WP3/WP4 (Docker image + Flexo
remote queries that legitimately want one-layer scope) without
forcing any current call site to migrate.

Part of WP1 (roadmap §4.2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…methods

The executor and location URI shapes used to live as inline string
construction in evidence.binding._bind_execution_metadata; promote
them to methods on ExecutionMetadata so WP3 (rtm:DockerImage as
evidence) and WP4 (three-remote architecture) can reuse the same
shapes without copy-paste.

IRI shapes preserved byte-for-byte:
  executor_uri()  -> urn:adcs:executor:<container_id|hostname|unknown>
                     (colons in suffix replaced with dashes)
  location_uri()  -> urn:adcs:location:<location_kind>:<hostname|unknown>

evidence.binding._bind_execution_metadata now consumes the methods
directly. A new TestExecutionMetadataURIs class covers prefer-
container-id, fall-back-to-hostname, unknown sentinel, colon
replacement, and the location shape.
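
A minimal sketch of the two methods, assuming the field names
`container_id`, `hostname`, and `location_kind` (only the IRI shapes
above come from this commit; the real methods may return an rdflib
URIRef rather than a plain string):

```python
from dataclasses import dataclass


@dataclass
class ExecutionMetadata:
    # Field names and defaults are assumptions made for this sketch.
    container_id: str = ""
    hostname: str = ""
    location_kind: str = "local"

    def executor_uri(self) -> str:
        # Prefer the container id, fall back to the hostname, then to the
        # "unknown" sentinel; colons in the suffix become dashes.
        suffix = self.container_id or self.hostname or "unknown"
        return "urn:adcs:executor:" + suffix.replace(":", "-")

    def location_uri(self) -> str:
        host = self.hostname or "unknown"
        return f"urn:adcs:location:{self.location_kind}:{host}"
```

For example, `ExecutionMetadata(container_id="sha256:abc").executor_uri()`
yields `urn:adcs:executor:sha256-abc` under these assumptions.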

Part of WP1 (roadmap §4.3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Strict semantic split: verification = automated, fully-specified
check (SHACL conformance, ROBOT/ELK, hash matching, completeness).
validation = human judgement (attestation, adequacy, sufficiency).

Module + test moves (git mv preserves history):
  traceability/validation.py  -> traceability/verification.py
  tests/test_robot_validation.py -> tests/test_robot_verification.py

Symbol renames inside the renamed module:
  validate()              -> verify()
  validate_shacl()        -> verify_shacl()
  validate_reverification() -> verify_reverification()
  ValidationReport        -> VerificationReport

Symbol renames in traceability/rtm.py:
  validate_structural_completeness -> verify_structural_completeness
  validate_evidence_completeness   -> verify_evidence_completeness

Back-compat aliases retained inside the renamed module — to be
removed in a follow-up PR after WP3 lands.
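
One plausible shape for those aliases is plain name rebinding at the
bottom of the renamed module. This is a sketch only: the function body
and the `conforms` field are stand-ins, and the real module may
implement the aliases differently.

```python
from dataclasses import dataclass


@dataclass
class VerificationReport:
    # Hypothetical field; the real report carries more detail.
    conforms: bool


def verify() -> VerificationReport:
    # Stand-in body: the real function runs the automated checks.
    return VerificationReport(conforms=True)


# Back-compat aliases: the old names resolve to the same objects, so
# existing imports keep working until the aliases are removed.
validate = verify
ValidationReport = VerificationReport
```

Rebinding keeps `isinstance` checks against the old class name valid,
which a wrapper class would not.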

Runner / banner string updates: "Validating closure-rule suite..."
-> "Verifying...", "Structural validation: PASS" -> "Structural
verification: PASS", Stage 0 banner "Validation: ..." ->
"Verification: ...".

Plan.ttl rdfs:label updates: "Stage 6.5 — Validate Closure-Rule
Suite" -> "Verify..."; "Validation Report" -> "Verification Report".
The step IRI fragment <plan/step/ValidateShapes> is PRESERVED to
keep already-persisted <adcs:plan-execution> + <adcs:audit> graphs
valid; IRI rename tracked separately for a future Flexo migration
(WP1 §10 Known follow-ups).

Notebook function-call references (Acts 4 + Stage 6.5 narration)
updated to new symbols; narrative prose unchanged (WP5 owns that).

scripts/build_ontology.py is INTENTIONALLY UNTOUCHED — WP2 renames
_validate_sysml_axioms there in the same commit that lands the
openCAESAR cleanup, to avoid a merge conflict on that file.

Where validation legitimately stays: traceability/attestation.py,
request_attestation(), upstream pyshacl.validate, OSLC oslc_qm: IRI
fragments, vendored ontology imports.

Test counts: 171 passed, 4 skipped (baseline 162 + 9 new from prior
WP1 commits). Live-Flexo failures predate WP1 and are out of scope.

Part of WP1 (roadmap §4.4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Typer is the demo's CLI framework convention (per WP1 §4.6). The next
commit migrates pipeline.runner + interrogate.{explain,reproduce} to
Typer apps; the rerun.py CLI added later in this PR is Typer-based
from the start.

Pinned to >=0.12,<1.0 (current resolved: 0.26.2). Brings in Click +
Rich + markdown-it-py + shellingham as transitive deps; all stable
and well-established.

Part of WP1 (roadmap §4.6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces argparse.ArgumentParser with a Typer app. Every flag name is
preserved (--auto, --no-attest, --engineer, --rebuild, --backend,
--compute) so existing invocations work unchanged. Choice-validated
options (--backend, --compute) use Enum subclasses so Typer matches
the prior argparse `choices=` semantics.

The `main()` callable is retained as a thin wrapper around `app()` so
the `[project.scripts] adcs-pipeline = "pipeline.runner:main"` entry
point keeps resolving.

interrogate/explain.py, interrogate/reproduce.py, interrogate/visualize.py
are library-only modules with no CLI entry points; nothing to migrate
there. WP1 §4.6 specified them speculatively; the actual scope is
just pipeline.runner. The deferred top-level `adcs` aggregator
(issue #5) can revisit when WP4 adds Flexo materialization commands.

New tests/test_cli.py uses typer.testing.CliRunner for smoke tests:
- pipeline.runner --help renders + lists every flag
- --backend / --compute reject values outside the enum
- main symbol stays importable for the console script

Part of WP1 (roadmap §4.6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…closes #3)

`interrogate.rerun` walks prov:wasGeneratedBy -> p-plan:correspondsToStep
to translate a VerificationReport into the dedup'd ordered set of
pipeline stages that must re-run to restore RTM closure. SHACL
violations on structural / human-judgement nodes (attestations, etc.)
that have no producing activity are reported separately — no stage
rerun can fix them.

Schema enrichment (evidence/binding.py): every per-evidence
SymbolicAnalysis / NumericalSimulation activity now carries
p-plan:correspondsToStep linking it to the SymbolicAnalysis /
NumericalSimulation step in plan.ttl. This makes the evidence ->
step traversal self-describing rather than relying on activity-IRI
naming conventions, and aligns the per-evidence activities with the
existing stage-level activities emitted by emit_stage_activity.

Stage 6.5 banner extension (pipeline/runner.py): when the verification
report does not conform, render the rerun plan inline so the engineer
sees which stages must re-run without having to invoke the CLI
separately.

CLI (Typer-based, WP1 §4.6 discipline):
  uv run python -m interrogate.rerun                     # default md output
  uv run python -m interrogate.rerun --requirement REQ-003
  uv run python -m interrogate.rerun --format json
Exit codes: 0 = clean, 1 = stages or structural violations present,
2 = input file not found.
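
The traversal and the 0/1 exit codes can be sketched over plain
dictionaries instead of an RDF graph. The names `plan_rerun`,
`RerunPlan`, and `STEP_TO_STAGE` are hypothetical, and the stage numbers
are inferred from the acceptance criteria (proof -> Stage 2, simulation
-> Stage 3):

```python
from dataclasses import dataclass, field

# Assumed step -> stage table, inferred from the acceptance criteria.
STEP_TO_STAGE = {"SymbolicAnalysis": 2, "NumericalSimulation": 3}


@dataclass
class RerunPlan:
    stages: list[int] = field(default_factory=list)
    structural_violations: list[str] = field(default_factory=list)


def plan_rerun(violating_nodes, generated_by, corresponds_to_step) -> RerunPlan:
    """Map violating nodes to the ordered, de-duplicated stages to re-run.

    generated_by:        node -> activity  (stands in for prov:wasGeneratedBy)
    corresponds_to_step: activity -> step  (stands in for p-plan:correspondsToStep)
    """
    plan = RerunPlan()
    stages = set()
    for node in violating_nodes:
        step = corresponds_to_step.get(generated_by.get(node))
        if step in STEP_TO_STAGE:
            stages.add(STEP_TO_STAGE[step])
        else:
            # No producing activity: no stage rerun can fix this node.
            plan.structural_violations.append(node)
    plan.stages = sorted(stages)
    return plan


def exit_code(plan: RerunPlan) -> int:
    # 0 = clean, 1 = stages or structural violations present.
    return 1 if (plan.stages or plan.structural_violations) else 0
```

Exit code 2 (input file not found) belongs to the CLI layer and is left
out of this sketch.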

Tests cover all 7 of issue #3's acceptance criteria:
  AC1: closed RTM -> empty stage set
  AC2: proof hash mismatch -> Stage 2
  AC3: simulation violation -> Stage 3
  AC4: multiple invalidations -> ordered union [2, 3]
  AC5: attestation-level violation -> structural_violations, no stages
  AC6: CLI smoke tests in tests/test_cli.py
  AC7: Stage 6.5 banner extension verified by integration

Test counts: 187 passed, 4 skipped (was 171 after commit 4; +16 across
WP1 commits 5-7 covering Typer + rerun).

Part of WP1 (roadmap §4.5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per roadmap cross-cutting section "drop explicit openCAESAR
references", remove prose mentions from the WP1-owned files:
  README.md         — Architecture blurb + namespace table row
  CLAUDE.md         — namespace table row
  notebook.py       — 3 narration cells (Act 1 namespace table,
                      epilogue prologue summary, Act 11 stack table)
  ontology/rtm-edit.ttl — header comment + ontology description +
                      SysMLv2 binding section comment
  ontology/prefixes.py — module docstring + SysMLv2 section comment +
                      OMG_SYSML inline comment
  scripts/fetch_imports.py — module docstring

The OMG IRI itself (http://www.omg.org/spec/SysML/20240501/) stays —
it is the OMG official SysMLv2 OWL rendering, correct on its own
terms. The `omg-sysml:` prefix and the OMG_SYSML constant keep their
names and values. Only the attribution text changes.

Built ontology regenerated (`uv run python -m scripts.build_ontology`)
because rtm-edit.ttl comments changed: ontology/rtm.ttl + manifest
get fresh edit_source_hash. Triple count unchanged (156 in / 156 out).

WP2 owns the code-side cleanup (CSV column `opencaesar_iri` ->
`omg_iri`, constant `SYSML_OPENCAESAR_NS` -> `SYSML_OMG_NS`, lookup
updates in scripts/build_ontology.py + tests/test_ontology_build.py)
and will regenerate rtm.ttl again as part of its commit; that
regeneration will produce identical content because the renames
don't alter the equivalence-axiom IRIs the script emits.

Verification: full-repo grep limited to WP1 prose set returns zero;
remaining hits (build_ontology.py constant + lookups, CSV header,
rtm.ttl built artifact) are explicitly WP2 territory.

Tests: 187 passed, 4 skipped.

Part of WP1 (roadmap §4.7).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…scipline

README.md:
- New "Pipeline architecture" subsection introducing PipelineState +
  per-stage typed result records + Typer CLI convention.
- New "Rerun plan from a verification report" subsection under
  Interrogation showing the interrogate.rerun CLI (issue #3) with
  --requirement and --format examples + exit-code contract.
- Stage banner: "Validate Closure-Rule Suite" -> "Verify Closure-Rule
  Suite"; Stage 0 banner sample "Validation:" -> "Verification:"
  matches what runner now prints.
- Top-line paragraph: "validated by a SHACL closure-rule suite" ->
  "verified by a SHACL closure-rule suite".
- Key Directories: traceability/ updated; pipeline/ mentions
  PipelineState + query_named_graph; interrogate/ adds rerun.
- Ontology Authoring section + Toolchain table are NOT touched here —
  WP2 owns those (ROBOT-as-default rewrite). Single coordination point
  per the cross-WP plan.

CLAUDE.md:
- New "Pipeline state + structured stage results" subsection (canonical
  description of the PipelineState pattern).
- New "CLI surface" section: every CLI is Typer; flag names preserved;
  CliRunner-based tests; deferred top-level `adcs` aggregator linked
  to issue #5.
- New "Verification vs validation (term discipline)" section: defines
  the split, names pyshacl as the explicit upstream-API exception,
  notes the preserved ValidateShapes IRI fragment.
- Toolchain: pyshacl rephrased to mention the verify wrapper; typer
  added as a runtime dep.
- Key directories: traceability/ + pipeline/ + interrogate/ updated.

Tests: 187 passed, 4 skipped.

Part of WP1 (roadmap §5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WP1 of the roadmap at /Users/z/.claude/plans/i-want-to-continue-atomic-lobster.md.
Internal cleanups (PipelineState refactor, query_named_graph helper,
ExecutionMetadata URI methods), validation -> verification rename
discipline, Typer migration of pipeline.runner, new interrogate.rerun
CLI mapping closure violations to pipeline stages (closes #3), and
the WP1 share of the openCAESAR prose cleanup.

9 commits, 30 files, +1437 / -266 lines. Test suite 162 -> 187
passing (no new failures). Output triple count 948 -> 955 (+7 new
p-plan:correspondsToStep schema enrichment on per-evidence
activities).

Staged for integration with WP2 (ROBOT default + pytest markers +
triple budget + openCAESAR code/data cleanup) before promotion to
main.

Follow-up issues filed:
- #5  deferred top-level `adcs` Typer aggregator (WP4-dependent)
- #6  ValidateShapes step IRI fragment rename (WP4-dependent)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `[tool.pytest.ini_options]` markers `live` and `network` plus
default `addopts = "-m 'not live and not network'"` so the canonical
`uv run pytest` invocation filters infrastructure-dependent tests
without requiring per-test env-var introspection. CI opts in
explicitly with `-m live` (or `-m network` once any are written).

tests/test_flexo_live.py rewritten:
- pytestmark switches from env-var-driven `skipif` to `@pytest.mark.live`.
- `_flexo_reachable()` removed — connectivity probing belongs in
  fixtures, not at module import. When `-m live` is requested but
  credentials are missing, the `token` fixture now fails LOUDLY
  (pytest.fail) instead of skipping. Skip-on-opt-in would hide infra
  breakage; the marker is the opt-in signal.
- Docstring updated to show the new invocation pattern.

Tests: 187 passed, 4 skipped, 3 deselected (live tests filtered out
by default). Previously: 162 passed, 2 failed, 1 errored on live —
those failures were infrastructure noise predating WP1, now correctly
gated behind the marker.

Part of WP2 (subplan §4.B).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rtm: is an integration ontology — it should contribute only
convenience handles, hashing properties, and SHACL targets, never
new epistemic vocabulary. The gate keeps that promise honest:
`scripts/build_ontology.py` now fails the build if the assembled
artifact exceeds TRIPLE_BUDGET. Current size 156 + 200 headroom for
WP3's rtm:DockerImage + property set and other small adds.

Budget bump is a deliberate, single-place act: edit TRIPLE_BUDGET
in scripts/build_ontology.py with an updated rationale comment.
WP3 will bump it when rtm:DockerImage lands; no silent drift.

Build banner now prints `Parsimony: <actual>/<budget> triples
(<headroom> headroom)` alongside the existing artifact summary. The
manifest gains a `triple_budget` block (`value`, `rationale`,
`headroom`) so consumers reading the manifest see the gate without
needing the sources.
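
A rough sketch of such a gate, assuming a hypothetical helper
`check_triple_budget` and treating `headroom` as the remaining budget
(the real manifest may define it differently; the rationale string is a
placeholder):

```python
TRIPLE_BUDGET = 356  # 156 current triples + 200 headroom, per this commit


class TripleBudgetExceeded(RuntimeError):
    """Raised when the assembled artifact is larger than the budget."""


def check_triple_budget(actual: int, budget: int = TRIPLE_BUDGET) -> dict:
    if actual > budget:
        raise TripleBudgetExceeded(
            f"{actual} triples exceeds TRIPLE_BUDGET={budget}; "
            "bump the constant with an updated rationale if intentional"
        )
    headroom = budget - actual
    print(f"Parsimony: {actual}/{budget} triples ({headroom} headroom)")
    # Block recorded in the manifest; the rationale text is a placeholder.
    return {
        "value": budget,
        "rationale": "integration ontology: handles, hashes, SHACL targets only",
        "headroom": headroom,
    }
```

Keeping the constant in one module lets the tests import it instead of
repeating the number.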

New `tests/test_ontology_size.py` (3 tests) imports TRIPLE_BUDGET as
the single source of truth and verifies:
  - the committed `rtm.ttl` parses under budget
  - the manifest pins the budget + rationale
  - the manifest's recorded triple count matches the parsed artifact
    (catches a stale manifest committed without re-running the build)

Tests: 190 passed (was 187), 4 skipped, 3 deselected.

Part of WP2 (subplan §4.C).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… + regen)

Code/data half of the cross-cutting openCAESAR drop. The WP1 share
already handled the prose; this commit takes the remaining
identifiers + data + the regenerated artifact.

Renames:
  ontology/sysml_term_map.csv:
    column header `opencaesar_iri` -> `omg_iri`
  scripts/build_ontology.py:
    constant `SYSML_OPENCAESAR_NS` -> `SYSML_OMG_NS`
    function `_validate_sysml_axioms` -> `_verify_sysml_axioms`
    row lookups `row['opencaesar_iri']` -> `row['omg_iri']`
  tests/test_ontology_build.py:
    matching `row['opencaesar_iri']` -> `row['omg_iri']`

The function rename is the WP1 verification discipline applied to a
file WP1 explicitly scope-excluded so WP2 could own it in the same
commit as the openCAESAR cleanup (avoiding a merge conflict on
build_ontology.py). The IRIs the script emits are unchanged — the
OMG namespace value `http://www.omg.org/spec/SysML/20240501/` stays
because it's the official OMG SysMLv2 OWL rendering, correct on its
own terms. Only the attribution text and the local label change.

Regenerated `ontology/rtm.ttl` + `ontology/assembly_manifest.json`
(`uv run python -m scripts.build_ontology`). Triple count 156/356;
edit-source hash refreshed; CSV row count unchanged at 9.

Repo-wide grep gate:
  grep -rniE "caesar|opencaesar|open-caesar" --include=*.{py,md,ttl,json,csv,toml,yaml,yml,sh} .
returns zero hits across the whole repo (WP1 prose + WP2 code/data
both clean).

Tests: 190 passed, 4 skipped, 3 deselected (was 190 after commit 2;
no test count change here — the same tests, all green).

Part of WP2 (subplan §4.D); cross-coordinates with WP1 §4.4
(roadmap "Drop explicit openCAESAR references" section).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`make ontology` now requires Java + obo-robot on PATH and runs the
full chain: preflight -> ontology-robot (merge + ELK reason + OBO
report) -> ontology-python with ADCS_ROBOT_VERIFIED=1. Fails fast
with a helpful error pointing at the no-Java alternative
(`make ontology-python`) when the toolchain is missing.

No `ROBOT_OPTIONAL` escape hatch: the no-Java path is the explicit
`ontology-python` Makefile target — invoking it is an intentional
opt-out, not a flag on the default. Honours the roadmap's "stop being
a mock-up; the integration story should not silently degrade" rule.

scripts/build_ontology.py reads ADCS_ROBOT_VERIFIED from the env to
decide what to write into the manifest's `robot_used` + `notes`
fields. Stage 0 banner branches on `robot_used` to print either
"ROBOT merge + ELK reasoning + OBO report PASS" or "Python assembly
only (no-Java path; run `make ontology` for ROBOT/ELK verification)".
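
The env-driven branch might look like the sketch below. The function
names and the `notes` wording are assumptions; the env var, the
`robot_used` field, and the two banner strings come from this commit.

```python
import os


def robot_manifest_fields(env=None) -> dict:
    # Accept an explicit mapping so the logic is testable without
    # touching the real process environment.
    env = os.environ if env is None else env
    robot_used = env.get("ADCS_ROBOT_VERIFIED") == "1"
    notes = (
        "Assembled after ROBOT merge + ELK reasoning + OBO report"
        if robot_used
        else "Python assembly only; ROBOT/ELK verification not run"
    )
    return {"robot_used": robot_used, "notes": notes}


def stage0_banner(manifest: dict) -> str:
    if manifest.get("robot_used"):
        return "ROBOT merge + ELK reasoning + OBO report PASS"
    return (
        "Python assembly only (no-Java path; "
        "run `make ontology` for ROBOT/ELK verification)"
    )
```

Recording the flag in the manifest means the banner reflects how the
committed artifact was built, not how the runner was invoked.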

New `.github/workflows/ontology.yml`:
  - actions/checkout@v4 + actions/setup-java@v4 (Temurin 17)
  - Cached ROBOT jar (v1.9.5) downloaded once per cache key
  - 3-line bash wrapper installs as `obo-robot` on PATH
  - astral-sh/setup-uv@v6 + `uv sync`
  - `make ontology` runs the canonical chain
  - Confirms `rtm.ttl` + `assembly_manifest.json` are committed
    in-sync with the rebuild (catches forgotten regen commits)
  - `uv run pytest -v` (live + network markers skip by default)
Triggers on push to main + staging and on PRs targeting either.

Smoke-tested locally:
  - `make ontology-python` writes `robot_used: false` + correct notes
  - `make ontology` on a no-Java machine fails fast with the
    documented error message

Tests: 190 passed, 4 skipped, 3 deselected (unchanged from §4.C —
the rename + Makefile changes don't alter test behaviour).

Closes issue #2.

Part of WP2 (subplan §4.A).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…riple budget

README.md:
- Toolchain table: OBO ROBOT row promoted from `optional` to
  `required (default)` with the no-Java alternative spelled out.
- New "Tests" subsection under Quick Start documenting the marker
  convention and the default-skip rule (`addopts` in pyproject.toml).
- Ontology Authoring rewritten: `make ontology` as canonical with
  the fail-fast preflight; `make ontology-python` as the explicit
  no-Java target; `make ontology-robot` as just the ROBOT step.
- Triple-count budget mention added so contributors know about the
  parsimony gate before they discover it via a failing build.
- Stage 0 banner sample updated: rendered example now shows the
  ROBOT-default "Verification: ROBOT merge + ELK reasoning + OBO
  report PASS" line.
- "uv run pytest -v" comment in Quick Start: "166 tests" -> "default:
  skips live + network markers" (count fluctuates per WP).

CLAUDE.md:
- Toolchain section: OBO ROBOT row promoted to required-for-default;
  CI Java + cached robot.jar called out.
- New paragraph on the runner: it does NOT need Java/obo-robot;
  only rebuilding the ontology does.
- Tests section: documents the marker convention + the fail-loud
  behaviour of test_flexo_live.py under `-m live` (no silent skips).
- Ontology rebuild section: three-target chain with the no-Java
  escape, fail-fast preflight, robot_used manifest field, and the
  TRIPLE_BUDGET parsimony gate.

Review gates passed:
- `grep -rni "caesar|opencaesar|open-caesar"` — zero hits
- `grep -rn "ROBOT_OPTIONAL"` — zero hits (escape hatch dropped)
- `validate_sysml_axioms` hit only in a rename-rationale docstring
- 190 passed, 4 skipped, 3 deselected

Part of WP2 (subplan §5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…enCAESAR cleanup) into staging

WP2 of the roadmap at /Users/z/.claude/plans/i-want-to-continue-atomic-lobster.md.

- §4.A ROBOT/ELK promoted to default `make ontology` with fail-fast
  preflight (no ROBOT_OPTIONAL escape hatch). `.github/workflows/ontology.yml`
  installs Java 17 + cached robot.jar and runs `make ontology` + tests
  on every push to main/staging and on PRs. Closes #2.
- §4.B Pytest `live` + `network` markers registered in pyproject.toml,
  default `addopts = "-m 'not live and not network'"`. test_flexo_live.py
  rewritten to fail-loudly under `-m live` when credentials are
  missing (no silent skips on opt-in).
- §4.C Triple-count budget (TRIPLE_BUDGET=356) gate in
  scripts/build_ontology.py + new tests/test_ontology_size.py. Manifest
  records `triple_budget` block with rationale.
- §4.D openCAESAR code/data cleanup (WP2 share): CSV column
  `opencaesar_iri` -> `omg_iri`, constant `SYSML_OPENCAESAR_NS` ->
  `SYSML_OMG_NS`, function `_validate_sysml_axioms` ->
  `_verify_sysml_axioms` (WP1 verification discipline applied to a
  file WP1 scope-excluded for coordination). rtm.ttl + manifest
  regenerated.
- §5 README + CLAUDE.md sweep aligning Toolchain, Ontology Authoring,
  Tests, and Stage 0 banner sample with the new defaults.

5 commits, +257 / -52 lines. Test counts: 190 passed, 4 skipped, 3
deselected. Repo-wide grep gates clean (zero openCAESAR, zero
ROBOT_OPTIONAL).

Staged for integration with WP3+ before promotion to main. JAXA
workshop window (2026-06-12) is comfortably met.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WP3 §4.1 + §4.8. Promotes the Docker image from an inline label on a
prov:SoftwareAgent (where it lives today via _bind_execution_metadata)
to a tracked entity that downstream evidence can derive from.

New class in `ontology/rtm-edit.ttl`:

  rtm:DockerImage rdfs:subClassOf prov:Entity

New datatype properties (domain rtm:DockerImage, range xsd:string):

  rtm:imageLabel        — repo/tag
  rtm:baseImageDigest   — FROM-image digest resolved at build
  rtm:dockerfileHash    — SHA-256 of Dockerfile bytes
  rtm:buildContextHash  — SHA-256 over build-context file manifest

rtm:contentHash already exists for rtm:Evidence — the image's own
content hash (runtime digest) reuses it without redeclaration.
prov:wasDerivedFrom reuses PROV — no new property.

TRIPLE_BUDGET bumped 356 -> 380 with a rationale-comment update
documenting the WP2 (356) and WP3 (380) values and the cause of
the bump. Actual current count: 176/380 (204 headroom).

ontology/rtm.ttl + ontology/assembly_manifest.json regenerated via
`uv run python -m scripts.build_ontology`.

The class is now declared but not yet referenced anywhere — that
arrives in commit 3 (DockerCompute.emit_image_node) + commit 4
(prov:wasDerivedFrom on evidence) + commit 6 (SHACL shape).

Tests: 9 ontology + 9 ontology-build tests pass (190 total once full
suite runs).

Part of WP3 (subplan §4.1 + §4.8); first of 7 commits closing issue
#4 AC1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dressing (issue #4 AC2)

WP3 §4.2. Pins Docker build inputs with two SHA-256 hashes:

  dockerfile_hash       — SHA-256 of the Dockerfile bytes
  build_context_hash    — SHA-256 of a sorted POSIX-normalized manifest
                          of <relative-path>\t<file-sha256> lines

These pin what `docker build` sees on disk. They are independent of
the runtime image digest the daemon assigns AFTER build — that's
captured separately as the image's rtm:contentHash. The pair plus
the resolved base-image digest is what makes a Docker image
reproducibly identifiable.

DOCKER_BUILD_CONTEXT_DEFAULT_IGNORES excludes the obvious junk
(.git, __pycache__, *.pyc, .venv, node_modules, .docker-ipc, output,
.DS_Store, .pytest_cache, .ruff_cache) so the hash doesn't churn on
local dev artifacts. The internal _ignored() helper matches each
glob against the leaf name, every intermediate path component, AND
the full relative path so single-component patterns like `.git`
exclude entire subtrees correctly.

Manifest separator is normalized to '/' so the same context hashes
identically on macOS / Linux / WSL. os.walk's dirnames mutation
prunes ignored subtrees so we don't recurse uselessly.
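
The hashing scheme described above can be approximated with the standard
library alone. This is a sketch under stated assumptions: the signature
and tuple return value are guesses, and `DEFAULT_IGNORES` stands in for
DOCKER_BUILD_CONTEXT_DEFAULT_IGNORES.

```python
import fnmatch
import hashlib
import os
from pathlib import Path

DEFAULT_IGNORES = (
    ".git", "__pycache__", "*.pyc", ".venv", "node_modules", ".docker-ipc",
    "output", ".DS_Store", ".pytest_cache", ".ruff_cache",
)


def _sha256_file(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def _ignored(rel_posix: str, patterns) -> bool:
    # Match each glob against every path component (leaf included) and
    # against the full relative path.
    candidates = rel_posix.split("/") + [rel_posix]
    return any(fnmatch.fnmatch(c, p) for c in candidates for p in patterns)


def hash_docker_image(dockerfile: Path, context: Path, ignores=DEFAULT_IGNORES):
    """Return (dockerfile_hash, build_context_hash) as hex SHA-256 strings."""
    if not dockerfile.is_file():
        raise FileNotFoundError(dockerfile)
    lines = []
    for root, dirnames, filenames in os.walk(context):
        rel_root = Path(root).relative_to(context)
        # Mutating dirnames in place makes os.walk skip ignored subtrees.
        dirnames[:] = sorted(
            d for d in dirnames
            if not _ignored((rel_root / d).as_posix(), ignores)
        )
        for name in filenames:
            rel = (rel_root / name).as_posix()  # '/' separator on every OS
            if not _ignored(rel, ignores):
                lines.append(f"{rel}\t{_sha256_file(Path(root) / name)}")
    manifest = "\n".join(sorted(lines)).encode("utf-8")
    return _sha256_file(dockerfile), hashlib.sha256(manifest).hexdigest()
```

Sorting the manifest lines is what makes the context hash independent of
filesystem enumeration order.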

The manifest format is intentionally simple. If the demo adopts
SLSA / in-toto envelopes later, that becomes the canonical envelope
and this hash stays as a fast self-check.

tests/test_docker_image_evidence.py (new file): 8 tests covering
determinism, Dockerfile-change detection, context-change detection,
new-file detection, default ignore patterns (.git + __pycache__ +
*.pyc + .venv + node_modules + output + .DS_Store), custom ignore
patterns, missing-Dockerfile FileNotFoundError, and a smoke test
against the repo's actual compute/Dockerfile + project root.

Tests: 8/8 new pass; previous 190 unchanged.

Part of WP3 (subplan §4.2); second of 7 commits, closes issue #4 AC2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…per run (issue #4 AC3)

WP3 §4.3. Promotes the Docker image from an inline label on the
prov:SoftwareAgent (where ExecutionMetadata wrote it) to a first-
class rtm:DockerImage entity in the evidence graph.

DockerCompute new methods:

  _parse_from_image()        — pull the first FROM line from
                                compute/Dockerfile via regex.
                                Returns "" on parse failure.
  _resolve_base_image_digest() — `docker image inspect <FROM-tag>`
                                with graceful empty-string fallback
                                when the base isn't pulled locally.
                                Cached per instance.
  emit_image_node(graph)     — idempotent per run; on first call
                                computes hashes via hash_docker_image,
                                resolves the base digest, writes 8
                                triples (rdf:type DockerImage + Entity,
                                contentHash, imageLabel, baseImageDigest,
                                dockerfileHash, buildContextHash,
                                prov:generatedAtTime) and caches the IRI.

State added to __init__: _image_node_iri, _image_built_at,
_base_image_digest. _image_built_at is captured at the end of
_build_image() so the prov:generatedAtTime stamp reflects the actual
build time, not the emission time.

IRI shape: urn:adcs:docker-image:<digest-with-colons-replaced-by-dashes>.
Mirrors ExecutionMetadata.executor_uri() (WP1 §4.3) so IRI shapes
across the demo's URN space stay coherent.
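
The FROM-line parse and the IRI shape can be sketched as two small
helpers. The function names and the exact regex are assumptions; the
source only states that the parser is regex-based and returns "" on
failure.

```python
import re

# First FROM instruction, skipping an optional --platform flag.
_FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def parse_from_image(dockerfile_text: str) -> str:
    """Return the image reference of the first FROM line, or "" if none."""
    match = _FROM_RE.search(dockerfile_text)
    return match.group(1) if match else ""


def docker_image_iri(digest: str) -> str:
    # Colons are replaced with dashes, mirroring executor_uri().
    return "urn:adcs:docker-image:" + digest.replace(":", "-")
```

A commented-out line such as `# FROM old:tag` is skipped because the
pattern requires FROM to be the first non-whitespace token on its line.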

Resolution decisions baked in (WP3 subplan §9 open questions):
  Q1 baseImageDigest: try to resolve via `docker image inspect`,
     graceful empty-string fallback if the base isn't pulled
     (chosen: pipeline does NOT fail on missing base).
  Q3 Image IRI source: content-addressed on the runtime digest (not
     the deterministic build-input hash). The build-input hashes are
     recorded as properties for separate query.

tests/test_compute.py additions:
  - _docker_subprocess_factory extended with base_image_digest=
    parameter; heuristic distinguishes project-image vs base-image
    inspect calls by checking for "adcs-compute" prefix.
  - TestDockerImageEmit class (4 tests): all-properties present,
    idempotent-within-one-run, base-image-missing graceful degrade,
    colon-escape in IRI suffix.
  - test_dockerfile_from_line_parseable smoke test against the real
    compute/Dockerfile (sanity: regex parser returns a python tag).

The image node is now emittable but not yet REFERENCED from evidence
nodes — that's commit 4 (prov:wasDerivedFrom wiring) + commit 6
(SHACL closure rule enforcing the link).

Tests: 22 passed, 1 skipped (live Docker daemon required).

Part of WP3 (subplan §4.3); third of 7 commits closing issue #4 AC3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rivedFrom (issue #4 AC4)

WP3 §4.4. With the rtm:DockerImage entity emitted (commit 3), wire
the link: every evidence node produced under --compute=docker now
carries `prov:wasDerivedFrom <image-iri>` in addition to the
existing `prov:wasGeneratedBy <activity>`. The two edges together
let a SPARQL traversal answer both "which image produced this
proof?" (wasDerivedFrom) and "which stage produced this proof?"
(wasGeneratedBy, the WP1 schema enrichment).

evidence/binding.py:
  bind_proof_evidence + bind_simulation_evidence gain an optional
  `image_iri: URIRef | None = None` kwarg. When present, add
  (ev_uri, PROV.wasDerivedFrom, image_iri) after the activity
  triples. Local-compute callers pass None (no edge added) — keeps
  the local path byte-identical to pre-WP3.

pipeline/runner.py (Stage 4):
  Compute image_iri ONCE per stage by calling
  state.compute_backend.emit_image_node(ev_graph) when
  state.compute_name == "docker"; otherwise None. Thread it through
  the four bind_proof_evidence calls (REQ-001..004) and the three
  bind_simulation_evidence calls. emit_image_node is idempotent so a
  single call captures the per-run image identity for all evidence.
  Banner prints the emitted IRI for visibility under --compute=docker.

The link is now in place; the SHACL closure rule that REQUIRES it for
Docker-executed evidence arrives in commit 6.

Tests: 202 passed (+12 since pre-WP3), 5 skipped, 3 deselected.
The 12 new tests are 8 hash_docker_image + 4 emit_image_node from
commits 2 & 3; the new bind_* kwarg is exercised via the pipeline
end-to-end path (test_pipeline.py runs run_pipeline which now
threads image_iri=None for the default --compute=local).

Part of WP3 (subplan §4.4); fourth of 7 commits closing issue #4 AC4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WP3 §4.5. The reverse-lookup the WP3 schema enables: given an image
digest, find every evidence node that was produced by a container
started from that image.

New SPARQL constant + helper in traceability/queries.py:

  EVIDENCE_BY_IMAGE       — joins rtm:DockerImage on rtm:contentHash
                            via prov:wasDerivedFrom, with initBinding
                            for the target digest.
  evidence_by_image(g, d) — returns list of dicts with ev / type /
                            evContentHash / modelHash keys. Empty
                            list on miss.

Walks the union view (the queries module's documented convention);
pass a Dataset to query across <adcs:evidence> + any other layer
that ends up holding evidence-image links.

tests/test_docker_image_evidence.py: 4 new tests, synthesized
dataset has two images (A + B) with two/one derived evidence nodes
plus one unlinked (local-compute-style) node:
  - returns linked evidence (image A -> 2 rows)
  - isolates by digest (image B -> 1 row, no leak)
  - miss returns empty list
  - unlinked evidence stays invisible to every image query

Tests: 12/12 in tests/test_docker_image_evidence.py (8 hash + 4
helper).

Part of WP3 (subplan §4.5); fifth of 7 commits closing issue #4 AC5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WP3 §4.6. Gives the WP3 schema teeth at Stage 6.5: every rtm:Evidence
whose generating activity ran under --compute=docker (signalled by
prov:atLocation matching urn:adcs:location:docker:*) MUST link to a
rtm:DockerImage via prov:wasDerivedFrom. Local-compute evidence is
exempt — the SPARQL target filter excludes urn:adcs:location:local:*
activities, so the nominal pipeline run continues to pass closure.
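
The condition the shape enforces can be restated in plain Python. This is only a sketch of the rule's logic, assuming triples are held as a set of string tuples with prefixed names; the actual rule is a SHACL-SPARQL shape, and the function name `docker_evidence_violations` is hypothetical.

```python
Triple = tuple[str, str, str]

DOCKER_LOCATION_PREFIX = "urn:adcs:location:docker:"


def docker_evidence_violations(triples: set[Triple]) -> list[str]:
    """Return evidence nodes generated under Docker that lack an image link."""

    def objects(subject: str, predicate: str) -> set[str]:
        return {o for (s, p, o) in triples if s == subject and p == predicate}

    evidence_nodes = sorted(
        s for (s, p, o) in triples if p == "rdf:type" and o == "rtm:Evidence"
    )
    violations = []
    for ev in evidence_nodes:
        # Target filter: only evidence whose activity ran at a Docker location.
        ran_in_docker = any(
            loc.startswith(DOCKER_LOCATION_PREFIX)
            for activity in objects(ev, "prov:wasGeneratedBy")
            for loc in objects(activity, "prov:atLocation")
        )
        # Constraint: at least one wasDerivedFrom target typed rtm:DockerImage.
        linked = any(
            "rtm:DockerImage" in objects(img, "rdf:type")
            for img in objects(ev, "prov:wasDerivedFrom")
        )
        if ran_in_docker and not linked:
            violations.append(ev)
    return violations
```

Evidence generated at a local location never satisfies the target filter, so it cannot produce a violation, which is the exemption described above.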

The shape follows the existing rtm:BackwardTraceabilityShape pattern
(sh:targetClass + sh:sparql with $this projection) rather than the
sh:target + SPARQLTarget pattern from the subplan draft — both work
under pyshacl but staying consistent with the established style
keeps the shape suite uniform.

Three new tests in tests/test_shape_suite.py:
  - test_docker_evidence_without_image_link_fails: synthesize a
    Docker-located activity + evidence WITHOUT wasDerivedFrom on a
    copy of the nominal dataset; closure fails with a DockerImage
    violation.
  - test_docker_evidence_with_image_link_passes: same shape but WITH
    a valid rtm:DockerImage + wasDerivedFrom edge; closure does NOT
    add a DockerImage complaint.
  - test_local_evidence_not_required_to_link_to_image: explicit
    conditional-correctness check — the nominal --compute=local
    fixture has only local-located activities, the shape's target
    filter must be vacuous on it.

Tests: 13/13 in tests/test_shape_suite.py (10 prior + 3 new).

Part of WP3 (subplan §4.6); sixth of 7 commits closing issue #4 AC6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README.md (under "Compute Backends (Phase L)"): new "Image as
tracked evidence (WP3)" subsection — what WP3 adds, the six
properties on rtm:DockerImage, a working evidence_by_image SPARQL
example, and an explicit pointer to WP5 for the deferred notebook
Act 9/10 rewrite + audit-module image surfacing.

CLAUDE.md ("Named-graph layout"): one-line update on <adcs:evidence>
acknowledging it now holds rtm:DockerImage too under --compute=docker.

The deeper README "Compute Backends" rewrite (image-as-evidence
narrative + audit summary integration) is WP5 territory; this commit
ships the minimal docs delta so contributors reading the repo today
can find the new entity + the SPARQL helper.

Tests: 209 passed (+19 across WP3 commits 2-6), 5 skipped, 3
deselected. No regressions.

Part of WP3 (subplan §4.9, §7); seventh of 7 commits. Partial
coverage of issue #4 AC9 — the full README "Compute backends"
section rewrite + audit-module + notebook narrative defer to WP5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the backend half of issue #4 (7 of 9 acceptance criteria):
- rtm:DockerImage class + property set (commit c31cbce)
- hash_docker_image() build-input hasher (a6a0680)
- DockerCompute.emit_image_node() emits one node per run (343420e)
- prov:wasDerivedFrom on Docker-produced evidence (425a263)
- evidence_by_image() SPARQL helper (86974f5)
- DockerEvidenceShape SHACL closure rule (3cbcf80)
- README + CLAUDE.md notes (60a3721)

The two narrative items (audit summary + notebook Act 9) are deferred
to WP5. Issue #4 stays open with a status comment listing the split.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WP4 c1. Preflight reachability check on the persistence backend so
failure is fast and clear at startup rather than discovered at Stage 7.

- BackendUnavailable exception (new in pipeline/backends/base.py)
- StoreBackend.probe() Protocol method
- LocalBackend.probe() writes + deletes .probe sentinel in output dir
- FlexoBackend.probe() HEADs /orgs/<org>; respects FLEXO_PROBE_TIMEOUT
  (default 10s, distinct from the slow-call FLEXO_TIMEOUT)
- FuskeiBackend.probe() HEADs /data

7 new unit tests in test_backends.py cover success + failure paths
for each backend (mocked httpx). All 18 backend tests pass.
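
A minimal sketch of the probe contract and the sentinel-file check, assuming a constructor that takes the output directory. The class and exception names follow this message, but the bodies are illustrative rather than the project's implementation.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class BackendUnavailable(RuntimeError):
    """Raised when a backend fails its preflight reachability check."""


class StoreBackend(Protocol):
    def probe(self) -> None:
        """Return silently when reachable; raise BackendUnavailable otherwise."""
        ...


class LocalBackend:
    def __init__(self, output_dir: Path) -> None:
        self.output_dir = output_dir

    def probe(self) -> None:
        # Write and then delete a sentinel file to prove the directory is writable.
        sentinel = self.output_dir / ".probe"
        try:
            sentinel.write_text("ok")
            sentinel.unlink()
        except OSError as exc:
            raise BackendUnavailable(
                f"cannot write to {self.output_dir}: {exc}"
            ) from exc


with tempfile.TemporaryDirectory() as tmp:
    LocalBackend(Path(tmp)).probe()  # passes silently on a writable directory
```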

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WP4 c2. Builds on the StoreBackend probe (c1) to make startup the
single fail-fast moment for backend reachability — no more discovering
Flexo / Docker unavailability at Stage 7 or Stage 2.

- ComputeUnavailable exception (compute/base.py); DockerNotAvailable
  now subclasses it for backwards compat
- ComputeBackend.probe() Protocol method
- LocalCompute.probe() is a no-op (always available)
- DockerCompute.probe() wraps _check_daemon()
- PipelineState gains store_backend field
- run_pipeline() constructs both backends up-front and runs
  _run_preflight() before Stage 0; banner prints describe() +
  PASS/FAIL for each; sys.exit(2) on any failure
- Stage 7 reads state.store_backend instead of re-instantiating
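
The gate's control flow can be sketched as follows. The names `run_preflight` and `main` and the dict-of-callables interface are assumptions for illustration; the real `_run_preflight()` works on backend objects and prints `describe()` output, and this sketch catches any exception rather than only the two unavailability types.

```python
import sys
from typing import Callable


def run_preflight(probes: dict[str, Callable[[], None]]) -> bool:
    """Run every probe, print PASS/FAIL per backend, and report overall success."""
    all_ok = True
    for name, probe in probes.items():
        try:
            probe()
        except Exception as exc:  # sketch: the real gate catches specific types
            print(f"preflight {name}: FAIL ({exc})")
            all_ok = False
        else:
            print(f"preflight {name}: PASS")
    return all_ok


def main(probes: dict[str, Callable[[], None]]) -> None:
    # Exit code 2 on any probe failure, before Stage 0 would run.
    if not run_preflight(probes):
        sys.exit(2)
```

Running every probe before exiting means one startup attempt reports all unreachable backends instead of stopping at the first.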

Tests: TestComputeProbe in test_compute.py + PipelineState fixture
fix in test_pipeline.py. Full suite: 219 passed (no regressions).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WP4 c3. Adds the "code remote" half of the three-remote provenance
chain: every rtm:DockerImage now carries a literal git+URI pointing
at the Dockerfile in the source tree at the commit it was built from.

- compute/git_ref.py — current_git_ref(repo_root, file_path); shells
  out to git rev-parse + git config; produces git+https://.../@<sha>#<path>
  with graceful fallbacks (git+file://, git+local://uncommitted)
- docker_compute.emit_image_node() appends rtm:gitRef triple
- Tests:
  - TestGitRef: shape + fallback + ssh→https normalization
  - TestImageNodeEmitsGitRef: stubbed _image_metadata + verify the
    triple lands on the image IRI (no Docker daemon required)
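
The ssh-to-https normalization and the ref shape might be sketched like this. Both helper names are hypothetical, and the exact separator before `@` is not clear from the message, so the format string below is an assumption.

```python
import re


def normalize_remote(url: str) -> str:
    """Rewrite scp-style and ssh:// remotes to https://; leave others untouched."""
    scp_like = re.fullmatch(r"git@([^:/]+):(.+)", url)
    if scp_like:
        return f"https://{scp_like.group(1)}/{scp_like.group(2)}"
    ssh_url = re.fullmatch(r"ssh://(?:[^@/]+@)?([^/]+)/(.+)", url)
    if ssh_url:
        return f"https://{ssh_url.group(1)}/{ssh_url.group(2)}"
    return url


def format_git_ref(remote_url: str, sha: str, file_path: str) -> str:
    """Assumed shape: git+<https-remote>@<sha>#<path>."""
    return f"git+{normalize_remote(remote_url)}@{sha}#{file_path}"
```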

The rtm:gitRef property is declared formally in c8 alongside the rest
of the WP4 ontology additions; this commit uses the IRI directly.

Closes part of issue #4 (preparation for the reproduce CLI in c9).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WP4 c4. Adds the "storage remote" half of the three-remote provenance
chain: when persisting to Flexo (or Fuseki), every rtm:DockerImage
gains a rtm:flexoRecord pointer to where in the storage backend its
record lives.

- StoreBackend.record_uri(layer) Protocol method
- LocalBackend.record_uri() returns None (no remote)
- FlexoBackend.record_uri(layer) -> urn:adcs:flexo:<org>/<repo>/<branch>
- FuskeiBackend.record_uri(layer) -> urn:adcs:fuseki:<encoded-url>/<layer>
- Runner Stage 4 attaches rtm:flexoRecord to the image after emit_image_node,
  only when the store backend exposes a non-None record_uri.

LocalBackend runs unchanged (no triple added). Tests added in
test_backends.py for all three shapes; 21 backend tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WP4 c5. Distinguishes the image (static artifact), container
(transient materialization), and host (location) as three first-class
entities with standard PROV edges between them.

- ExecutionMetadata.container_uri() -> urn:adcs:docker-container:<id>
  (None for local runs or missing container_id)
- _bind_execution_metadata accepts image_iri; when container_uri is
  non-None, emits:
    <container> a rtm:DockerContainer, prov:Entity ;
                rtm:containerId "<id>" ;
                prov:wasDerivedFrom <image> ;
                prov:startedAtTime / endedAtTime "..."
    <activity> prov:used <container>
- bind_proof_evidence / bind_simulation_evidence thread image_iri
  through to the metadata helper.
- No change to existing PROV edges; new edges are purely additive.

Tests: TestContainerEntity in test_compute.py (4 cases — local skip,
docker emission, image link, missing-id sentinel). All 17 targeted
tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WP4 c6. Adds the "under whose authority?" axis to the provenance
chain without pulling in FOAF or W3C Org Ontology (those stay
deferred per CLAUDE.md future-work #2). Two org IRIs per run:

- operating org: who runs the container/authors the work
  (default urn:adcs:org:local-operator)
- hosting org:   who operates the substrate (host + Docker daemon)
  (default: same as operating)

Both are env-configurable via ADCS_{OPERATING,HOSTING}_ORG_IRI; the
defaults model a single-operator local setup, so existing runs don't change.

New edges in evidence/binding.py:
  <container> prov:wasAttributedTo <operating-org>
  <host>      rtm:operatedBy       <hosting-org>
  <executor>  prov:actedOnBehalfOf <operating-org>

Both prov:Organization typings + rdfs:labels emitted to <adcs:context>
at startup via compute/organizations.py::emit_org_nodes.

PipelineState gains operating_org_iri + hosting_org_iri fields.
bind_proof_evidence / bind_simulation_evidence gain corresponding
kwargs threaded through to _bind_execution_metadata.

The rtm:operatedBy predicate is declared formally in c8 alongside
the rest of the ontology additions; this commit uses it directly.

Full pytest: 231 passed (up from 219 baseline; no regressions).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mzargham and others added 21 commits May 28, 2026 18:15
WP4 c7. The SHACL closure-rule check is an automated, fully-specified
outcome — wraps as an earl:Assertion so the technical-trust witness
is queryable RDF, beside the existing human-attestation witness
(rtm:Attestation, which also subclasses earl:Assertion).

- new traceability/closure_assertion.py::emit_closure_assertion()
- Stage 6.5 in pipeline/runner.py calls it after verify()
- assertion typed rtm:ClosureRuleAssertion + earl:Assertion + prov:Activity
- carries earl:outcome (passed/failed), earl:mode (automatic),
  earl:test, earl:subject, prov:wasAssociatedWith, prov:atTime,
  rtm:violationCount
- one assertion per run (Q9: per-run granularity, not per-shape)
- compute.reproduce-side rtm:DigestMatchAssertion lands in c9 with the CLI

Discipline: earl:mode is always earl:automatic — verification, not
validation. Human attestation continues to use earl:manual / earl:semiAuto.

Test: test_audit::test_closure_assertion_emitted_into_audit_graph.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ovenance

WP4 c8. Formally declares every WP4 term used in commits c3-c7,
adds the four new SHACL closure rules, regenerates rtm.ttl, and
bumps the triple-count budget to accommodate the additions.

New classes:
- rtm:DockerContainer (subClassOf prov:Entity) — transient materialization
- rtm:DigestMatchAssertion (subClassOf earl:Assertion + prov:Activity)
- rtm:ClosureRuleAssertion (subClassOf earl:Assertion + prov:Activity)

New properties:
- rtm:containerId (on DockerContainer)
- rtm:gitRef (on DockerImage; xsd:anyURI)
- rtm:flexoRecord (on DockerImage; ObjectProperty)
- rtm:operatedBy (on prov:Location; subPropertyOf prov:wasAttributedTo)
- rtm:violationCount (on ClosureRuleAssertion)
- rtm:transactionId (on prov:Activity) — for service-invocation wire logs
- rtm:documentRef (on rtm:Evidence; xsd:anyURI) — for txnlog evidence

New SHACL shapes (rtm_shapes.ttl):
- DockerImageProvenanceShape — every DockerImage MUST have rtm:gitRef
- DockerContainerShape — every DockerContainer MUST have wasDerivedFrom
  exactly one DockerImage + rtm:containerId
- OrganizationAuspicesShape — DockerContainer SHOULD declare
  prov:wasAttributedTo a prov:Organization (Warning, not Violation)
- TransactionLogShape — Evidence with rtm:documentRef MUST also have
  rtm:contentHash + prov:wasGeneratedBy

Side effects:
- Bumped triple budget 380 → 450 (218 used; 232 headroom for WP5)
- emit_image_node now types rtm:gitRef as xsd:anyURI literal so it
  satisfies the new shape's datatype constraint
- test_docker_evidence_with_image_link_passes fixture updated to
  emit rtm:gitRef on its synthetic image

Full pytest: 232 passed (no regressions).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WP4 c9. New compute/reproduce.py Typer app that closes the
reproducibility loop: given an rtm:DockerImage record, clone the
recorded git ref, rebuild from compute/Dockerfile, and compare the
resulting runtime digest to the recorded rtm:contentHash.

CLI:
  uv run python -m compute.reproduce \
      --image-digest sha256:... \
      --from-trig output/rtm.trig

Output:
  - prints PASS/FAIL with detail
  - emits rtm:DigestMatchAssertion (earl:Assertion + prov:Activity)
    into <adcs:audit> with earl:outcome + earl:mode=earl:automatic +
    earl:subject=<image-iri> + prov:wasAssociatedWith=<reproduce-cli-agent>
  - exit 0 on match, 1 on mismatch, 2 on prerequisite failure
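
The exit-code contract can be illustrated with a small sketch. The enum and the `compare_digests` helper are hypothetical names, and treating a missing rebuilt digest as the prerequisite-failure case is an assumption.

```python
from enum import IntEnum
from typing import Optional


class ExitCode(IntEnum):
    MATCH = 0
    MISMATCH = 1
    PREREQUISITE_FAILURE = 2


def compare_digests(recorded: str, rebuilt: Optional[str]) -> ExitCode:
    """Map a digest comparison onto the CLI's three exit codes."""
    if rebuilt is None:
        # Assumed: no digest was produced (e.g. the clone or build could not run).
        return ExitCode.PREREQUISITE_FAILURE
    return ExitCode.MATCH if rebuilt == recorded else ExitCode.MISMATCH
```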

Pure logic split out as testable units (parse_git_ref,
load_image_record, emit_digest_match_assertion) so 11 unit tests
cover the orchestration without needing Docker. The actual
clone+build subprocess loop is exercised opt-in via -m live.

Honors the verification/validation discipline: earl:mode is always
earl:automatic for these (automated check).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WP4 c10. Adds the fourth service in the three-remote story: a
CouchDB-backed transaction-log store running in its own container
with its own URI (urn:adcs:service:transaction-log-store) and its
own hosting auspices.

- pipeline/backends/txnlog.py — minimal CouchDB client:
  - probe() HEADs the db, creates on 404, surfaces auth failures
  - put_document() PUTs JSON; 409 conflict treated as idempotent success
  - get_document() — readback path for the trust-query renderer
- Env config: ADCS_TXNLOG_{URL,DB,USER,PASSWORD}
- 8 unit tests in test_txnlog.py via httpx.MockTransport

Bonus (same commit, single-line surface area): security hardening
in compute/reproduce.py per automated security review. Git refs
come from RDF stores that may sit outside the trust boundary, so:
- reject base/sha components starting with '-' (flag smuggling)
- require base to start with https:// / ssh:// / git@
- add '--' end-of-options sentinel to git clone + git checkout
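
A sketch of the three checks applied to the clone step, under assumed helper names (`validate_git_component`, `validate_base`, `build_clone_argv`); the real code in compute/reproduce.py may be organized differently.

```python
ALLOWED_BASE_PREFIXES = ("https://", "ssh://", "git@")


def validate_git_component(value: str, what: str) -> str:
    """Reject values git could parse as an option (flag smuggling)."""
    if value.startswith("-"):
        raise ValueError(f"{what} must not start with '-': {value!r}")
    return value


def validate_base(base: str) -> str:
    """Require a remote base with one of the allowed scheme prefixes."""
    validate_git_component(base, "base")
    if not base.startswith(ALLOWED_BASE_PREFIXES):
        raise ValueError(f"unsupported remote scheme: {base!r}")
    return base


def build_clone_argv(base: str, dest: str) -> list[str]:
    # '--' marks the end of options, so nothing after it is read as a flag.
    return ["git", "clone", "--", validate_base(base), dest]
```

Passing the argv as a list to `subprocess.run` (without `shell=True`) keeps the untrusted values out of a shell as well.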

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WP4 c11. Context manager that wraps a service invocation, captures
request/response, redacts secrets, PUTs the JSON to the txnlog store,
and emits RDF in <adcs:audit>:

  <activity> a prov:Activity ;
             rtm:transactionId "<id>" ;
             prov:wasAssociatedWith <caller> ;
             prov:used <service> ;
             prov:startedAtTime/endedAtTime "<iso>"
  <evidence> a rtm:Evidence ;
             rtm:contentHash "sha256:<hash>" ;
             rtm:documentRef <store-url> ;
             prov:atLocation <txnlog-service-iri> ;
             prov:wasGeneratedBy <activity>

Redaction allowlists:
  Headers: Authorization, Cookie, Set-Cookie, X-Api-Key, X-Auth-Token,
           Proxy-Authorization
  Body keys: password, passwd, token, secret, api_key, apikey,
             access_token, refresh_token
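
A minimal redaction sketch built from the two lists above. The function names, the `[REDACTED]` placeholder, case-insensitive matching, and recursion into nested bodies are assumptions for illustration.

```python
from typing import Any

REDACTED = "[REDACTED]"

SENSITIVE_HEADERS = {
    "authorization", "cookie", "set-cookie",
    "x-api-key", "x-auth-token", "proxy-authorization",
}
SENSITIVE_BODY_KEYS = {
    "password", "passwd", "token", "secret",
    "api_key", "apikey", "access_token", "refresh_token",
}


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Mask sensitive header values; header names compare case-insensitively."""
    return {
        name: REDACTED if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def redact_body(body: Any) -> Any:
    """Recursively mask values stored under sensitive keys in JSON-like data."""
    if isinstance(body, dict):
        return {
            key: REDACTED
            if str(key).lower() in SENSITIVE_BODY_KEYS
            else redact_body(value)
            for key, value in body.items()
        }
    if isinstance(body, list):
        return [redact_body(item) for item in body]
    return body
```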

When store=None (e.g. --backend=local without txnlog), the activity
is still recorded but the evidence node is skipped — the
TransactionLogShape requires rtm:documentRef+rtm:contentHash, so
emitting the evidence without them would fail closure.

Robustness: a store.put_document failure does NOT propagate; the
wrapped service call's outcome is preserved. Exceptions inside the
context block are recorded in the document AND re-raised.
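
The robustness rules can be sketched with a small context manager. `logged_call` and `InMemoryStore` are hypothetical stand-ins, not the project's TransactionLogger or its test double, and the RDF emission is omitted.

```python
from contextlib import contextmanager


class InMemoryStore:
    """Hypothetical stand-in for a document store with put_document()."""

    def __init__(self, fail: bool = False) -> None:
        self.docs: dict[str, dict] = {}
        self.fail = fail

    def put_document(self, doc_id: str, doc: dict) -> None:
        if self.fail:
            raise RuntimeError("store down")
        self.docs[doc_id] = doc


@contextmanager
def logged_call(store, txn_id: str):
    record = {"transactionId": txn_id, "error": None}
    try:
        yield record  # the caller fills in request/response details
    except Exception as exc:
        record["error"] = repr(exc)  # recorded in the document...
        raise                        # ...and re-raised to the caller
    finally:
        if store is not None:
            try:
                store.put_document(txn_id, record)
            except Exception:
                pass  # a logging failure must not mask the wrapped call's outcome
```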

7 unit tests in test_transaction_log.py via FakeStore stand-in.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…12 part 1)

Plumbs the optional TxnLogBackend through the runtime so future PRs
can wrap individual service calls without further state surgery.

- PipelineState gains txnlog_store: TxnLogBackend | None
- ADCS_TXNLOG_ENABLED=1 env gate constructs a TxnLogBackend and
  runs it through _run_preflight alongside compute + storage probes
- Preflight banner prints the txnlog describe() line when enabled
- Existing runs (no env var set) are unchanged — txnlog_store is
  None, preflight skips that probe

Per-call wrapping (FlexoBackend HTTP / DockerCompute subprocess /
reproduce CLI subprocess) is deferred to a focused follow-up PR;
those changes require more surgery in the backend bodies and have
a small self-referential gotcha (FlexoBackend.persist would log
into <adcs:audit> while persisting it). The plumbing landed here
unblocks that work without bundling its risk into WP4.

95 targeted tests pass; no regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WP4 c13. Operationalizes the "how can I trust this evidence?"
technical-trust panel as queryable SPARQL helpers.

Six functions, six typed @dataclass(frozen=True) records:
- technical_provenance(ds, evidence_iri) -> TechnicalProvenance
- reproducibility_witnesses(ds, image_iri) -> list[DigestWitness]
- closure_witnesses(ds, graph_iri) -> list[ClosureWitness]
- auspices_chain(ds, evidence_iri) -> AuspicesChain
- service_invocations_for(ds, ...) -> list[ServiceInvocationRow]
- trust_summary(ds, evidence_iri) -> TrustSummary

Plus render_trust_summary() — compact text rendering for
interrogate.explain Trust panel.

All queries are read-only, use OPTIONAL for graceful partial matches
(local-compute runs have no container/image but still produce a
useful technical row), and return typed records that callers can pass
around without re-querying.

Tests: 8 cases on a nominal local+local pipeline run; cover the
empty + populated paths for each query.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WP4 c14. Adds an env-driven branch-name prefix to FlexoBackend so
each pipeline run can land in its own scoped branch space (e.g.
cert/2026-06-12-001/evidence) without forcing the pattern on the
default single-canonical-state run.

- _branch_id(graph_iri, prefix="") prepends the prefix
- FLEXO_BRANCH_PREFIX env (default "") is read in __init__
- branch_prefix kwarg overrides env
- record_uri() honors the prefix so rtm:flexoRecord points at the
  correctly-scoped branch IRI

Unchanged: empty default means existing runs land in
adcs-demo/lifecycle/<layer> exactly as today.

Test: test_flexo_backend_branch_prefix_applies.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WP4 c15. Provisioning scripts for the local txnlog store
(CouchDB), so an operator with no prior experience can bring the
canonical multi-remote stack up with one command.

- tools/start-services.sh — idempotent docker run + db ensure;
  waits for CouchDB readiness; prints the env-var block to export
- tools/stop-services.sh — symmetric teardown; --purge wipes data
- docker-compose.yml — same shape for users who prefer compose

Both paths use the same container name (couchdb-adcs) and default
credentials (adcs/adcs); env-var overrides documented in the
scripts and in .env.example (lands in c16).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WP4 c16. The architecture demo gets a self-contained ARCHITECTURE.md
that a new collaborator can read cold; README + CLAUDE.md get the
three-remote + fourth-service entry points; .env.example documents
every env var with defaults.

- ARCHITECTURE.md (new): three-remote diagram + URI scheme table +
  full provenance-chain example + EARL outcome section + trust query
  list + reproducibility loop + preflight gate semantics
- README.md: new Setup section (.env + tools/start-services.sh +
  preflight fail-fast), new Canonical multi-remote run subsection
  under Quick Start, new Reproducibility verification subsection,
  ARCHITECTURE.md pointer at the top
- CLAUDE.md: new "Three-remote architecture (WP4)" section under
  named-graph layout — concise URI scheme summary + EARL outcomes +
  preflight + trust queries
- .env.example (new): every FLEXO_* / ADCS_TXNLOG_* / ADCS_*_ORG_*
  variable with documented defaults

Full pytest after sweep: 267 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
16 commits making the four-service architecture (git + Flexo + local
Docker + CouchDB txnlog) operational rather than narrated, with
preflight gating, organizational auspices, EARL-wrapped automated
outcomes, six trust queries, and the reproducibility CLI.

c1.  feat(backends): probe() on StoreBackend + 3 implementations
c2.  feat(compute): probe() on ComputeBackend + preflight gate
c3.  feat(compute): capture git ref + emit rtm:gitRef on rtm:DockerImage
c4.  feat(backends): record_uri() + emit rtm:flexoRecord
c5.  feat(evidence): emit rtm:DockerContainer entity + prov:used edge
c6.  feat(provenance): organizational auspices via prov:Organization
c7.  feat(audit): emit rtm:ClosureRuleAssertion from Stage 6.5
c8.  feat(ontology): WP4 classes + properties + shapes
c9.  feat(compute): reproduce CLI + rtm:DigestMatchAssertion
c10. feat(backends): TxnLogBackend (CouchDB) + reproduce hardening
c11. feat(traceability): TransactionLogger + wire-logs as rtm:Evidence
c12. feat(runner): wire txnlog store into PipelineState + preflight
c13. feat(traceability): six trust queries + render_trust_summary
c14. feat(backends): optional FLEXO_BRANCH_PREFIX
c15. chore(tools): start-services.sh + stop-services.sh + docker-compose.yml
c16. docs: ARCHITECTURE.md + README + CLAUDE.md + .env.example

Companion issues filed: #7 (PU registry), #8 (RIME services),
#9 (Starforge oracles). Issue #4 remains open for the WP5 narrative
items (audit module image surfacing + notebook Act 9 update).

Full pytest: 267 passed, 5 skipped, 3 deselected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes one of two residual issue #4 ACs deferred from WP3. The
audit report's markdown now includes a "Docker image provenance"
table for any run that emitted at least one rtm:DockerImage,
listing image IRI, digest, git ref, and count of evidence nodes
derived from it.

- DockerProvenanceRow dataclass (frozen)
- docker_provenance(ds) -> list[DockerProvenanceRow] SPARQL helper
- AuditReport gains docker_provenance: list[DockerProvenanceRow]
- audit() populates it via the SPARQL query
- _render_markdown adds the new section between coverage matrix
  and orphans, omitted entirely when the list is empty

Local-compute runs see no change (empty list = section omitted).
Docker-compute runs see the image surfaced beside the audit
direction summary, where an auditor reading the report can trace
"what produced what" without leaving the report.

Tests: 2 new in test_audit.py — empty path + populated path with
a synthesized image. 18 audit tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes the second of two residual issue #4 ACs deferred from WP3.
A new cell after the existing Act 9 narration synthesizes the
rtm:DockerImage node (mimicking what DockerCompute._emit_image_node
does in a live --compute=docker run) with WP3 properties + WP4
extensions (rtm:gitRef, rtm:flexoRecord), then runs the WP3
evidence_by_image SPARQL helper against the augmented dataset to
show the inverse query the executor-label model couldn't answer.

The cell:
- emits a synthetic rtm:DockerImage with content hashes + git ref
  + flexo record cross-link
- wires synthesized v2 evidence to derive from it
- runs evidence_by_image() + interpolates the count into the markdown
- explains the reproducibility loop (compute.reproduce + EARL outcome)

The original Act 9 cell (executor-agent label) stays intact above —
the new cell extends rather than replaces, so the narrative reads:
"here's the executor label (today's model); here's the image as a
node (WP3 + WP4); here's the inverse query that becomes possible."

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Disposable one-page reconciliation showing every claim in
openbee_dsg_opener.pptx Slide 5 (Demo #1) backed by a concrete
code receipt. Reviewer can trace "Flexo Deployment + oracles +
evidence reproducibility with git hashes and docker" to specific
modules, commits, and tests.

Notes that WP4 exceeded the slide's promise on the "with git
hashes and docker" claim — container-as-entity, organizational
auspices, wire-level audit trail, six trust queries are all
additive surface beyond what was advertised.

Cross-links the three companion issues (#7 PU registry,
#8 RIME services, #9 Starforge oracles) as the "what's next"
deliverable for Planetary Utilities' team.

Pages auto-publishes the marimo notebook export at
dynamicalsystemsgroup.github.io/ADCS-lifecycle-demo — the WP5 c2
notebook update will flow on next push.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ging

Three commits closing the residual narrative items from WP3 (issue #4
ACs deferred to WP5) and the slide-reconciliation pass.

c1. feat(audit): surface Docker image identity in audit summary
    - DockerProvenanceRow + docker_provenance() + AuditReport extension
    - _render_markdown adds "Docker image provenance" table for
      --compute=docker runs; section omitted for local runs
    - 2 new tests covering empty + populated paths

c2. docs(notebook): Act 9 narrative now shows rtm:DockerImage as a node
    - New cell after the executor-label cell synthesizes rtm:DockerImage
      with WP3 + WP4 properties (contentHash, gitRef, flexoRecord)
    - Wires synthesized v2 evidence to derive from it
    - Runs evidence_by_image() SPARQL helper in-cell + interpolates
      the count into the markdown
    - Explains compute.reproduce + EARL DigestMatchAssertion

c3. docs: RECONCILIATION.md — slide claims ↔ code receipts
    - Maps every "Flexo Deployment, oracles & evidence reproducibility
      with git hashes and docker" phrase to its code receipt
    - Cross-links companion issues #7/#8/#9 (PU / RIME / Starforge)
    - Notes WP4 *exceeded* the slide's promise on the docker axis

End-of-roadmap alignment review (all in this merge):
- Discipline sweep: validate-vs-verify clean (only legitimate uses + the
  one back-compat alias from WP1 §10)
- openCAESAR sweep: zero hits
- ROBOT_OPTIONAL sweep: zero hits
- ValidateShapes IRI fragment: preserved per #6 known follow-up
- Issues #2 + #3 retroactively closed with status comments
- Issue #4 ready to close (this merge lands the residual 2 ACs)
- Issues #5, #6, #7, #8, #9 correctly remain open (deferred + future-work)
- End-to-end smoke: pipeline runs cleanly; 1084 union triples
- Full pytest: 269 passed

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CI's "Confirm rtm.ttl is committed in-sync with rebuild" gate was
failing because the build_time was wall-clock-now, so every rebuild
produced a different sha256 → diff against committed copy → fail.

_reproducible_build_time() resolves the timestamp in this order:
1. SOURCE_DATE_EPOCH env var (Reproducible Builds standard).
2. Most-recent git-commit time of the build inputs (rtm-edit.ttl,
   sysml_term_map.csv, build_ontology.py). Stable across CI + local.
3. datetime.now() — unreproducible fallback for bootstrap.
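
The resolution order can be sketched as below. This is an illustration under assumptions, not the contents of build_ontology.py: the function signature, the UTC-aware return value, and the exact `git log` invocation are choices made for the sketch.

```python
import os
import subprocess
from datetime import datetime, timezone


def reproducible_build_time(inputs: list[str], repo_root: str = ".") -> datetime:
    """Resolve a build timestamp: SOURCE_DATE_EPOCH, then git, then wall clock."""
    # 1. Reproducible Builds convention: an explicit epoch wins.
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch:
        return datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    # 2. Committer time (%ct) of the most recent commit touching any input.
    try:
        out = subprocess.run(
            ["git", "log", "-1", "--format=%ct", "--", *inputs],
            cwd=repo_root, capture_output=True, text=True, check=True,
        ).stdout.strip()
        if out:
            return datetime.fromtimestamp(int(out), tz=timezone.utc)
    except (OSError, subprocess.CalledProcessError, ValueError):
        pass
    # 3. Unreproducible fallback for bootstrap.
    return datetime.now(timezone.utc)
```

Step 2 only yields a stable value when the history that last touched the inputs is present in the checkout.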

After this commit lands, the next regen of ontology/rtm.ttl +
assembly_manifest.json (separate commit) will pin the timestamp to
THIS commit's ct; future CI rebuilds will compute the same value
and produce byte-identical artifacts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…d_time

Pins the artifact's build_time to the prior commit's ct (where
build_ontology.py + the ontology inputs were last touched), via
the new _reproducible_build_time() in scripts/build_ontology.py.
CI rebuilds will now produce byte-identical artifacts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…* refs

Four changes addressing reviewer feedback:

1. Removed all WP1..WP5 mentions from notebook prose. The narrative
   describes capabilities, not implementation phases.

2. New cell — "Many Authoritative Sources of Truth, one stitched
   provenance graph" — acknowledges the diverse ASoTs (SysMLv2 model,
   symbolic + numerical oracles, Docker image, engineer attestation,
   closure-rule check, audit module), shows what each holds and what
   grounds its trust, and frames the demo's contribution as the
   integration that stitches them via standard PROV/EARL/GSN edges
   without overloading anyone's vocabulary.

3. Renamed the docker-as-evidence cell to "The runtime ASoTs as
   first-class nodes" and expanded it to show the container entity
   (rtm:DockerContainer, prov:used edge from analysis activity) so
   the materialization story is visible inline. Reports both
   proof-artifact AND simulation-result counts from the
   evidence_by_image inverse query.

4. New "Numerical evidence — the full provenance, end-to-end" cell
   uses trust_summary + render_trust_summary against EV-SIM-REQ-001-v2
   to render the complete chain (oracle → activity → container →
   image → git ref → host → org → closure assertion) in one block,
   so readers see the multi-ASoT stitch concretely rather than as
   abstract narration.

The Pages workflow regenerates output/notebook.html on next push.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
actions/checkout@v4 defaults to fetch-depth=1, which makes
`git log -- <files>` return empty for any file HEAD didn't
touch directly. _reproducible_build_time() then falls back to
datetime.now() and produces non-reproducible artifacts — the
CI diff-check fails on the very build the timestamp fix was
supposed to enable.

fetch-depth: 0 fetches full history so the commit that last
touched the inputs resolves consistently across CI + local.
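
A build script could also detect the shallow-clone case up front instead of silently falling back to the clock. The sketch below is a hypothetical guard, not something this PR adds; `is_shallow_clone` and `_git_stdout` are illustrative names. It relies on `git rev-parse --is-shallow-repository`, which prints `true` or `false` (available since Git 2.15).

```python
import subprocess


def _git_stdout(args, cwd="."):
    """Run a git subcommand and return its stripped stdout."""
    return subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    ).stdout.strip()


def is_shallow_clone(git=_git_stdout):
    """True when the checkout has truncated history (e.g. fetch-depth=1)."""
    return git(["rev-parse", "--is-shallow-repository"]) == "true"
```

A caller might raise or log a warning when this returns True, so that a missing `fetch-depth: 0` shows up as an explicit message rather than as a later artifact diff.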

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-specific

CI runs 'make ontology' (with ROBOT) which writes robot_used: true
and a ROBOT-specific notes string into assembly_manifest.json.
Local contributors without Java run 'make ontology-python' which
writes robot_used: false. The manifest's build-path provenance
fields are intentionally asymmetric.

The load-bearing artifact is rtm.ttl, which IS reproducible thanks
to _reproducible_build_time(); that's what the gate checks now.
Manifest stays committed but isn't diff-gated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…wrapping

Three failing CI tests:
- test_pipeline_runner_help_lists_known_flags
- test_rerun_help_lists_known_flags
- test_reproduce_cli_help

All check for flag substrings ("--auto" etc) in `result.stdout`.
Rich's Typer help renderer wraps the flag column to terminal width;
CI runners are narrower than dev workstations, which can split
"--auto" across a wrap boundary so it's no longer a contiguous
substring of the rendered output.

Fix: strip ANSI escape codes + collapse whitespace before the
substring match. _flatten_help() in test_cli.py; same regex inline
in test_reproduce.py (kept local to avoid an import cycle).
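
A minimal sketch of the normalization described above, assuming the helper takes the raw help text and returns a single-line string. The regex and the function body are illustrative; the actual _flatten_help() in test_cli.py may differ.

```python
import re

# Matches ANSI CSI sequences such as "\x1b[1m" or "\x1b[0m":
# ESC [ , parameter bytes, intermediate bytes, one final byte.
_ANSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def flatten_help(text):
    """Strip ANSI escape codes, then collapse every whitespace run to one space."""
    return " ".join(_ANSI_RE.sub("", text).split())


# Styled output can place escape codes inside a flag, so the raw text has
# no contiguous "--auto" even though it renders as one.
raw = "\x1b[1m-\x1b[0m\x1b[1m-auto\x1b[0m   Run\n   without prompts"
assert "--auto" not in raw
assert "--auto" in flatten_help(raw)
```

This covers escape codes and line wraps that fall on whitespace. A flag broken in the middle of a token would need whitespace removed entirely before matching, which this sketch does not attempt.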

18 targeted tests pass locally.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@mzargham mzargham merged commit 58a1dba into main May 29, 2026
2 checks passed