[observability escalation] Smoke test workflows repeatedly exceed resource and control thresholds (Smoke Claude, Smoke Copilot) #23528

Two smoke test workflows crossed escalation thresholds in both of their last two runs in the Mar 16–30 window: Smoke Claude crossed thresholds 3 and 4, and Smoke Copilot crossed threshold 3. Because smoke tests should be lean, directed, and narrowly scoped, sustained `heavy` resource profiles and `write_heavy` actuation are structurally suspect.

### Workflows Requiring Attention

#### 1. Smoke Claude — thresholds 3 (resource_heavy ×2) and 4 (poor_control ×2)

Both runs completed successfully but consumed ~1.09M tokens each ($0.77–$0.90 per run) over 9 minutes, with 28–33 turns. The behavior fingerprint is `exploratory`, with `broad` tool use, `write_heavy` actuation, and a `heavy` resource profile.

- `resource_heavy_for_domain:high` — both runs ([§23730204448](https://github.com/github/gh-aw/actions/runs/23730204448), [§23730326948](https://github.com/github/gh-aw/actions/runs/23730326948))
- `poor_agentic_control:medium` — both runs
- `partially_reducible:medium` — both runs
- Second run classified `changed` with `turns_increase` against a cohort baseline
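
As a rough sanity check on the resource figures above, per-turn and per-token costs can be derived from the reported numbers. This is a minimal sketch: the function and constant names are illustrative, the token count is approximate, and the issue gives ranges rather than per-run values, so only the range bounds are computed.

```python
# Figures reported in this issue for Smoke Claude (both runs).
TOKENS_PER_RUN = 1_090_000      # "~1.09M tokens each" (approximate)
COST_RANGE_USD = (0.77, 0.90)   # cost per run
TURN_RANGE = (28, 33)           # turns per run


def tokens_per_turn_range(tokens, turns):
    """Return (low, high) average tokens per turn over the turn range."""
    fewest, most = turns
    return tokens / most, tokens / fewest


def usd_per_million_tokens_range(tokens, cost):
    """Return (low, high) cost in USD per million tokens."""
    low, high = cost
    millions = tokens / 1_000_000
    return low / millions, high / millions


if __name__ == "__main__":
    lo_t, hi_t = tokens_per_turn_range(TOKENS_PER_RUN, TURN_RANGE)
    lo_c, hi_c = usd_per_million_tokens_range(TOKENS_PER_RUN, COST_RANGE_USD)
    print(f"tokens per turn: {lo_t:,.0f} to {hi_t:,.0f}")
    print(f"USD per million tokens: {lo_c:.2f} to {hi_c:.2f}")
```

Under these figures each turn averages roughly 33,000 to 39,000 tokens, which is consistent with the `heavy` resource classification for a smoke test.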

**Recommended action:** Review the Smoke Claude prompt and task scope. A smoke test should validate basic engine connectivity, not perform full exploratory triage. Consider splitting into a lightweight connectivity check (lean + read) and a separate heavier integration test if broad exploration is genuinely needed.

#### 2. Smoke Copilot — threshold 3 (resource_heavy ×2), and `poor_agentic_control:high` on most recent run

Both runs show `resource_heavy_for_domain:medium` despite `agentic_fraction=0` (no tracked Copilot turns), with `write_heavy` actuation and 6.8–7.0 min duration. The most recent run additionally carries `poor_agentic_control:high` — the highest control severity in the dataset.

- `resource_heavy_for_domain:medium` — both runs ([§23730204451](https://github.com/github/gh-aw/actions/runs/23730204451), [§23730326985](https://github.com/github/gh-aw/actions/runs/23730326985))
- `poor_agentic_control:high` — most recent run
- Missing tool: Serena MCP server (`activate_project`, `find_symbol`) — most recent run

**Recommended action:** Investigate why the workflow carries a write-heavy actuation profile even with `agentic_fraction=0`. Review whether the Serena MCP server dependency is intentional and, if so, add a graceful guard when it is unavailable (the tool was absent from the Actions environment).
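
One possible shape for the graceful guard is sketched below. It is not taken from the gh-aw codebase: `missing_tools`, `run_serena_step`, and the result dictionary are hypothetical names, and the only details drawn from this issue are the two Serena tool names.

```python
# Tool names reported as missing in this issue.
REQUIRED_SERENA_TOOLS = ("activate_project", "find_symbol")


def missing_tools(available, required=REQUIRED_SERENA_TOOLS):
    """Return the required tool names that are not in `available`."""
    return [name for name in required if name not in available]


def run_serena_step(available_tools, step):
    """Run `step` only when every required tool is present (hypothetical helper).

    Returns a small status dict instead of raising, so a smoke test can
    record the skip and continue.
    """
    missing = missing_tools(available_tools)
    if missing:
        return {"status": "skipped", "reason": "missing tools: " + ", ".join(missing)}
    return {"status": "ok", "result": step()}


if __name__ == "__main__":
    # Simulates an environment where the Serena MCP server is absent.
    print(run_serena_step({"bash", "edit"}, lambda: "symbols found"))
```

Returning an explicit `skipped` status keeps the absence visible in the run record rather than letting the agent retry or improvise around the missing tool.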

### Evidence Summary

| Workflow | Threshold crossed | Severity | Runs |
|----------|------------------|----------|------|
| Smoke Claude | resource_heavy_for_domain (×2) | high | 2/2 |
| Smoke Claude | poor_agentic_control (×2) | medium | 2/2 |
| Smoke Copilot | resource_heavy_for_domain (×2) | medium | 2/2 |
| Smoke Copilot | poor_agentic_control | high | 1/2 |
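
The table above can be reproduced by tallying per-run labels. The sketch below is illustrative only: the record layout and the `summarize` helper are assumptions, not the Agentic Observability Kit's actual data model. The label values are transcribed from this issue, and treating the higher run ID as the "most recent" Smoke Copilot run is also an assumption.

```python
from collections import Counter

# Labels tracked in the Evidence Summary table.
ESCALATION_LABELS = ("resource_heavy_for_domain", "poor_agentic_control")

# Per-run labels transcribed from this issue (hypothetical record layout).
RUNS = [
    {"workflow": "Smoke Claude", "run_id": 23730204448,
     "labels": {"resource_heavy_for_domain": "high",
                "poor_agentic_control": "medium",
                "partially_reducible": "medium"}},
    {"workflow": "Smoke Claude", "run_id": 23730326948,
     "labels": {"resource_heavy_for_domain": "high",
                "poor_agentic_control": "medium",
                "partially_reducible": "medium"}},
    # The issue does not state a control label for the earlier Copilot run.
    {"workflow": "Smoke Copilot", "run_id": 23730204451,
     "labels": {"resource_heavy_for_domain": "medium"}},
    # Assumed to be the most recent run (higher run ID).
    {"workflow": "Smoke Copilot", "run_id": 23730326985,
     "labels": {"resource_heavy_for_domain": "medium",
                "poor_agentic_control": "high"}},
]


def summarize(runs, labels=ESCALATION_LABELS):
    """Return sorted (workflow, label, severity, 'hits/total') rows."""
    totals = Counter(run["workflow"] for run in runs)
    hits = Counter()
    for run in runs:
        for label, severity in run["labels"].items():
            if label in labels:
                hits[(run["workflow"], label, severity)] += 1
    return [(wf, label, sev, f"{count}/{totals[wf]}")
            for (wf, label, sev), count in sorted(hits.items())]


if __name__ == "__main__":
    for row in summarize(RUNS):
        print(" | ".join(row))
```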

### Suggested Route

`workflow:Smoke Claude` and `workflow:Smoke Copilot`: workflow owners should review prompt scope and execution posture. No broader orchestration chain is involved (both are standalone episodes, `confidence=high`).

Full detail is in the linked observability discussion for Mar 16–30, 2026.

> Generated by [Agentic Observability Kit](https://github.com/github/gh-aw/actions/runs/23735925037/agentic_workflow) · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fagentic-observability-kit%22&type=issues)
