168 changes: 168 additions & 0 deletions .github/ISSUE_TEMPLATE/model-verification.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,168 @@
name: ✅ Model Verification Report
description: Certify that an AI model works end-to-end in SmartHopper so it can be promoted to Verified=true.
title: "[verify] <provider>/<model>"
labels: ["model-verification", "status: needs triage"]
assignees: []
body:
- type: markdown
attributes:
value: |
## How model verification works

Models declared in each `*ProviderModels.cs` file are flagged with `Verified = true/false`.
A model is promoted to `Verified = true` once **two distinct users** have submitted a
successful verification report for the same `provider/model` pair.

- The author of this issue counts as the **first verifier** if the required tests are checked below.
- Other users certify the same model by posting a comment that **starts with `/verify-confirm`**
and copies the code block shown below (one comment per user).
- Organization members and collaborators may post a comment that **starts with `/verify-force`**
to immediately promote the model regardless of the user count.

Please only check tests that are **declared in the model's `Capabilities`** in the corresponding
`*ProviderModels.cs` file. Tests for capabilities the model does not advertise should be left
unchecked.

### Comment template for additional verifiers

Copy the code block below into a new comment, tick the boxes for the tests you successfully ran,
and submit. **Do not remove the `/verify-confirm` line or the HTML marker** — the workflow
will ignore comments that don't match this exact preamble.

````markdown
/verify-confirm
<!-- model-verification-confirm -->

- [ ] **C1** — `AITextGenerate` (Text2Text)
- [ ] **C2** — `AITextListGenerate` (Text2Json)
- [ ] **C3** — `AIImgToText` (Image2Text)
- [ ] **C4** — `AIImgGenerate` (Text2Image)
- [ ] **C5** — Audio component (Speech2Text / Text2Speech)
- [ ] **B1** — Streaming in WebChat
- [ ] **B2** — ToolChat / FunctionCalling in WebChat
- [ ] **B3** — Reasoning in WebChat
- [ ] **B4** — Multi-turn `ConversationSession` in WebChat

I personally ran the ticked tests against this provider/model on SmartHopper <version> on <OS>.
````

> Replace `<version>` and `<OS>` with the actual values you used.

- type: dropdown
id: provider
attributes:
label: Provider
description: Which provider does this model belong to? Must match the folder name under `src/SmartHopper.Providers.*`.
options:
- Anthropic
- DeepSeek
- MistralAI
- OpenAI
- OpenRouter
validations:
required: true

- type: input
id: model
attributes:
label: Model name
description: Exact `Model` string as declared in the provider's `*ProviderModels.cs` (e.g. `gpt-5-mini`, `claude-haiku-4-5`, `mistralai/mistral-medium-3.1`).
placeholder: e.g. mistral-medium-latest
validations:
required: true

- type: input
id: smarthopper-version
attributes:
label: SmartHopper Version
placeholder: e.g. 1.4.2-beta
validations:
required: true

- type: dropdown
id: os
attributes:
label: Operating System
options:
- Windows
- macOS
validations:
required: true

- type: checkboxes
id: tests-canvas
attributes:
label: "Tests — Components on the Grasshopper canvas"
description: |
Place each component on the canvas, feed it the **exact prompt below**, run it, and tick the box only if the output is coherent with the prompt. Skip tests for capabilities the model does not declare.
options:
- label: |
**C1 — `AITextGenerate` (Text2Text).**
Prompt: `List three structural advantages of triangulated trusses, one sentence each.`
Verify the output is three coherent sentences about trusses.
- label: |
**C2 — `AITextListGenerate` (Text2Json).**
Prompt: `Give me five common Grasshopper component categories.`
Verify the output is a list of exactly five plausible category names.
- label: |
**C3 — `AIImgToText` (Image2Text).**
Feed any image you control. Prompt: `Describe what you see in one sentence.`
Verify the description matches the contents of the image.
- label: |
**C4 — `AIImgGenerate` (Text2Image).**
Prompt: `A red cube on a white background, isometric, flat shading.`
Verify a valid image is produced and roughly matches the prompt.
- label: |
**C5 — Audio components (Speech2Text / Text2Speech).**
For audio-capable models only. Run an audio component with a short clip / short sentence and verify the transcription or synthesized speech is correct.
- type: checkboxes
id: tests-chat
attributes:
label: "Tests — Chat interface (`AIChat` / WebChat)"
description: |
Open the WebChat (the AIChat component) configured for this provider/model, type the **exact prompt below**, and tick the box only if the behavior described is observed.
options:
- label: |
**B1 — Streaming.** For streaming-capable models only.
In chat, type: `Write a three-line haiku about Rhino 3D.`
Verify tokens appear progressively in the UI rather than in a single block.
- label: |
**B2 — ToolChat (FunctionCalling).**
In chat, type: `Use gh_report to summarize the current canvas.`
Verify the model invokes the `gh_report` tool and the chat shows its output.
- label: |
**B3 — Reasoning / ReasoningChat.** For reasoning-capable models only.
In chat, type: `If I have 17 components and want to link them all into one connected network without forming any cycle, how many connections do I need in total? Show your reasoning step by step.`
Verify a coherent step-by-step reasoning is shown and the final answer is correct (16, because a connected network of n nodes with no cycle has n - 1 connections).
- label: |
**B4 — Multi-turn `ConversationSession`.**
In the same chat, run at least three user turns including one tool call (e.g. ask for a `gh_report`, then a follow-up question, then another tool call).
Verify the conversation completes without errors and the chat shows aggregated turn metrics.
validations:
required: true

- type: textarea
id: evidence
attributes:
label: Evidence
description: |
Paste short logs, screenshots, or a Grasshopper file snippet that demonstrates the tests above.
At minimum, include the model's reported `tokens_in`/`tokens_out` for one successful call.
validations:
required: true

- type: textarea
id: notes
attributes:
label: Notes / observations
description: Anything reviewers should know — quirks, partial failures, recommended `Default` capability flags, suggested `Rank`, etc.
validations:
required: false

- type: checkboxes
id: confirm
attributes:
label: Confirmation
options:
- label: I confirm that I personally ran the tests above against the specified `provider/model` on the specified SmartHopper version.
required: true
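
The two-verifier rule and the `/verify-confirm` preamble described in this template could be checked roughly as follows. This is a minimal sketch in POSIX shell, not the project's actual workflow (which is not part of this diff): the function names `is_confirm_comment`, `count_distinct_verifiers` and `is_promotable` are hypothetical, and it assumes the comment body arrives on stdin and the verifier logins (issue author first) are passed as arguments.

```shell
#!/bin/sh
# Hypothetical helpers; the workflow that consumes these comments is not shown in this diff.

# Succeeds when the comment body on stdin starts with /verify-confirm
# and also contains the HTML marker from the template.
is_confirm_comment() {
  body=$(cat)
  first_line=$(printf '%s\n' "$body" | head -n 1)
  case $first_line in
    /verify-confirm*) ;;
    *) return 1 ;;
  esac
  case $body in
    *'<!-- model-verification-confirm -->'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the number of distinct logins passed as arguments.
count_distinct_verifiers() {
  printf '%s\n' "$@" | sort -u | grep -c .
}

# Succeeds once at least two distinct users have verified the model.
is_promotable() {
  [ "$(count_distinct_verifiers "$@")" -ge 2 ]
}
```

For example, `is_promotable alice bob` succeeds, while `is_promotable alice alice` does not, because the same user confirming twice still counts as one verifier.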
4 changes: 4 additions & 0 deletions .github/workflows/chore-update-contributors.yml
@@ -27,6 +27,10 @@ permissions:
contents: write
pull-requests: write

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: false

jobs:
update-contributors:
runs-on: ubuntu-latest
8 changes: 8 additions & 0 deletions .github/workflows/chore-version-badge.yml
@@ -19,11 +19,19 @@ on:
- dev
- 'hotfix/**'
- 'release/**'
# Only run when the version source of truth changes. README.md is an output of this
# workflow; without this filter every PR merge to main/dev would re-trigger it.
paths:
- 'Solution.props'

permissions:
contents: write
pull-requests: write

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: false

jobs:
paths-check:
runs-on: ubuntu-latest
4 changes: 4 additions & 0 deletions .github/workflows/chore-version-date.yml
@@ -28,6 +28,10 @@ permissions:
contents: write
pull-requests: write

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: false

jobs:
update-date:
runs-on: ubuntu-latest
4 changes: 4 additions & 0 deletions .github/workflows/chore-version-main-release.yml
@@ -14,6 +14,10 @@ permissions:
contents: write
pull-requests: write

concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: false

jobs:
remove-release-date:
name: 🔄 Remove Release Version Date
4 changes: 4 additions & 0 deletions .github/workflows/ci-dotnet-tests.yml
@@ -32,6 +32,10 @@ permissions:
contents: read
pull-requests: read

concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true

jobs:
# Job 1: Windows-only prep step - generates SNK, updates InternalsVisibleTo in csproj files
# This ensures the public key is embedded in source files before cross-platform compilation
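
The `concurrency.group` expressions added across these workflows rely on `||` falling back to `github.ref` when the event carries no pull request number. The sketch below mimics that resolution in POSIX shell to show which runs end up sharing a group. The function name `concurrency_group` and the sample workflow name `CI` are illustrative assumptions, not values taken from the repository.

```shell
#!/bin/sh
# concurrency_group WORKFLOW PR_NUMBER REF
# Mimics: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
# An empty PR number (push, schedule and other non-PR events) falls back to the ref.
concurrency_group() {
  printf '%s-%s\n' "$1" "${2:-$3}"
}

# Two runs for the same pull request resolve to the same group ...
concurrency_group "CI" "42" "refs/pull/42/merge"
# ... while a push to dev gets a group keyed by its ref.
concurrency_group "CI" "" "refs/heads/dev"
```

Runs that resolve to the same group are serialized. With `cancel-in-progress: true` (as in `ci-dotnet-tests.yml`) a newer run cancels the one in progress, which suits read-only CI; with `false` the newer run waits, which is the safer choice for workflows that commit or push.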
13 changes: 12 additions & 1 deletion .github/workflows/dev-update-manifest.yml
@@ -28,6 +28,10 @@ on:
permissions:
contents: write

concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: false

jobs:
update-manifest:
name: 📋 Update Manifest Text for Dev Version
@@ -70,4 +74,11 @@ jobs:
git config --local user.name "github-actions[bot]"
git add yak-package/manifest.yml
git commit -m "chore: update manifest text to ${{ steps.update_manifest.outputs.release-type }} version [skip ci]"
git push
# Retry pull --rebase + push up to 3 times in case an external commit lands between
# fetch and push; fail the step if every attempt fails instead of exiting silently.
for attempt in 1 2 3; do
git pull --rebase --autostash origin dev && \
git push origin HEAD:dev && break
[ "$attempt" -lt 3 ] || { echo "Push failed after 3 attempts" >&2; exit 1; }
echo "Push attempt $attempt failed, retrying..."; sleep $((attempt * 2))
done
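
The pull-rebase-push retry in this step can also be expressed as a small reusable helper that returns a non-zero status when every attempt fails, so the calling step can fail visibly. This is a sketch under stated assumptions: the `retry` function and the `RETRY_DELAY` variable are hypothetical names introduced here, not part of the workflow.

```shell
#!/bin/sh
# retry MAX CMD [ARGS...]
# Runs CMD up to MAX times. Between failed attempts it sleeps
# attempt * RETRY_DELAY seconds (RETRY_DELAY defaults to 2).
# Returns 0 on the first success, 1 if all attempts fail.
retry() {
  max=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt $attempt of $max failed" >&2
    if [ "$attempt" -lt "$max" ]; then
      sleep $((attempt * ${RETRY_DELAY:-2}))
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

In the workflow step it might be called as `retry 3 sh -c 'git pull --rebase --autostash origin dev && git push origin HEAD:dev' || exit 1`, which keeps the linear back-off of the original loop while propagating the failure.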
4 changes: 4 additions & 0 deletions .github/workflows/github-issue-labels-close.yml
@@ -14,6 +14,10 @@ on:
permissions:
issues: write

concurrency:
group: issue-labels-${{ github.event.issue.number }}
cancel-in-progress: false

jobs:
close-issue-on-label:
runs-on: ubuntu-latest
4 changes: 4 additions & 0 deletions .github/workflows/github-issue-labels-on-close.yml
@@ -18,6 +18,10 @@ on:
permissions:
issues: write

concurrency:
group: issue-labels-${{ github.event.issue.number }}
cancel-in-progress: false

jobs:
update-issue-labels:
runs-on: ubuntu-latest
4 changes: 4 additions & 0 deletions .github/workflows/github-labels-sync.yml
@@ -23,6 +23,10 @@ on:
permissions:
issues: write

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: false

jobs:
build:
runs-on: ubuntu-latest