feat: add azd CLI evaluation and testing framework #7202
Open: spboyer wants to merge 8 commits into main from feat/eval-framework
+7,314
−0
d0487f3 feat: add azd CLI evaluation and testing framework (spboyer)
6538419 docs: add authentication and secrets section to eval README (spboyer)
88d741b docs: add comprehensive how-to guides for creating evals, graders, an… (spboyer)
8c5d1d0 fix: resolve CI failures in eval unit tests and cspell (spboyer)
bb87930 fix: stop command-sequencing tests from overriding AZD_CONFIG_DIR (spboyer)
5f2f24e docs: expand auth section with subscription config and no-popup guara… (spboyer)
1767fee refactor: address review feedback from @jongio and Copilot (spboyer)
ca4689f fix: address round 2 review feedback from @jongio (spboyer)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| name: "Eval: E2E Lifecycle" | ||
| | ||
| on: | ||
| schedule: | ||
| # 6am UTC Monday | ||
| - cron: "0 6 * * 1" | ||
| workflow_dispatch: | ||
| | ||
| permissions: | ||
| id-token: write | ||
| contents: read | ||
| | ||
| jobs: | ||
| e2e-lifecycle: | ||
| runs-on: ubuntu-latest | ||
| env: | ||
| AZURE_ENV_NAME: eval-e2e-${{ github.run_id }} | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
| | ||
| - uses: actions/setup-go@v5 | ||
| with: | ||
| go-version-file: "cli/azd/go.mod" | ||
| | ||
| - uses: actions/setup-node@v4 | ||
| with: | ||
| node-version: "22" | ||
| | ||
| - name: Build azd | ||
| working-directory: cli/azd | ||
| run: go build -o ./azd . | ||
| | ||
| - name: Add azd to PATH | ||
| run: echo "${{ github.workspace }}/cli/azd" >> "$GITHUB_PATH" | ||
| | ||
| - name: Azure Login (OIDC) | ||
| uses: azure/login@v2 | ||
| with: | ||
| client-id: ${{ secrets.AZURE_CLIENT_ID }} | ||
| tenant-id: ${{ secrets.AZURE_TENANT_ID }} | ||
| subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }} | ||
| | ||
| - name: Install Waza CLI | ||
| run: npm install -g waza | ||
| | ||
| - name: Install eval dependencies | ||
| working-directory: cli/azd/test/eval | ||
| run: npm ci | ||
| | ||
| - name: Run lifecycle evaluations | ||
| working-directory: cli/azd/test/eval | ||
| continue-on-error: true | ||
| env: | ||
| COPILOT_CLI_TOKEN: ${{ secrets.COPILOT_CLI_TOKEN }} | ||
| AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }} | ||
| run: waza run --executor copilot-sdk --filter "tasks/lifecycle/" | ||
| | ||
| - name: Upload E2E results | ||
| if: always() | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: e2e-results-${{ github.run_id }} | ||
| path: cli/azd/test/eval/reports/ | ||
| retention-days: 30 | ||
| | ||
| - name: Cleanup Azure resources | ||
| if: always() | ||
| working-directory: cli/azd/test/eval | ||
| run: | | ||
| cd /tmp | ||
| azd down --purge --force --no-prompt 2>/dev/null || true | ||
| env: | ||
| AZURE_ENV_NAME: eval-e2e-${{ github.run_id }} | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,62 @@ | ||
| name: "Eval: Weekly Report" | ||
| | ||
| on: | ||
| schedule: | ||
| # 8am UTC Monday, after E2E completes | ||
| - cron: "0 8 * * 1" | ||
| workflow_dispatch: | ||
| | ||
| permissions: | ||
| contents: read | ||
| actions: read | ||
| | ||
| jobs: | ||
| generate-report: | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
| | ||
| - uses: actions/setup-node@v4 | ||
| with: | ||
| node-version: "22" | ||
| | ||
| - name: Install eval dependencies | ||
| working-directory: cli/azd/test/eval | ||
| run: npm ci | ||
| | ||
| - name: Download recent Waza artifacts | ||
| env: | ||
| GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
| run: | | ||
| mkdir -p cli/azd/test/eval/reports/waza | ||
| RUN_ID=$(gh api repos/${{ github.repository }}/actions/workflows/eval-waza.yml/runs \ | ||
| --jq '.workflow_runs | map(select(.conclusion == "success")) | .[0].id // empty' 2>/dev/null) | ||
| if [ -n "$RUN_ID" ]; then | ||
| gh run download "$RUN_ID" -D cli/azd/test/eval/reports/waza 2>/dev/null || echo "No waza artifacts found" | ||
| else | ||
| echo "No successful waza runs found, skipping" | ||
| fi | ||
| | ||
| - name: Download recent E2E artifacts | ||
| env: | ||
| GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
| run: | | ||
| mkdir -p cli/azd/test/eval/reports/e2e | ||
| RUN_ID=$(gh api repos/${{ github.repository }}/actions/workflows/eval-e2e.yml/runs \ | ||
| --jq '.workflow_runs | map(select(.conclusion == "success")) | .[0].id // empty' 2>/dev/null) | ||
| if [ -n "$RUN_ID" ]; then | ||
| gh run download "$RUN_ID" -D cli/azd/test/eval/reports/e2e 2>/dev/null || echo "No e2e artifacts found" | ||
| else | ||
| echo "No successful e2e runs found, skipping" | ||
| fi | ||
| | ||
| # TODO: Implement report generation script (scripts/generate-report.ts) | ||
| # that diffs Waza result JSON files and produces regression-issues.json. | ||
| # Once implemented, add a step to create GitHub issues from regressions. | ||
| | ||
| - name: Upload aggregated artifacts | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: eval-weekly-report-${{ github.run_id }} | ||
| path: cli/azd/test/eval/reports/ | ||
| retention-days: 90 |
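Both download steps above pick a run with the same jq filter: keep runs whose `conclusion` is `success`, take the first one, and emit nothing when there is none. The following is a minimal Python sketch of that selection logic, useful for checking the filter against a payload by hand. The function name `latest_successful_run_id` and the sample payload are illustrative assumptions, not part of this PR, and the sketch assumes the API lists runs newest first.

```python
from typing import Any, Dict, Optional


def latest_successful_run_id(payload: Dict[str, Any]) -> Optional[int]:
    """Approximate the jq filter used in the workflow:
    .workflow_runs | map(select(.conclusion == "success")) | .[0].id // empty
    """
    runs = payload.get("workflow_runs") or []
    successful = [run for run in runs if run.get("conclusion") == "success"]
    if not successful:
        return None  # corresponds to jq's `empty`, which leaves RUN_ID blank
    return successful[0].get("id")


# Hypothetical payload, trimmed to the two fields the filter reads.
sample = {
    "workflow_runs": [
        {"id": 103, "conclusion": "failure"},
        {"id": 102, "conclusion": "success"},
        {"id": 101, "conclusion": "success"},
    ]
}

print(latest_successful_run_id(sample))                 # 102
print(latest_successful_run_id({"workflow_runs": []}))  # None
```

One consequence worth keeping in mind when reading the filter: a step marked `continue-on-error: true` does not fail its job, so a run can conclude as `success` even when its evaluation step failed, and the filter cannot tell those cases apart.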
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| name: "Eval: Unit Tests" | ||
| | ||
| on: | ||
| pull_request: | ||
| paths: | ||
| - "cli/azd/test/eval/**" | ||
| - "cli/azd/internal/mcp/**" | ||
| - "cli/azd/cmd/mcp.go" | ||
| - "cli/azd/cmd/root.go" | ||
| | ||
| permissions: | ||
| contents: read | ||
| | ||
| jobs: | ||
| unit-tests: | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
| | ||
| - uses: actions/setup-go@v5 | ||
| with: | ||
| go-version-file: "cli/azd/go.mod" | ||
| | ||
| - uses: actions/setup-node@v4 | ||
| with: | ||
| node-version: "22" | ||
| | ||
| - name: Build azd | ||
| working-directory: cli/azd | ||
| run: go build -o ./azd . | ||
| | ||
| - name: Install eval dependencies | ||
| working-directory: cli/azd/test/eval | ||
| run: npm ci | ||
| | ||
| - name: Run unit tests | ||
| working-directory: cli/azd/test/eval | ||
| run: npm run test:unit -- --ci | ||
| | ||
| - name: Validate Waza task YAML | ||
| working-directory: cli/azd/test/eval | ||
| run: npm run waza:validate | ||
| continue-on-error: true | ||
| | ||
| - name: Upload test results | ||
| if: always() | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: eval-unit-results | ||
| path: cli/azd/test/eval/reports/ | ||
| retention-days: 30 | ||
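The `paths` filter above limits this workflow to pull requests that touch the eval directory, the MCP internals, or two command files. The sketch below approximates that decision in Python for the four patterns in this diff only. It handles just a trailing `/**` and literal paths, an assumption that covers these patterns but not GitHub's full glob syntax; `workflow_triggers` and the sample file names are hypothetical.

```python
from typing import Iterable

# The four patterns from the workflow's `paths` list.
PATH_PATTERNS = [
    "cli/azd/test/eval/**",
    "cli/azd/internal/mcp/**",
    "cli/azd/cmd/mcp.go",
    "cli/azd/cmd/root.go",
]


def matches(path: str, pattern: str) -> bool:
    """Match a changed file against one pattern (trailing `/**` or literal only)."""
    if pattern.endswith("/**"):
        # Drop the two asterisks but keep the slash, so sibling
        # directories with a shared prefix do not match.
        return path.startswith(pattern[:-2])
    return path == pattern


def workflow_triggers(changed_files: Iterable[str]) -> bool:
    """Return True if any changed file matches any pattern."""
    return any(matches(f, p) for f in changed_files for p in PATH_PATTERNS)


print(workflow_triggers(["cli/azd/test/eval/tasks/example.yaml"]))  # True
print(workflow_triggers(["cli/azd/cmd/mcp.go", "README.md"]))       # True
print(workflow_triggers(["cli/azd/cmd/mcp_test.go"]))               # False
```

A single matching file is enough to trigger the workflow, which is why the second call returns `True` even though `README.md` matches nothing.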
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| name: "Eval: Waza Runs" | ||
| | ||
| on: | ||
| schedule: | ||
| # 5am, 12pm, 8pm UTC, Tuesday through Saturday | ||
| - cron: "0 5,12,20 * * 2-6" | ||
| workflow_dispatch: | ||
| | ||
| permissions: | ||
| contents: read | ||
| | ||
| jobs: | ||
| waza-run: | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
| | ||
| - uses: actions/setup-go@v5 | ||
| with: | ||
| go-version-file: "cli/azd/go.mod" | ||
| | ||
| - uses: actions/setup-node@v4 | ||
| with: | ||
| node-version: "22" | ||
| | ||
| - name: Build azd | ||
| working-directory: cli/azd | ||
| run: go build -o ./azd . | ||
| | ||
| - name: Add azd to PATH | ||
| run: echo "${{ github.workspace }}/cli/azd" >> "$GITHUB_PATH" | ||
| | ||
| - name: Install Waza CLI | ||
| run: npm install -g waza | ||
| | ||
| - name: Install eval dependencies | ||
| working-directory: cli/azd/test/eval | ||
| run: npm ci | ||
| | ||
| - name: Run Waza evaluations | ||
| working-directory: cli/azd/test/eval | ||
| continue-on-error: true | ||
| env: | ||
| COPILOT_CLI_TOKEN: ${{ secrets.COPILOT_CLI_TOKEN }} | ||
| run: waza run --executor copilot-sdk | ||
| | ||
| - name: Upload Waza results | ||
| if: always() | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: waza-results-${{ github.run_id }} | ||
| path: cli/azd/test/eval/reports/ | ||
| retention-days: 30 |
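The schedule comment in this workflow can be checked by expanding the cron fields. The following Python sketch expands only the minute, hour, and day-of-week fields and supports `*`, comma lists, and ranges. It ignores the day-of-month and month fields and step values, which is enough for the expressions in this PR but is not a general cron parser. The helper names `expand_field` and `describe` are illustrative.

```python
from typing import List, Tuple

# Cron day-of-week numbering: 0 is Sunday, 6 is Saturday.
DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def expand_field(field: str, lo: int, hi: int) -> List[int]:
    """Expand one cron field made of `*`, single values, lists, and ranges."""
    values = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(lo, hi + 1))
        elif "-" in part:
            start, end = part.split("-")
            values.update(range(int(start), int(end) + 1))
        else:
            values.add(int(part))
    return sorted(values)


def describe(cron: str) -> Tuple[List[str], List[str]]:
    """Return (weekday names, HH:MM times in UTC) for a five-field expression."""
    minute, hour, _day_of_month, _month, day_of_week = cron.split()
    minutes = expand_field(minute, 0, 59)
    hours = expand_field(hour, 0, 23)
    days = expand_field(day_of_week, 0, 6)
    times = [f"{h:02d}:{m:02d}" for h in hours for m in minutes]
    return [DAY_NAMES[d] for d in days], times


days, times = describe("0 5,12,20 * * 2-6")
print(days)                    # ['Tue', 'Wed', 'Thu', 'Fri', 'Sat']
print(times)                   # ['05:00', '12:00', '20:00']
print(len(days) * len(times))  # 15
```

Under these assumptions the expression yields 15 scheduled runs per week, matching the comment in the workflow. The same helper applied to `0 6 * * 1` and `0 8 * * 1` gives Monday at 06:00 and 08:00 UTC respectively.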
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| node_modules/ | ||
| dist/ | ||
| reports/*.json | ||
| reports/*.md | ||
| reports/junit.xml | ||
| !reports/.gitkeep |
Can we move this from a GitHub Action to an internal AzDO pipeline? Is there a hard dependency on this being a GitHub Action?
It's been hard in the past to get secrets like this added to our public repo, and the strategy is to use the internal AzDO pipelines.
@danieljurek FYI
We do run some GitHub Actions on a set of 1ES runners. I'm checking with @weshaggard about how to handle permissions for GitHub Actions.