refactor(redteam): extract report functions and add numeric sorting #7162

mldangelo · 2026-01-21T03:34:27Z

Summary

Follow-up improvements to PR #7157 (policy plugin display IDs):

Numeric sorting for policy numbers: Report now correctly sorts policy [Feature Request]More Results based on Same Value #1, [BUG]No tableOutput.html file in dist. #2, Renaming "__expected"s "eval" keyword #10, feat: read folders #12 instead of alphabetically ([Feature Request]More Results based on Same Value #1, Renaming "__expected"s "eval" keyword #10, feat: read folders #12, [BUG]No tableOutput.html file in dist. #2)
Extract report functions to dedicated module: Moves generateReport, getStatus, getPluginSeverity, getPluginBaseDisplayId, extractSortKey, sortReportIds to src/redteam/report.ts for better separation of concerns
Comprehensive unit tests: Adds 28 new tests for all extracted functions

Before (alphabetical sorting)

policy #1: "One"
policy #10: "Ten"
policy #12: "Twelve"
policy #2: "Two"

After (numeric sorting)

policy #1: "One"
policy #2: "Two"
policy #10: "Ten"
policy #12: "Twelve"

Test plan

All 28 new unit tests pass
Existing redteam/index.test.ts tests pass (108 tests)
Lint and format checks pass

🤖 Generated with Claude Code

- Add extractSortKey() to parse language prefix/suffix and policy number - Add sortReportIds() for numeric policy sorting (1,2,3... not 1,10,11,12,2...) - Sort by: policy number (numeric) → base ID → language - Add test verifying numeric sort order with 12 policies Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Move report generation functions (getStatus, extractSortKey, sortReportIds, generateReport) to new src/redteam/report.ts - Move plugin display functions (getPluginSeverity, getPluginBaseDisplayId) for better separation of concerns - Add comprehensive unit tests for all extracted functions - Improves testability and code organization Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

use-tusk · 2026-01-21T03:34:37Z

⏩ No test execution environment matched (d2e0323) View output ↗

View check history

Commit	Status	Output	Created (UTC)
`ece40e6`	⏩ No test execution environment matched	Output	Jan 21, 2026 3:34AM
`d2e0323`	⏩ No test execution environment matched	Output	Jan 21, 2026 3:48AM

View output in GitHub ↗

github-actions · 2026-01-21T03:35:54Z

🔄 Security Review ✅

No issues in new changes.

The incremental changes add afterEach(() => vi.resetAllMocks()) to two test files, improving test isolation and preventing mock leakage between tests. This is a good practice that addresses previous reviewer feedback.

PR Convention Check

✅ PR touches src/redteam/ and correctly uses (redteam) scope in title per THE REDTEAM RULE.

Last updated: 2026-01-21T03:50:00Z | Reviewing: d2e0323 (incremental from ece40e6)

promptfoo-scanner

👍 All Clear

This PR refactors report generation functions into a dedicated module and adds numeric sorting for policy numbers. No LLM security vulnerabilities were identified in these changes.

_{Minimum severity threshold for this scan: 🟡 Medium | Learn more}

_{Was this helpful? 👍 Yes | 👎 No}

coderabbitai · 2026-01-21T03:41:35Z

📝 Walkthrough

Walkthrough

This pull request refactors report-generation utilities by extracting internal helper functions from src/redteam/index.ts into a new src/redteam/report.ts module. The extracted functions include getPluginSeverity, getPluginBaseDisplayId, getStatus, and generateReport. New sorting utilities (extractSortKey and sortReportIds) are introduced to enable deterministic report ordering, particularly for numeric policy number sorting. The index.ts file is updated to import these utilities from the new report module. Test coverage is added to verify the report generation, sorting, and status formatting logic.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the two main changes: extracting report functions to a dedicated module and implementing numeric sorting for policy numbers.
Description check	✅ Passed	The description is directly related to the changeset, providing clear context about the refactoring, numeric sorting improvements, and test coverage with before/after examples.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@test/redteam/index.test.ts`:
- Around line 3068-3107: Add a top-level afterEach cleanup inside the
describe('synthesize') block to reset mocks after each test: locate the
describe('synthesize') containing the it(...) tests (including the 'should sort
report rows...' test) and insert afterEach(() => vi.resetAllMocks()) so all vi
mocks/spies are cleared between tests (ensuring Plugins.find, logger.info, and
other mocks are reset).

In `@test/redteam/report.test.ts`:
- Around line 1-13: Add explicit vitest imports and a teardown to reset mocks:
update the import line that currently pulls describe, expect, it from 'vitest'
to also import afterEach and vi, then add a top-level afterEach(() => {
vi.resetAllMocks(); }) to the test file so that mocks are reset between tests;
this change affects the test helpers (stripAnsi) and tests that call functions
like extractSortKey, generateReport, getPluginBaseDisplayId, getPluginSeverity,
getStatus, and sortReportIds.

🧹 Nitpick comments (1)

src/redteam/report.ts (1)
100-123: Verify ordering between policy and non‑policy IDs.

Current logic pushes all policy rows ahead of non-policy rows (because 0 is treated as “non-policy”). If you want to keep overall alphabetical ordering while still sorting policies numerically, consider only applying numeric comparison when both IDs are policy-numbered.
♻️ Optional adjustment to keep non-policy alphabetical order
 export function sortReportIds(a: string, b: string): number {
   const [aNum, aLang, aBase] = extractSortKey(a);
   const [bNum, bLang, bBase] = extractSortKey(b);

-  // First sort by policy number (0 means no policy number, goes last among policies)
-  if (aNum !== bNum) {
-    if (aNum === 0) {
-      return 1;
-    }
-    if (bNum === 0) {
-      return -1;
-    }
-    return aNum - bNum;
-  }
+  // Only apply numeric ordering when both are policy-numbered
+  if (aNum > 0 && bNum > 0 && aNum !== bNum) {
+    return aNum - bNum;
+  }

coderabbitai · 2026-01-21T03:41:38Z

test/redteam/index.test.ts

+    it('should sort report rows numerically by policy number, not alphabetically', async () => {
+      const mockPluginAction = vi.fn().mockResolvedValue([{ vars: { query: 'test' } }]);
+      vi.spyOn(Plugins, 'find').mockReturnValue({
+        action: mockPluginAction,
+        key: 'policy',
+      });
+
+      // Create 12 policy plugins to test numeric sorting (1, 2, 3... vs 1, 10, 11, 12, 2...)
+      const plugins = Array.from({ length: 12 }, (_, i) => ({
+        id: 'policy',
+        numTests: 1,
+        config: { policy: `Policy ${i + 1}` },
+      }));
+
+      await synthesize({
+        numTests: 1,
+        plugins,
+        prompts: ['Test prompt'],
+        strategies: [],
+        targetIds: ['test-provider'],
+      });
+
+      const reportMessage = vi
+        .mocked(logger.info)
+        .mock.calls.map(([arg]) => arg)
+        .find(
+          (arg): arg is string => typeof arg === 'string' && arg.includes('Test Generation Report'),
+        );
+
+      expect(reportMessage).toBeDefined();
+      const cleanReport = stripAnsi(reportMessage || '');
+
+      // Extract the order of policy numbers from the report
+      const policyMatches = cleanReport.match(/policy #(\d+)/g) || [];
+      const policyNumbers = policyMatches.map((m) => parseInt(m.replace('policy #', ''), 10));
+
+      // Should be sorted numerically: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
+      // NOT alphabetically: 1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9
+      expect(policyNumbers).toEqual([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]);
+    });


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check the overall structure of the test file and search for existing afterEach calls rg -n "afterEach|describe\(" test/redteam/index.test.ts | head -20

Repository: promptfoo/promptfoo

Length of output: 1020

🏁 Script executed:

# Also check specifically around the lines mentioned in the review (3068-3107) and nearby describe blocks sed -n '3000,3150p' test/redteam/index.test.ts | grep -n "describe\|afterEach\|it(" | head -30

Repository: promptfoo/promptfoo

Length of output: 256

🏁 Script executed:

# Check if the top-level describe('synthesize') has an afterEach cleanup # First, find where the top-level 'synthesize' describe starts and where nested describes are sed -n '58,100p' test/redteam/index.test.ts

Repository: promptfoo/promptfoo

Length of output: 1383

🏁 Script executed:

# Check for afterEach within the synthesize describe block (which spans many lines) # Let's search for afterEach specifically in the context of the synthesize describe rg -B2 -A2 "afterEach" test/redteam/index.test.ts | head -40

Repository: promptfoo/promptfoo

Length of output: 374

🏁 Script executed:

# Let's see the full scope and location of the existing afterEach # Find the afterEach and check its nesting level and which describe block it belongs to rg -n "afterEach" test/redteam/index.test.ts -B5 -A5

Repository: promptfoo/promptfoo

Length of output: 937

🏁 Script executed:

# Also check the line numbers around the test mentioned (3068-3107) and see which describe it's in sed -n '3050,3120p' test/redteam/index.test.ts

Repository: promptfoo/promptfoo

Length of output: 2451

🏁 Script executed:

# Let's confirm which describe block contains the test at lines 3068-3107 # by checking what's nested at that location sed -n '2460,2500p' test/redteam/index.test.ts | grep -E "describe|^\s*\};"

Repository: promptfoo/promptfoo

Length of output: 113

🏁 Script executed:

# Count the indentation to understand nesting around line 3068 sed -n '3040,3110p' test/redteam/index.test.ts | head -70

Repository: promptfoo/promptfoo

Length of output: 2775

🏁 Script executed:

# Find the exact line numbers and content of afterEach in the top-level synthesize describe sed -n '58,130p' test/redteam/index.test.ts | grep -A2 -B2 "afterEach"

Repository: promptfoo/promptfoo

Length of output: 45

🏁 Script executed:

# Verify the overall nesting - check where describe('synthesize' ends # by looking for the closing brace at the appropriate indentation level tail -50 test/redteam/index.test.ts

Repository: promptfoo/promptfoo

Length of output: 2007

🏁 Script executed:

# Check the entire synthesize describe block structure more carefully # Look for afterEach between line 58 and where nested describes start awk '/^describe$\x27synthesize\x27/,/^describe\(|^})$/ {print NR": "$0}' test/redteam/index.test.ts | head -100

Repository: promptfoo/promptfoo

Length of output: 97

🏁 Script executed:

# Simpler approach: get line numbers for all describe blocks and afterEach in the file grep -n "^describe\|^ afterEach\|^ beforeEach\|^ afterAll" test/redteam/index.test.ts | head -30

Repository: promptfoo/promptfoo

Length of output: 395

🏁 Script executed:

# Let's verify the exact structure: look at what happens after line 69 to see if there's an afterEach in the top-level synthesize sed -n '58,120p' test/redteam/index.test.ts

Repository: promptfoo/promptfoo

Length of output: 2062

🏁 Script executed:

# Continue from line 120 to find if there's an afterEach in the top-level synthesize describe sed -n '110,200p' test/redteam/index.test.ts

Repository: promptfoo/promptfoo

Length of output: 2898

🏁 Script executed:

# Definitive check: grep for afterEach in the entire file and show context grep -n "afterEach" test/redteam/index.test.ts

Repository: promptfoo/promptfoo

Length of output: 173

🏁 Script executed:

# Also check the structure right after the beforeEach ends to confirm no afterEach at top level sed -n '100,130p' test/redteam/index.test.ts | cat -n

Repository: promptfoo/promptfoo

Length of output: 1355

Add required afterEach mock cleanup to top-level describe('synthesize').

The test file lacks the mandated afterEach(() => vi.resetAllMocks()) at the top-level describe scope. Per coding guidelines, all tests must clean up mocks after each test.

Suggested fix

describe('synthesize', () => { const mockProvider = { callApi: vi.fn(), generate: vi.fn(), id: () => 'test-provider', }; afterAll(() => { vi.restoreAllMocks(); }); beforeEach(() => { vi.clearAllMocks(); vi.resetAllMocks(); // ... rest of setup }); + afterEach(() => { + vi.resetAllMocks(); + });

🤖 Prompt for AI Agents

In `@test/redteam/index.test.ts` around lines 3068 - 3107, Add a top-level afterEach cleanup inside the describe('synthesize') block to reset mocks after each test: locate the describe('synthesize') containing the it(...) tests (including the 'should sort report rows...' test) and insert afterEach(() => vi.resetAllMocks()) so all vi mocks/spies are cleared between tests (ensuring Plugins.find, logger.info, and other mocks are reset).

coderabbitai · 2026-01-21T03:41:38Z

test/redteam/report.test.ts

+import { describe, expect, it } from 'vitest';
+import { Severity } from '../../src/redteam/constants';
+import {
+  extractSortKey,
+  generateReport,
+  getPluginBaseDisplayId,
+  getPluginSeverity,
+  getStatus,
+  sortReportIds,
+} from '../../src/redteam/report';
+
+// Strip ANSI codes for easier testing
+const stripAnsi = (str: string) => str.replace(/\x1B\[[0-9;]*[a-zA-Z]/g, '');


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check vitest configuration find . -name "vitest.config.*" -o -name "vitest.setup.*" | head -20

Repository: promptfoo/promptfoo

Length of output: 168

🏁 Script executed:

# Review existing test files in test/ directory to see current patterns fd -e test.ts$ test/ | head -10 | xargs -I {} head -15 {}

Repository: promptfoo/promptfoo

Length of output: 388

🏁 Script executed:

# Check for any vitest globals configuration in package.json or tsconfig fd -e json . -path "*/package.json" -o -path "*/vitest*" | head -10 | xargs -I {} grep -l "globals\|vitest" {}

Repository: promptfoo/promptfoo

Length of output: 231

🏁 Script executed:

# Read main vitest config cat vitest.config.ts

Repository: promptfoo/promptfoo

Length of output: 1763

🏁 Script executed:

# Read vitest setup cat vitest.setup.ts

Repository: promptfoo/promptfoo

Length of output: 1796

🏁 Script executed:

# Check existing test files in test/ directory for import patterns fd . test/ --extension test.ts | head -5 | xargs -I {} sh -c 'echo "=== {} ===" && head -20 {}'

Repository: promptfoo/promptfoo

Length of output: 3666

🏁 Script executed:

# Read the full report.test.ts file to see if vi or mocks are used wc -l test/redteam/report.test.ts && cat test/redteam/report.test.ts

Repository: promptfoo/promptfoo

Length of output: 9119

Add explicit vitest imports and cleanup as required by test guidelines.

Tests must import afterEach and vi from vitest and run vi.resetAllMocks() in afterEach to maintain consistency across the test suite, even when mocks aren't currently used.

Suggested fix

-import { describe, expect, it } from 'vitest'; +import { afterEach, describe, expect, it, vi } from 'vitest'; import { Severity } from '../../src/redteam/constants'; import { extractSortKey, generateReport, getPluginBaseDisplayId, getPluginSeverity, getStatus, sortReportIds, } from '../../src/redteam/report'; // Strip ANSI codes for easier testing const stripAnsi = (str: string) => str.replace(/\x1B\[[0-9;]*[a-zA-Z]/g, ''); + +afterEach(() => { + vi.resetAllMocks(); +});

🤖 Prompt for AI Agents

In `@test/redteam/report.test.ts` around lines 1 - 13, Add explicit vitest imports and a teardown to reset mocks: update the import line that currently pulls describe, expect, it from 'vitest' to also import afterEach and vi, then add a top-level afterEach(() => { vi.resetAllMocks(); }) to the test file so that mocks are reset between tests; this change affects the test helpers (stripAnsi) and tests that call functions like extractSortKey, generateReport, getPluginBaseDisplayId, getPluginSeverity, getStatus, and sortReportIds.

Address CodeRabbit review feedback: - Add afterEach(() => vi.resetAllMocks()) to report.test.ts - Add afterEach mock cleanup to synthesize describe block in index.test.ts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mldangelo and others added 2 commits January 20, 2026 22:31

mldangelo requested review from MrFlounder and will-holley as code owners January 21, 2026 03:34

promptfoo-scanner bot reviewed Jan 21, 2026

View reviewed changes

coderabbitai bot reviewed Jan 21, 2026

View reviewed changes

test(redteam): add afterEach mock cleanup per code review

d2e0323

Address CodeRabbit review feedback: - Add afterEach(() => vi.resetAllMocks()) to report.test.ts - Add afterEach mock cleanup to synthesize describe block in index.test.ts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(redteam): extract report functions and add numeric sorting #7162

refactor(redteam): extract report functions and add numeric sorting #7162

mldangelo commented Jan 21, 2026

Uh oh!

use-tusk bot commented Jan 21, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Jan 21, 2026 •

edited

Loading

Uh oh!

promptfoo-scanner bot left a comment

Uh oh!

coderabbitai bot commented Jan 21, 2026

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Jan 21, 2026

Uh oh!

coderabbitai bot Jan 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

refactor(redteam): extract report functions and add numeric sorting #7162

Are you sure you want to change the base?

refactor(redteam): extract report functions and add numeric sorting #7162

Conversation

mldangelo commented Jan 21, 2026

Summary

Before (alphabetical sorting)

After (numeric sorting)

Test plan

Uh oh!

use-tusk bot commented Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔄 Security Review ✅

PR Convention Check

Uh oh!

promptfoo-scanner bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot commented Jan 21, 2026

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

use-tusk bot commented Jan 21, 2026 •

edited

Loading

github-actions bot commented Jan 21, 2026 •

edited

Loading