-
-
Notifications
You must be signed in to change notification settings - Fork 884
refactor(redteam): extract report functions and add numeric sorting #7162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add extractSortKey() to parse language prefix/suffix and policy number - Add sortReportIds() for numeric policy sorting (1,2,3... not 1,10,11,12,2...) - Sort by: policy number (numeric) → base ID → language - Add test verifying numeric sort order with 12 policies Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Move report generation functions (getStatus, extractSortKey, sortReportIds, generateReport) to new src/redteam/report.ts - Move plugin display functions (getPluginSeverity, getPluginBaseDisplayId) for better separation of concerns - Add comprehensive unit tests for all extracted functions - Improves testability and code organization Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🔄 Security Review ✅No issues in new changes. The incremental changes add PR Convention Check✅ PR touches Last updated: 2026-01-21T03:50:00Z | Reviewing: d2e0323 (incremental from ece40e6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 All Clear
This PR refactors report generation functions into a dedicated module and adds numeric sorting for policy numbers. No LLM security vulnerabilities were identified in these changes.
Minimum severity threshold for this scan: 🟡 Medium | Learn more
📝 WalkthroughWalkthroughThis pull request refactors report-generation utilities by extracting internal helper functions from src/redteam/index.ts into a new src/redteam/report.ts module. The extracted functions include getPluginSeverity, getPluginBaseDisplayId, getStatus, and generateReport. New sorting utilities (extractSortKey and sortReportIds) are introduced to enable deterministic report ordering, particularly for numeric policy number sorting. The index.ts file is updated to import these utilities from the new report module. Test coverage is added to verify the report generation, sorting, and status formatting logic. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@test/redteam/index.test.ts`:
- Around line 3068-3107: Add a top-level afterEach cleanup inside the
describe('synthesize') block to reset mocks after each test: locate the
describe('synthesize') containing the it(...) tests (including the 'should sort
report rows...' test) and insert afterEach(() => vi.resetAllMocks()) so all vi
mocks/spies are cleared between tests (ensuring Plugins.find, logger.info, and
other mocks are reset).
In `@test/redteam/report.test.ts`:
- Around line 1-13: Add explicit vitest imports and a teardown to reset mocks:
update the import line that currently pulls describe, expect, it from 'vitest'
to also import afterEach and vi, then add a top-level afterEach(() => {
vi.resetAllMocks(); }) to the test file so that mocks are reset between tests;
this change affects the test helpers (stripAnsi) and tests that call functions
like extractSortKey, generateReport, getPluginBaseDisplayId, getPluginSeverity,
getStatus, and sortReportIds.
🧹 Nitpick comments (1)
src/redteam/report.ts (1)
100-123: Verify ordering between policy and non‑policy IDs.Current logic pushes all policy rows ahead of non-policy rows (because
0is treated as “non-policy”). If you want to keep overall alphabetical ordering while still sorting policies numerically, consider only applying numeric comparison when both IDs are policy-numbered.♻️ Optional adjustment to keep non-policy alphabetical order
export function sortReportIds(a: string, b: string): number { const [aNum, aLang, aBase] = extractSortKey(a); const [bNum, bLang, bBase] = extractSortKey(b); - // First sort by policy number (0 means no policy number, goes last among policies) - if (aNum !== bNum) { - if (aNum === 0) { - return 1; - } - if (bNum === 0) { - return -1; - } - return aNum - bNum; - } + // Only apply numeric ordering when both are policy-numbered + if (aNum > 0 && bNum > 0 && aNum !== bNum) { + return aNum - bNum; + }
| it('should sort report rows numerically by policy number, not alphabetically', async () => { | ||
| const mockPluginAction = vi.fn().mockResolvedValue([{ vars: { query: 'test' } }]); | ||
| vi.spyOn(Plugins, 'find').mockReturnValue({ | ||
| action: mockPluginAction, | ||
| key: 'policy', | ||
| }); | ||
|
|
||
| // Create 12 policy plugins to test numeric sorting (1, 2, 3... vs 1, 10, 11, 12, 2...) | ||
| const plugins = Array.from({ length: 12 }, (_, i) => ({ | ||
| id: 'policy', | ||
| numTests: 1, | ||
| config: { policy: `Policy ${i + 1}` }, | ||
| })); | ||
|
|
||
| await synthesize({ | ||
| numTests: 1, | ||
| plugins, | ||
| prompts: ['Test prompt'], | ||
| strategies: [], | ||
| targetIds: ['test-provider'], | ||
| }); | ||
|
|
||
| const reportMessage = vi | ||
| .mocked(logger.info) | ||
| .mock.calls.map(([arg]) => arg) | ||
| .find( | ||
| (arg): arg is string => typeof arg === 'string' && arg.includes('Test Generation Report'), | ||
| ); | ||
|
|
||
| expect(reportMessage).toBeDefined(); | ||
| const cleanReport = stripAnsi(reportMessage || ''); | ||
|
|
||
| // Extract the order of policy numbers from the report | ||
| const policyMatches = cleanReport.match(/policy #(\d+)/g) || []; | ||
| const policyNumbers = policyMatches.map((m) => parseInt(m.replace('policy #', ''), 10)); | ||
|
|
||
| // Should be sorted numerically: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 | ||
| // NOT alphabetically: 1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9 | ||
| expect(policyNumbers).toEqual([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Check the overall structure of the test file and search for existing afterEach calls
rg -n "afterEach|describe\(" test/redteam/index.test.ts | head -20Repository: promptfoo/promptfoo
Length of output: 1020
🏁 Script executed:
# Also check specifically around the lines mentioned in the review (3068-3107) and nearby describe blocks
sed -n '3000,3150p' test/redteam/index.test.ts | grep -n "describe\|afterEach\|it(" | head -30Repository: promptfoo/promptfoo
Length of output: 256
🏁 Script executed:
# Check if the top-level describe('synthesize') has an afterEach cleanup
# First, find where the top-level 'synthesize' describe starts and where nested describes are
sed -n '58,100p' test/redteam/index.test.tsRepository: promptfoo/promptfoo
Length of output: 1383
🏁 Script executed:
# Check for afterEach within the synthesize describe block (which spans many lines)
# Let's search for afterEach specifically in the context of the synthesize describe
rg -B2 -A2 "afterEach" test/redteam/index.test.ts | head -40Repository: promptfoo/promptfoo
Length of output: 374
🏁 Script executed:
# Let's see the full scope and location of the existing afterEach
# Find the afterEach and check its nesting level and which describe block it belongs to
rg -n "afterEach" test/redteam/index.test.ts -B5 -A5Repository: promptfoo/promptfoo
Length of output: 937
🏁 Script executed:
# Also check the line numbers around the test mentioned (3068-3107) and see which describe it's in
sed -n '3050,3120p' test/redteam/index.test.tsRepository: promptfoo/promptfoo
Length of output: 2451
🏁 Script executed:
# Let's confirm which describe block contains the test at lines 3068-3107
# by checking what's nested at that location
sed -n '2460,2500p' test/redteam/index.test.ts | grep -E "describe|^\s*\};"Repository: promptfoo/promptfoo
Length of output: 113
🏁 Script executed:
# Count the indentation to understand nesting around line 3068
sed -n '3040,3110p' test/redteam/index.test.ts | head -70Repository: promptfoo/promptfoo
Length of output: 2775
🏁 Script executed:
# Find the exact line numbers and content of afterEach in the top-level synthesize describe
sed -n '58,130p' test/redteam/index.test.ts | grep -A2 -B2 "afterEach"Repository: promptfoo/promptfoo
Length of output: 45
🏁 Script executed:
# Verify the overall nesting - check where describe('synthesize' ends
# by looking for the closing brace at the appropriate indentation level
tail -50 test/redteam/index.test.tsRepository: promptfoo/promptfoo
Length of output: 2007
🏁 Script executed:
# Check the entire synthesize describe block structure more carefully
# Look for afterEach between line 58 and where nested describes start
awk '/^describe\(\x27synthesize\x27/,/^describe\(|^})\)/ {print NR": "$0}' test/redteam/index.test.ts | head -100Repository: promptfoo/promptfoo
Length of output: 97
🏁 Script executed:
# Simpler approach: get line numbers for all describe blocks and afterEach in the file
grep -n "^describe\|^ afterEach\|^ beforeEach\|^ afterAll" test/redteam/index.test.ts | head -30Repository: promptfoo/promptfoo
Length of output: 395
🏁 Script executed:
# Let's verify the exact structure: look at what happens after line 69 to see if there's an afterEach in the top-level synthesize
sed -n '58,120p' test/redteam/index.test.tsRepository: promptfoo/promptfoo
Length of output: 2062
🏁 Script executed:
# Continue from line 120 to find if there's an afterEach in the top-level synthesize describe
sed -n '110,200p' test/redteam/index.test.tsRepository: promptfoo/promptfoo
Length of output: 2898
🏁 Script executed:
# Definitive check: grep for afterEach in the entire file and show context
grep -n "afterEach" test/redteam/index.test.tsRepository: promptfoo/promptfoo
Length of output: 173
🏁 Script executed:
# Also check the structure right after the beforeEach ends to confirm no afterEach at top level
sed -n '100,130p' test/redteam/index.test.ts | cat -nRepository: promptfoo/promptfoo
Length of output: 1355
Add required afterEach mock cleanup to top-level describe('synthesize').
The test file lacks the mandated afterEach(() => vi.resetAllMocks()) at the top-level describe scope. Per coding guidelines, all tests must clean up mocks after each test.
Suggested fix
describe('synthesize', () => {
const mockProvider = {
callApi: vi.fn(),
generate: vi.fn(),
id: () => 'test-provider',
};
afterAll(() => {
vi.restoreAllMocks();
});
beforeEach(() => {
vi.clearAllMocks();
vi.resetAllMocks();
// ... rest of setup
});
+ afterEach(() => {
+ vi.resetAllMocks();
+ });🤖 Prompt for AI Agents
In `@test/redteam/index.test.ts` around lines 3068 - 3107, Add a top-level
afterEach cleanup inside the describe('synthesize') block to reset mocks after
each test: locate the describe('synthesize') containing the it(...) tests
(including the 'should sort report rows...' test) and insert afterEach(() =>
vi.resetAllMocks()) so all vi mocks/spies are cleared between tests (ensuring
Plugins.find, logger.info, and other mocks are reset).
test/redteam/report.test.ts
Outdated
| import { describe, expect, it } from 'vitest'; | ||
| import { Severity } from '../../src/redteam/constants'; | ||
| import { | ||
| extractSortKey, | ||
| generateReport, | ||
| getPluginBaseDisplayId, | ||
| getPluginSeverity, | ||
| getStatus, | ||
| sortReportIds, | ||
| } from '../../src/redteam/report'; | ||
|
|
||
| // Strip ANSI codes for easier testing | ||
| const stripAnsi = (str: string) => str.replace(/\x1B\[[0-9;]*[a-zA-Z]/g, ''); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Check vitest configuration
find . -name "vitest.config.*" -o -name "vitest.setup.*" | head -20Repository: promptfoo/promptfoo
Length of output: 168
🏁 Script executed:
# Review existing test files in test/ directory to see current patterns
fd -e test.ts$ test/ | head -10 | xargs -I {} head -15 {}Repository: promptfoo/promptfoo
Length of output: 388
🏁 Script executed:
# Check for any vitest globals configuration in package.json or tsconfig
fd -e json . -path "*/package.json" -o -path "*/vitest*" | head -10 | xargs -I {} grep -l "globals\|vitest" {}Repository: promptfoo/promptfoo
Length of output: 231
🏁 Script executed:
# Read main vitest config
cat vitest.config.tsRepository: promptfoo/promptfoo
Length of output: 1763
🏁 Script executed:
# Read vitest setup
cat vitest.setup.tsRepository: promptfoo/promptfoo
Length of output: 1796
🏁 Script executed:
# Check existing test files in test/ directory for import patterns
fd . test/ --extension test.ts | head -5 | xargs -I {} sh -c 'echo "=== {} ===" && head -20 {}'Repository: promptfoo/promptfoo
Length of output: 3666
🏁 Script executed:
# Read the full report.test.ts file to see if vi or mocks are used
wc -l test/redteam/report.test.ts && cat test/redteam/report.test.tsRepository: promptfoo/promptfoo
Length of output: 9119
Add explicit vitest imports and cleanup as required by test guidelines.
Tests must import afterEach and vi from vitest and run vi.resetAllMocks() in afterEach to maintain consistency across the test suite, even when mocks aren't currently used.
Suggested fix
-import { describe, expect, it } from 'vitest';
+import { afterEach, describe, expect, it, vi } from 'vitest';
import { Severity } from '../../src/redteam/constants';
import {
extractSortKey,
generateReport,
getPluginBaseDisplayId,
getPluginSeverity,
getStatus,
sortReportIds,
} from '../../src/redteam/report';
// Strip ANSI codes for easier testing
const stripAnsi = (str: string) => str.replace(/\x1B\[[0-9;]*[a-zA-Z]/g, '');
+
+afterEach(() => {
+ vi.resetAllMocks();
+});🤖 Prompt for AI Agents
In `@test/redteam/report.test.ts` around lines 1 - 13, Add explicit vitest imports
and a teardown to reset mocks: update the import line that currently pulls
describe, expect, it from 'vitest' to also import afterEach and vi, then add a
top-level afterEach(() => { vi.resetAllMocks(); }) to the test file so that
mocks are reset between tests; this change affects the test helpers (stripAnsi)
and tests that call functions like extractSortKey, generateReport,
getPluginBaseDisplayId, getPluginSeverity, getStatus, and sortReportIds.
Address CodeRabbit review feedback: - Add afterEach(() => vi.resetAllMocks()) to report.test.ts - Add afterEach mock cleanup to synthesize describe block in index.test.ts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Follow-up improvements to PR #7157 (policy plugin display IDs):
generateReport,getStatus,getPluginSeverity,getPluginBaseDisplayId,extractSortKey,sortReportIdstosrc/redteam/report.tsfor better separation of concernsBefore (alphabetical sorting)
After (numeric sorting)
Test plan
🤖 Generated with Claude Code