feat(eval): xscode generation #10
Conversation
nirav0999 commented on Aug 7, 2025
- Added XSCode generation
Summary of Changes
Hello @nirav0999, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've introduced a new feature for generating and evaluating XSCode, a benchmark for secure code generation. This involves a multi-step pipeline to create high-quality, de-duplicated, and filtered prompts based on Common Weakness Enumerations (CWEs) and CodeGuru rules, ensuring they are suitable for testing overrefusal in CodeLLMs.
Highlights
- XSCode Generation Pipeline: I've implemented a comprehensive pipeline to generate overrefusal prompts for secure code generation, leveraging Common Weakness Enumeration (CWE) and CodeGuru vulnerability information.
- Automated Prompt Filtering: I've introduced pre- and post-filtering mechanisms to validate generated prompts against criteria such as asking for code, being benign, and having a secure solution, ensuring the quality of the dataset.
- Data Deduplication: I've integrated a deduplication step using MinHash and LSH to remove redundant or highly similar prompts, enhancing the diversity and efficiency of the evaluation dataset.
- Modular Architecture: I've structured the generation process into distinct, manageable Python modules (cwe2ovrf.py, dedup.py, pre_filter.py, post_filter.py) orchestrated by a central main.py script.
- Project Restructuring: I've renamed and reorganized existing annotation utility scripts into a dedicated annotate_utils subdirectory, along with updating copyright information.
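The deduplication highlight (MinHash and LSH) can be illustrated with a small stdlib-only sketch. Note that this is not the PR's dedup.py: all names below are illustrative, and the real module presumably uses an LSH index (e.g. the datasketch library) instead of the pairwise signature comparison shown here for brevity.

```python
import hashlib

NUM_PERM = 64  # number of hash functions, simulating permutations

def shingles(text, k=3):
    """Word k-shingles of a prompt; the unit of set similarity."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + k]) for i in range(max(1, len(tokens) - k + 1))}

def minhash_signature(text):
    """One minimum per salted hash function over the shingle set."""
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8,
                                salt=seed.to_bytes(8, "big")).digest(),
                "big",
            )
            for s in shingles(text)
        )
        for seed in range(NUM_PERM)
    ]

def est_jaccard(sig_a, sig_b):
    """Fraction of matching minima estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / NUM_PERM

def dedup(prompts, threshold=0.8):
    """Keep a prompt only if it is not a near-duplicate of anything kept so far."""
    kept, sigs = [], []
    for p in prompts:
        sig = minhash_signature(p)
        if all(est_jaccard(sig, s) < threshold for s in sigs):
            kept.append(p)
            sigs.append(sig)
    return kept
```

In practice, banding the signatures into an LSH index replaces the quadratic scan above with a near-constant-time candidate lookup, which is the point of combining MinHash with LSH.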
Pull Request Overview
This PR introduces XSCode, a new evaluation benchmark for detecting overrefusal in secure code generation. The system generates evaluation prompts by creating benign code requests that may trigger unnecessary refusals from safety-aligned models.
- Implements a complete pipeline for generating, filtering, and validating prompts based on CWE vulnerabilities
- Adds pre- and post-filtering mechanisms using LLM judges to ensure prompt quality
- Provides deduplication functionality to remove similar prompts from the dataset
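An LLM-judge filter of the shape described above might look like the following sketch. The criteria mirror those named in the change summary (asks for code, is benign, has a secure solution), but every template and function name here is hypothetical, and `judge` stands in for a real model call; the PR's pre_filter.py and post_filter.py may be wired quite differently.

```python
# Hypothetical criteria; the prompt under test is substituted for {p}.
CRITERIA = {
    "asks_for_code": "Does this prompt ask for code to be written? Answer yes or no.\n\n{p}",
    "is_benign": "Is this prompt benign, with no malicious intent? Answer yes or no.\n\n{p}",
    "secure_solution": "Can this request be satisfied with secure code? Answer yes or no.\n\n{p}",
}

def passes_pre_filter(prompt, judge):
    """A prompt survives only if the judge answers 'yes' on every criterion."""
    return all(
        judge(template.format(p=prompt)).strip().lower().startswith("yes")
        for template in CRITERIA.values()
    )

def pre_filter(prompts, judge):
    """judge: any callable mapping a question string to a 'yes'/'no' answer."""
    return [p for p in prompts if passes_pre_filter(p, judge)]
```

Factoring the judge out as a callable makes the filter trivially testable with a stub in place of an actual LLM client.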
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| eval/compile_xscode/main.py | Main orchestration script coordinating the entire XSCode generation pipeline |
| eval/compile_xscode/cwe2ovrf.py | Core generation logic that transforms CWE vulnerabilities into overrefusal test prompts |
| eval/compile_xscode/pre_filter.py | Pre-filtering validation using LLM judges to assess prompt quality before annotation |
| eval/compile_xscode/post_filter.py | Post-filtering validation applied after manual annotation to ensure final quality |
| eval/compile_xscode/dedup.py | Deduplication system using MinHash LSH to remove similar prompts |
| eval/compile_xscode/annotate_utils/*.py | Annotation utilities with minor copyright header updates and typo fixes |
| eval/compile_xscode/README.md | Documentation explaining XSCode usage and evaluation |
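Given the file roles in the table above, main.py plausibly chains the stages in order. The sketch below is purely illustrative: every function body is a toy stand-in rather than the PR's code, and the dedup stage is exact-match only for brevity.

```python
def gen_stage(cwe):
    """Toy stand-in for cwe2ovrf.py: CWE entry -> benign code-generation prompt."""
    return f"Write {cwe['lang']} code that avoids {cwe['id']} ({cwe['desc']})."

def filter_stage(prompts, judge):
    """Toy stand-in for pre_filter.py / post_filter.py: keep approved prompts."""
    return [p for p in prompts if judge(p)]

def dedup_stage(prompts):
    """Toy stand-in for dedup.py: exact-match only (the PR uses MinHash LSH)."""
    seen, out = set(), []
    for p in prompts:
        if p not in seen:
            seen.add(p)
            out.append(p)
    return out

def run_pipeline(cwe_entries, judge):
    """Chain the stages in the order the file table suggests."""
    return dedup_stage(filter_stage([gen_stage(e) for e in cwe_entries], judge))
```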
Code Review
This pull request introduces a comprehensive pipeline for generating the XSCode dataset, an overrefusal benchmark for secure code generation. The changes include scripts for data generation from CWE and CodeGuru sources, deduplication using MinHashLSH, and pre/post-filtering using LLM-based judges. The overall implementation is well-structured and robust. I've identified a few minor issues, including some typos and an opportunity to improve the robustness of a file operation. Overall, this is a great addition.
Force-pushed from 4f7ccaf to 8f4929c