
Conversation

@Aditya-Ghatole (Contributor) commented Jan 14, 2026

@ziadhany

Summary

Add a curated vulnerability dataset and baseline evaluation results for review.

Details

This PR includes:

A vulnerability dataset containing summaries with expected severity and CWE labels.

Baseline evaluation results produced by running the current prompt against this dataset.

Purpose

The intent is to review and validate the dataset first, specifically to confirm that it contains sufficient and appropriate information to support CWE prediction, before making further changes to prompts or evaluation logic.

Related issue: #3

Signed-off-by: Aditya G <aditya.ghatole05@gmail.com>
@ziadhany (Collaborator) commented:

@Aditya-Ghatole
Please share the code, not just the JSON, so I can generate outputs and test with multiple prompts. Also, what models and prompts are you using?

@Aditya-Ghatole (Author) commented:

@ziadhany
I’ve made a few changes to the file structure to better align with the issue’s requirements (specifically, separating prompts and tests into their own directories). Because of this, a bit of path handling will be needed to get things running as-is, or I can push the structure changes if that works better for you.
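The path handling mentioned above could look roughly like this. This is a hedged sketch, not the PR's actual code: the `prompts/` directory name matches the restructuring described in the comment, but the file naming (`prompt_v1.txt`) and the `load_prompt` helper are illustrative assumptions.

```python
from pathlib import Path

# Assumed layout after separating prompts and tests into their own
# directories (directory names per the comment; file names are guesses):
#   prompts/prompt_v1.txt
#   tests/...
#
# Resolve paths relative to this script rather than the current working
# directory, so the evaluation runs no matter where it is invoked from.
REPO_ROOT = (
    Path(__file__).resolve().parent if "__file__" in globals() else Path(".")
)


def load_prompt(name: str, root: Path = REPO_ROOT) -> str:
    """Read a prompt template from the prompts/ directory (hypothetical helper)."""
    prompt_path = root / "prompts" / f"{name}.txt"
    return prompt_path.read_text(encoding="utf-8")
```

Keeping path resolution in one place like this means the scripts keep working whether they are run from the repository root or from inside a subdirectory.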

I’m currently using the openai/gpt-oss-120b model with a temperature of 0.0, along with the existing prompt (prompt_v1) and an experimental prompt I’m working on.
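For concreteness, the evaluation call described above might be wired up as follows. This is a sketch under stated assumptions: the model name (`openai/gpt-oss-120b`), temperature (0.0), and prompt name (`prompt_v1`) come from the comment, but the `build_messages` helper, the message layout, and the hosting/endpoint details are illustrative, not the PR's exact code.

```python
# Sketch of pairing the system prompt (prompt_v1) with one vulnerability
# summary for a chat-style completion request. Field names here are
# assumptions for illustration.

def build_messages(prompt_template: str, summary: str) -> list[dict]:
    """Build the chat messages for one dataset entry (hypothetical helper)."""
    return [
        {"role": "system", "content": prompt_template},
        {"role": "user", "content": summary},
    ]

# With an OpenAI-compatible SDK, the deterministic (temperature 0.0) call
# would then look like this (base_url/api_key depend on the provider
# actually serving openai/gpt-oss-120b, so this part is left as a comment):
#
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(
#       model="openai/gpt-oss-120b",
#       messages=build_messages(prompt, summary),
#       temperature=0.0,
#   )
#   prediction = resp.choices[0].message.content
```

Setting temperature to 0.0 keeps the outputs as reproducible as the backend allows, which matters when comparing prompt_v1 against the experimental prompt on the same dataset.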

@Aditya-Ghatole
Copy link
Contributor Author

I should also mention that I’m aware there are a few issues in the current code. I’m working on a cleaner solution and will update you soon. For now, I’ve shared a beta version that is functional and should be sufficient to move things forward.
