
Conversation

@Aditya-Ghatole (Contributor) commented Jan 14, 2026

@ziadhany

Summary

Add a curated vulnerability dataset and baseline evaluation results for review.

Details

This PR includes:

A vulnerability dataset containing summaries with expected severity and CWE labels.

Baseline evaluation results produced by running the current prompt against this dataset.

Purpose

The intent is to review and validate the dataset first, specifically to confirm that it contains sufficient and appropriate information to support CWE prediction, before making further changes to prompts or evaluation logic.

Related issue: #3

Signed-off-by: Aditya G <aditya.ghatole05@gmail.com>
@ziadhany (Collaborator) commented:

@Aditya-Ghatole
Please share the code, not just the JSON, so I can generate outputs and test with multiple prompts. Also, what models and prompts are you using?

@Aditya-Ghatole (Author) commented:

@ziadhany
I’ve made a few changes to the file structure to better align with the issue’s requirements (specifically, separating prompts and tests into their own directories). Because of this, a bit of path handling will be needed to get things running as-is, or I can push the structure changes if that works better for you.
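The path handling mentioned above could look roughly like this. This is a hedged sketch, not the PR's actual code: the `prompts/` directory name matches the restructuring described in the comment, but the file naming (`prompt_v1.txt`) and the `load_prompt` helper are illustrative assumptions.

```python
from pathlib import Path

# Assumed layout after separating prompts and tests into their own
# directories (directory names per the comment; file names are guesses):
#   prompts/prompt_v1.txt
#   tests/...
#
# Resolve paths relative to this script rather than the current working
# directory, so the evaluation runs no matter where it is invoked from.
REPO_ROOT = (
    Path(__file__).resolve().parent if "__file__" in globals() else Path(".")
)


def load_prompt(name: str, root: Path = REPO_ROOT) -> str:
    """Read a prompt template from the prompts/ directory (hypothetical helper)."""
    prompt_path = root / "prompts" / f"{name}.txt"
    return prompt_path.read_text(encoding="utf-8")
```

Keeping path resolution in one place like this means the scripts keep working whether they are run from the repository root or from inside a subdirectory.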

I’m currently using the openai/gpt-oss-120b model with a temperature of 0.0, along with the existing prompt (prompt_v1) and an experimental prompt I’m working on.
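For concreteness, the evaluation call described above might be wired up as follows. This is a sketch under stated assumptions: the model name (`openai/gpt-oss-120b`), temperature (0.0), and prompt name (`prompt_v1`) come from the comment, but the `build_messages` helper, the message layout, and the hosting/endpoint details are illustrative, not the PR's exact code.

```python
# Sketch of pairing the system prompt (prompt_v1) with one vulnerability
# summary for a chat-style completion request. Field names here are
# assumptions for illustration.

def build_messages(prompt_template: str, summary: str) -> list[dict]:
    """Build the chat messages for one dataset entry (hypothetical helper)."""
    return [
        {"role": "system", "content": prompt_template},
        {"role": "user", "content": summary},
    ]

# With an OpenAI-compatible SDK, the deterministic (temperature 0.0) call
# would then look like this (base_url/api_key depend on the provider
# actually serving openai/gpt-oss-120b, so this part is left as a comment):
#
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(
#       model="openai/gpt-oss-120b",
#       messages=build_messages(prompt, summary),
#       temperature=0.0,
#   )
#   prediction = resp.choices[0].message.content
```

Setting temperature to 0.0 keeps the outputs as reproducible as the backend allows, which matters when comparing prompt_v1 against the experimental prompt on the same dataset.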

@Aditya-Ghatole
Copy link
Contributor Author

I should also mention that I’m aware there are a few issues in the current code. I’m working on a cleaner solution and will update you soon. For now, I’ve shared a beta version that is functional and should be sufficient to move things forward.
