README.md
Lines changed: 74 additions & 18 deletions
@@ -1,24 +1,42 @@
 # PySymBench
-Infrastructure for **model comparison and evaluation in symbolic execution workflows**.

-This project is a **local web application** designed to compare symbolic execution results of an uploaded trained model (in `.onnx` format) on a selected dataset with a **baseline symbolic execution approach (non-AI)**.
+Infrastructure for **AI model comparison and evaluation in symbolic execution workflows**.

-The system uses **PySymGym tools** to run symbolic execution on the dataset and evaluate the results. After execution completes, the results are sent to the **email address you provide**.
+PySymBench is a **local web application** for evaluating ONNX models against a non-AI baseline symbolic execution strategy. Experiments run inside Docker using [PySymGym](https://github.com/PySymGym/PySymGym) tools on a fixed dataset; results are emailed back to the user and (when published) saved to a leaderboard.
+
+Three target languages are supported for the dataset: **C#**, **Java**, and **C++**.

 ## Features

-- **Run Experiment** — upload an ONNX model, select test methods from the dataset, and compare it against the baseline strategy. Results (coverage, errors, timing) are delivered to your inbox.
-- **Model Ranking** — a public leaderboard of all published experiments, sorted by mean coverage. Shows per-experiment metrics: mean/median coverage, total tests, errors, and runtime.
-- **Publish Experiment** — submit a model to the ranking leaderboard. The experiment runs in Docker, computes metrics, and saves the result to the database. Supports cancellation while in progress.
+- **Run Experiment** — upload an ONNX model, choose a target language, select methods from the dataset, and compare the model against the baseline strategy. Coverage, errors and timing are emailed to you. Each running task can be cancelled via a one-click link in the confirmation email.
+- **Model Ranking** — a leaderboard of all completed experiments per language (with an aggregated view across languages), sorted by mean coverage. Per-experiment metrics include mean/median coverage, total tests, errors, runtime, and coverage percentage.
+- **Pairwise Comparison** — pick any two experiments from the ranking and produce side-by-side comparison artifacts (PDFs) downloadable individually or as a single zip.
+- **Model Interface docs** — page that describes the ONNX input/output specification required to plug a model into PySymGym.
+
+### Routes

 The frontend is a multi-page React SPA using `react-router-dom`:

 | Route | Page |
 |---|---|
 |`/`| Home — navigation hub |
 |`/experiment`| Run Experiment form |
-|`/ranking`| Model Ranking leaderboard |
-|`/ranking/publish`| Publish Experiment form |
+|`/ranking`| Model Ranking leaderboard + pairwise comparison |
+|`/interface`| Model Interface specification |
+
+### Backend API
+
+| Method | Path | Purpose |
+|---|---|---|
+|`POST`|`/api/upload`| Submit a new experiment (multipart: ONNX file, `email`, `language`, `experiment`) |
+|`GET`|`/api/status/{task_uid}`| Celery task state |
+|`POST`|`/api/cancel/{task_uid}`| Cancel a running experiment |
+|`GET`|`/api/cancel/{task_uid}?token=...`| One-click cancellation link sent by email |
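
Note on the status and cancel rows above: these two endpoints are enough for a small polling client. The following is a minimal sketch, not code from this repository. It assumes the API is served at `http://localhost:8000` and that `/api/status/{task_uid}` returns a JSON object with a `state` field holding a Celery state name; neither the response shape nor the helper names (`status_url`, `cancel_request`, `wait_for_task`) come from the README.

```python
import json
import time
import urllib.request
from urllib.parse import quote

API_BASE = "http://localhost:8000"  # ASSUMPTION: local FastAPI address

# Celery's built-in terminal task states.
TERMINAL_STATES = {"SUCCESS", "FAILURE", "REVOKED"}


def status_url(task_uid, base=API_BASE):
    """URL for GET /api/status/{task_uid}."""
    return f"{base}/api/status/{quote(task_uid, safe='')}"


def cancel_request(task_uid, base=API_BASE):
    """Build (but do not send) the POST /api/cancel/{task_uid} request."""
    url = f"{base}/api/cancel/{quote(task_uid, safe='')}"
    return urllib.request.Request(url, method="POST")


def fetch_json(url):
    """GET a URL and decode the body as JSON (ASSUMPTION: JSON response)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_task(task_uid, fetch=fetch_json, interval=5.0, max_polls=120):
    """Poll the status endpoint until the task reaches a terminal state.

    ASSUMPTION: the payload is a JSON object with a "state" field.
    """
    for _ in range(max_polls):
        payload = fetch(status_url(task_uid))
        if payload.get("state") in TERMINAL_STATES:
            return payload
        time.sleep(interval)
    raise TimeoutError(f"task {task_uid} still running after {max_polls} polls")
```

To cancel from a script, the prepared request can be sent with `urllib.request.urlopen(cancel_request(task_uid))`. The `fetch` parameter exists only so the polling loop can be exercised without a running server.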
When publishing experiments to the ranking, the ONNX model and result artifacts can be stored in MinIO. Add the following to your `.env` file:
+Experiments store their ONNX model and result artifacts in MinIO; the pairwise comparison feature also reads artifacts from there. MinIO must be reachable — if it is not configured or unavailable, the task fails and the user is notified by email. Add the following to your `.env` file:

 ```
 MINIO_ENDPOINT=localhost:9000
@@ -69,7 +87,7 @@ MINIO_SECURE=false
 MINIO_BUCKET=pysymbench
 ```

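
Note on the new MinIO requirement: since a missing or unreachable MinIO now fails the task, a preflight check before submitting an experiment may save a wasted run. The following is a minimal sketch under stated assumptions: it only reads `MINIO_ENDPOINT` and `MINIO_SECURE` (the variables visible in this diff), it tests plain TCP reachability rather than authenticating, and the function names are hypothetical rather than part of PySymBench.

```python
import os
import socket


def parse_endpoint(endpoint, secure=False):
    """Split 'host:port' into (host, port); without a port, use 443 or 80."""
    host, sep, port = endpoint.rpartition(":")
    if not sep:
        return endpoint, 443 if secure else 80
    return host, int(port)


def minio_reachable(env=os.environ, timeout=3.0):
    """Return True if a TCP connection to MINIO_ENDPOINT can be opened.

    This checks reachability only; credentials and bucket existence
    are not verified.
    """
    endpoint = env.get("MINIO_ENDPOINT")
    if not endpoint:
        return False  # not configured at all
    secure = env.get("MINIO_SECURE", "false").lower() == "true"
    host, port = parse_endpoint(endpoint, secure)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `minio_reachable()` with no arguments reads the process environment; passing a plain dict makes the check easy to try against other endpoints.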
-If not configured, artifact upload is skipped and only metrics are saved to the database. You can run a local MinIO instance via Docker:
+You can run a local MinIO instance via Docker:

 ```
 docker run --name minio -p 9000:9000 -p 9001:9001 \
@@ -91,6 +109,17 @@ All services that connect to Redis — the FastAPI app and every Celery worker

 ---

+## URLs for email links
+
+Cancellation links sent by email are absolute, so the backend needs to know its own public URL and the URL of the frontend. Defaults match a local setup; override them in `.env` if the app is reachable elsewhere:
+
+```
+BASE_URL=http://localhost:8000 # base URL of the FastAPI app
+FRONTEND_URL=http://localhost:5173 # base URL of the React frontend
+```
+
+---
+
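
Note on the new section: the reason the backend needs `BASE_URL` is easiest to see when an absolute cancellation link is assembled. The following is a minimal sketch, not the project's actual implementation. The defaults and the `/api/cancel/{task_uid}?token=...` path are taken from this diff; the function names and the idea of stripping a trailing slash are assumptions made for illustration.

```python
import os
from urllib.parse import quote, urlencode


def load_urls(env=os.environ):
    """Read BASE_URL and FRONTEND_URL, falling back to the local defaults."""
    base = env.get("BASE_URL", "http://localhost:8000").rstrip("/")
    frontend = env.get("FRONTEND_URL", "http://localhost:5173").rstrip("/")
    return base, frontend


def cancel_link(task_uid, token, env=os.environ):
    """Absolute one-click cancellation URL suitable for an email body."""
    base, _ = load_urls(env)
    query = urlencode({"token": token})  # percent-encodes the token value
    return f"{base}/api/cancel/{quote(task_uid, safe='')}?{query}"
```

With nothing set, `cancel_link("abc", "t")` points at `http://localhost:8000`; a deployment behind a public hostname would override `BASE_URL` so that links in emails resolve for the recipient.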
 ## Backend Setup

 1. Install **Python 3.14** and **Docker**, then install the project dependencies: