20 commits
fdd1591
Add reproducibility package infrastructure
pjhartout Feb 5, 2026
5752c1f
Add PGD benchmark and MMD table generation scripts
pjhartout Feb 5, 2026
ae0df6d
Add GKLR table generation script
pjhartout Feb 5, 2026
778023f
Add concatenation ablation table generation script
pjhartout Feb 5, 2026
674b791
Add figure generation scripts for reproducibility
pjhartout Feb 5, 2026
5c4edef
Update pixi config to new workspace schema and fix package name
pjhartout Feb 8, 2026
7066bc5
Relocate data directory to data/polygraph_graphs/
pjhartout Feb 8, 2026
c804535
Remove standalone reproducibility README and requirements.txt
pjhartout Feb 8, 2026
5b030ab
Add SLURM cluster submission infrastructure
pjhartout Feb 8, 2026
c6fc855
Add SLURM cluster support to table generation scripts
pjhartout Feb 8, 2026
c906559
Add reproducibility section to README
pjhartout Feb 8, 2026
39c5ea4
Add pre-generated tables and figures for reference
pjhartout Feb 8, 2026
096471d
Fix ruff lint errors in reproducibility scripts
pjhartout Mar 5, 2026
5198af6
Fix ruff formatting in reproducibility scripts
pjhartout Mar 5, 2026
a795fcc
Remove pre-generated tables and figures from tracking
pjhartout Mar 5, 2026
44b1177
refactor: Address code review findings in reproducibility scripts
pjhartout Mar 5, 2026
fdf8a72
fix: Remove unused pytest import from test_reproducibility
pjhartout Mar 5, 2026
cd4a164
chore: Remove plans from version control
pjhartout Mar 6, 2026
f4cb942
feat: Update SLURM partitions and add local launcher
pjhartout Mar 6, 2026
32dc9af
refactor: Address code review findings across reproducibility scripts
pjhartout Mar 6, 2026
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,4 +1,5 @@
polygraph/orca
docs/plans/
docs/images
logs/
data/
@@ -9,6 +10,8 @@ experiments/*/figures/*.png
experiments/*/results/*.csv
experiments/*/tables/*.tex
experiments/model_benchmark/benchmark_results.tex
reproducibility/tables/
reproducibility/figures/

# Byte-compiled / optimized / DLL files
__pycache__/
166 changes: 166 additions & 0 deletions README.md
@@ -243,6 +243,172 @@ The following results mirror the tables from [our paper](https://arxiv.org/abs/2

<sub>* AutoGraph* denotes a variant that leverages additional training heuristics as described in the [paper](https://arxiv.org/abs/2510.06122).</sub>

## Reproducibility

The [`reproducibility/`](reproducibility/) directory contains scripts to reproduce all tables and figures from the paper.

### Quick Start

```bash
# 1. Install dependencies
pixi install

# 2. Download the graph data (~3GB)
cd reproducibility
python download_data.py

# 3. Generate all tables and figures
make all
```

### Data Download

The generated graph data (~3GB) is hosted on [Proton Drive](https://drive.proton.me/urls/VM4NWYBQD0#3sqmZtmSgWTB). After downloading, extract to `data/polygraph_graphs/` in the repository root.

```bash
# Full dataset (required for complete reproducibility)
python download_data.py

# Small subset for testing/CI (~50 graphs per model)
python download_data.py --subset
```

Expected data structure after extraction:

```
data/polygraph_graphs/
├── AUTOGRAPH/
│ ├── planar.pkl
│ ├── lobster.pkl
│ ├── sbm.pkl
│ └── proteins.pkl
├── DIGRESS/
│ ├── planar.pkl
│ ├── lobster.pkl
│ ├── sbm.pkl
│ ├── proteins.pkl
│ ├── denoising-iterations/
│ │ └── {15,30,45,60,75,90}_steps.pkl
│ └── training-iterations/
│ └── {119,209,...,3479}_steps.pkl
├── ESGG/
│ └── *.pkl
├── GRAN/
│ └── *.pkl
└── molecule_eval/
└── *.smiles
```
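A short standard-library script can sanity-check the extracted layout. This is a minimal sketch: it verifies only the filenames shown in the tree above, and treats the `ESGG`/`GRAN` directories as free-form `*.pkl` collections since their exact contents are not listed.

```python
from pathlib import Path

# Filenames confirmed in the tree above; ESGG/GRAN only promise "*.pkl".
NAMED = {
    "AUTOGRAPH": ("planar.pkl", "lobster.pkl", "sbm.pkl", "proteins.pkl"),
    "DIGRESS": ("planar.pkl", "lobster.pkl", "sbm.pkl", "proteins.pkl"),
}
WILDCARD = ("ESGG", "GRAN")

def missing_entries(root):
    """Return a list of human-readable problems with the data layout."""
    root = Path(root)
    problems = []
    for model, files in NAMED.items():
        for name in files:
            if not (root / model / name).is_file():
                problems.append(f"{model}/{name} is missing")
    for model in WILDCARD:
        if not list((root / model).glob("*.pkl")):
            problems.append(f"{model}/ contains no .pkl files")
    return problems

if __name__ == "__main__":
    for problem in missing_entries("data/polygraph_graphs"):
        print(problem)
```

Run it from the repository root after extraction; no output beyond an empty report means the named files are in place.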

### Scripts Overview

#### Table Generation

| Script | Output | Description |
|--------|--------|-------------|
| `generate_benchmark_tables.py` | `tables/benchmark_results.tex` | Main PGD benchmark (Table 1) comparing AUTOGRAPH, DiGress, GRAN, and ESGG |
| `generate_mmd_tables.py` | `tables/mmd_gtv.tex`, `tables/mmd_rbf_biased.tex` | MMD² metrics with GTV and RBF kernels |
| `generate_gklr_tables.py` | `tables/gklr.tex` | PGD with Kernel Logistic Regression using WL and SP kernels |
| `generate_concatenation_tables.py` | `tables/concatenation.tex` | Ablation comparing individual vs concatenated descriptors |
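The `.pkl` bundles these scripts consume can be inspected with the standard library. A sketch, assuming each file is a plain pickled collection of graphs (the actual on-disk format may differ):

```python
import pickle
from pathlib import Path

def load_bundle(path):
    """Load one pickled graph bundle and report its size."""
    with Path(path).open("rb") as fh:
        graphs = pickle.load(fh)
    print(f"{path}: {type(graphs).__name__} with {len(graphs)} entries")
    return graphs

if __name__ == "__main__":
    bundle = Path("data/polygraph_graphs/AUTOGRAPH/planar.pkl")
    if bundle.exists():
        load_bundle(bundle)
```

As always with `pickle`, only load bundles obtained from the trusted download link above.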

#### Figure Generation

| Script | Output | Description |
|--------|--------|-------------|
| `generate_subsampling_figures.py` | `figures/subsampling/` | Bias-variance tradeoff as function of sample size |
| `generate_perturbation_figures.py` | `figures/perturbation/` | Metric sensitivity to edge perturbations |
| `generate_model_quality_figures.py` | `figures/model_quality/` | PGD vs training/denoising steps for DiGress |
| `generate_phase_plot.py` | `figures/phase_plot/` | Training dynamics showing PGD vs VUN |

Each script can be run independently with `--subset` for quick testing:

```bash
# Tables (full computation)
python generate_benchmark_tables.py
python generate_mmd_tables.py
python generate_gklr_tables.py
python generate_concatenation_tables.py

# Tables (quick testing with --subset)
python generate_benchmark_tables.py --subset
python generate_mmd_tables.py --subset

# Figures (full computation)
python generate_subsampling_figures.py
python generate_perturbation_figures.py
python generate_model_quality_figures.py
python generate_phase_plot.py

# Figures (quick testing)
python generate_subsampling_figures.py --subset
python generate_perturbation_figures.py --subset
```

### Make Targets

```bash
make download # Download full dataset (manual step required)
make download-subset # Create small subset for CI testing
make tables # Generate all LaTeX tables
make figures # Generate all figures
make all # Generate everything
make tables-submit # Submit table jobs to SLURM cluster
make tables-collect # Collect results from completed SLURM jobs
make clean # Remove generated outputs
make help # Show available targets
```

### Hardware Requirements

- **Memory:** 16GB RAM recommended for full dataset
- **Storage:** ~4GB for data + outputs
- **Time:** Full generation takes ~2-4 hours on a modern CPU

The `--subset` flag uses ~50 graphs per model, runs in minutes, and verifies code correctness (results are not publication-quality).
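Conceptually, subset mode amounts to drawing a small, deterministic sample per model. The sketch below is illustrative only; the seed, sample size, and function name are assumptions, not the scripts' actual internals:

```python
import random

def take_subset(graphs, n=50, seed=0):
    """Deterministically sample up to n graphs, mirroring --subset behavior."""
    graphs = list(graphs)
    if len(graphs) <= n:
        return graphs
    return random.Random(seed).sample(graphs, n)
```

A fixed seed keeps the quick run repeatable across invocations, even though statistics over ~50 graphs are too noisy for publication.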

### Cluster Submission

Table generation scripts support SLURM cluster submission via [submitit](https://github.com/facebookincubator/submitit). Install the cluster extras first:

```bash
pip install -e ".[cluster]"
```

SLURM parameters are configured in YAML files (see `reproducibility/configs/slurm_default.yaml`):

```yaml
slurm:
partition: "YOUR_CPU_PARTITION"
timeout_min: 360
cpus_per_task: 8
mem_gb: 32
```
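Under the hood, submitit-based submission follows a pattern like the sketch below. It is illustrative only: `run_table` and the exact parameter wiring are assumptions, not the scripts' real internals, though the YAML keys map directly onto standard submitit executor parameters.

```python
from pathlib import Path

def slurm_kwargs(slurm_cfg):
    """Map the YAML 'slurm:' section onto submitit executor parameters."""
    return {
        "slurm_partition": slurm_cfg["partition"],
        "timeout_min": slurm_cfg["timeout_min"],
        "cpus_per_task": slurm_cfg["cpus_per_task"],
        "mem_gb": slurm_cfg["mem_gb"],
    }

def run_table(name):
    """Placeholder for one table-generation task executed on a node."""
    return f"generated {name}"

if __name__ == "__main__":
    cfg_path = Path("configs/slurm_default.yaml")
    if cfg_path.exists():  # only submit when a cluster config is present
        import submitit  # requires: pip install -e ".[cluster]"
        import yaml

        cfg = yaml.safe_load(cfg_path.read_text())["slurm"]
        executor = submitit.AutoExecutor(folder="slurm_logs")
        executor.update_parameters(**slurm_kwargs(cfg))
        job = executor.submit(run_table, "benchmark")
        print("submitted job", job.job_id)
```

`submitit.AutoExecutor` falls back to local execution when SLURM is unavailable, which matches the scripts' run-locally-by-default behavior.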

Submit jobs, then collect results after completion:

```bash
cd reproducibility

# Submit all table jobs to SLURM
python generate_benchmark_tables.py --slurm-config configs/slurm_default.yaml

# After jobs complete, collect results and generate tables
python generate_benchmark_tables.py --collect

# Or use Make targets
make tables-submit # submit all
make tables-submit SLURM_CONFIG=configs/my_cluster.yaml # custom config
make tables-collect # collect all
```

Scripts run locally by default. Use `--slurm-config` to submit to a SLURM cluster instead.

### Troubleshooting

**Memory issues:** Use `--subset` flag for testing, process one dataset at a time, or increase system swap space.

**Missing data:** Verify `data/polygraph_graphs/` exists in repo root, run `python download_data.py` to check data status, or download manually from Proton Drive.

**TabPFN issues:** TabPFN is pinned to v2.0.9 for reproducibility: `pip install tabpfn==2.0.9`.

## Citing
