
Camera ready #48

Open
pjhartout wants to merge 20 commits into master from camera_ready

Set of improvements for the camera-ready version

- Makefile with targets for running table/figure generation
- README with usage instructions and data download link
- requirements.txt with dependencies
- download_data.py script for fetching pre-generated graphs
- generate_benchmark_tables.py: Standard PGD metrics with VUN
- generate_mmd_tables.py: Gaussian TV and RBF MMD metrics
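
The actual download link and behaviour of download_data.py are in the README. The sketch below only illustrates one plausible shape for such a script, assuming the pre-generated graphs ship as a single .tar.gz archive. `DATA_URL` is a placeholder, not the real link, and `safe_extract`/`download_data` are hypothetical names. The extraction step rejects archive members whose paths would escape the target directory, and it uses the `data` extraction filter when the running Python supports it.

```python
import tarfile
import urllib.request
from pathlib import Path

# Placeholder only: substitute the real link from the README.
DATA_URL = "https://example.com/polygraph_graphs.tar.gz"
DATA_DIR = Path("data/polygraph_graphs")


def safe_extract(archive: Path, dest: Path) -> list[str]:
    """Extract a .tar.gz archive into dest, rejecting path-traversal members."""
    dest = dest.resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # Pythons with extraction filters
        else:
            tar.extractall(dest)
    return [m.name for m in members]


def download_data(url: str = DATA_URL, dest: Path = DATA_DIR) -> list[str]:
    """Download the archive (hypothetical URL) and unpack it under dest."""
    archive = dest.parent / "polygraph_graphs.tar.gz"
    archive.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, archive)
    return safe_extract(archive, dest)
```

Keeping extraction separate from the download makes the safety check testable offline.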

Both scripts use the polygraph-benchmark API:
- StandardPGDInterval for PGD computation
- GaussianTVMMD2BenchmarkInterval/RBFMMD2BenchmarkInterval for MMD
- Proper graph format conversion for DIGRESS tensor outputs

Computes PGD metrics using a logistic regression classifier instead of
TabPFN, with standard descriptors (orbit counts, degree, spectral,
clustering, GIN).

Uses PolyGraphDiscrepancyInterval with sklearn LogisticRegression
for classifier-based evaluation.

Compares standard PGD (max over individual descriptors) with concatenated
PGD (all descriptors combined into a single feature vector).
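
To make the difference between the two aggregation schemes concrete, here is a minimal plain-Python sketch. `standard_pgd` and `concatenate_features` are hypothetical helpers, not polygraph-benchmark functions, and the inputs are made up. Standard PGD takes the maximum over per-descriptor discrepancy scores; the concatenated variant instead joins each graph's descriptor vectors into one feature vector, which a single classifier then scores.

```python
def standard_pgd(per_descriptor: dict[str, float]) -> tuple[str, float]:
    """Standard PGD: the largest discrepancy over individual descriptors.

    Returns the name of the descriptor that attains the maximum and its score.
    """
    if not per_descriptor:
        raise ValueError("need at least one descriptor score")
    name = max(per_descriptor, key=per_descriptor.__getitem__)
    return name, per_descriptor[name]


def concatenate_features(
    features: dict[str, list[list[float]]], order: list[str]
) -> list[list[float]]:
    """Input for concatenated PGD: one joined feature vector per graph.

    `features` maps a descriptor name to one vector per graph; every
    descriptor must describe the same graphs in the same order.
    """
    sizes = {len(features[name]) for name in order}
    if len(sizes) != 1:
        raise ValueError("descriptors disagree on the number of graphs")
    (n_graphs,) = sizes
    return [
        [value for name in order for value in features[name][i]]
        for i in range(n_graphs)
    ]
```

For example, with made-up scores `{"orbit": 0.31, "degree": 0.12}`, `standard_pgd` returns `("orbit", 0.31)`, which also tells you which descriptor drove the result.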

Features:
- ConcatenatedDescriptor class with PCA dimensionality reduction
- Handles TabPFN's 500-feature limit by reducing to 100 components with PCA
- Uses LogisticRegression for concatenated features
- Optimized subset mode for faster testing

Figure generation scripts:
- generate_model_quality_figures.py: Training/denoising curves
- generate_perturbation_figures.py: Metric sensitivity to edge perturbations
- generate_phase_plot.py: PGD vs VUN training dynamics
- generate_subsampling_figures.py: Bias-variance tradeoff analysis
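
The PCA step of ConcatenatedDescriptor could look roughly like the NumPy sketch below. The 500-feature limit and the 100-component target come from the feature list above; the function name `reduce_for_tabpfn` and the choice to return narrow inputs unchanged are assumptions. It also assumes `X` stacks reference and generated graphs, so both sets share one projection. PCA is computed from the SVD of the column-centered matrix.

```python
import numpy as np

TABPFN_MAX_FEATURES = 500  # limit quoted in the PR description
N_COMPONENTS = 100         # PCA target quoted in the PR description


def reduce_for_tabpfn(
    X: np.ndarray,
    max_features: int = TABPFN_MAX_FEATURES,
    n_components: int = N_COMPONENTS,
) -> np.ndarray:
    """Project X of shape (n_graphs, n_features) onto its top principal components.

    Inputs already within the feature limit are returned unchanged
    (hypothetical policy for this sketch).
    """
    if X.shape[1] <= max_features:
        return X
    # Center each feature; rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    # There can be at most min(n_graphs, n_features) components.
    k = min(n_components, Vt.shape[0])
    return Xc @ Vt[:k].T
```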

All four figure scripts use StandardPGDInterval from the polygraph-benchmark API.
The phase plot gracefully handles missing VUN values (computing VUN requires graph_tool).
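
One way a plotting script can degrade gracefully when graph_tool is missing is sketched below. The helper names and the use of NaN as the "missing" marker are assumptions. The availability check relies only on the standard library's `importlib.util.find_spec`, so the optional dependency is never imported just to test for its presence.

```python
import importlib.util
import math
from typing import Callable, Optional, Sequence


def graph_tool_available() -> bool:
    """True if the optional graph_tool package can be found."""
    return importlib.util.find_spec("graph_tool") is not None


def vun_or_nan(compute_vun: Callable[[], float], available: Optional[bool] = None) -> float:
    """Return the VUN score, or NaN when graph_tool is unavailable."""
    if available is None:
        available = graph_tool_available()
    return compute_vun() if available else math.nan


def plottable_points(pgd: Sequence[float], vun: Sequence[float]) -> list[tuple[float, float]]:
    """Keep only (PGD, VUN) pairs whose VUN value is present."""
    return [(p, v) for p, v in zip(pgd, vun) if not math.isnan(v)]
```

Passing `compute_vun` as a callable means the expensive metric is never evaluated when it cannot be computed.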

Rename [project] to [workspace] per the updated pixi schema, and correct
the pypi-dependencies package name from polygraph to polygraph-benchmark.

Move the expected data path from polygraph_graphs/ to data/polygraph_graphs/
to keep generated data under the gitignored data/ directory.

Documentation is consolidated into the main README, and dependencies
are managed through pyproject.toml extras.

Add a submitit-based cluster module for distributing reproducibility
workloads across SLURM nodes. It includes YAML-configurable job parameters,
job metadata tracking, and result collection helpers.

- cluster.py: shared wrapper with SlurmConfig, submit_jobs, collect_results
- configs/: default CPU and GPU SLURM configurations
- pyproject.toml: new [cluster] optional dependency group (submitit, pyyaml)
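
A rough sketch of a YAML-backed configuration class in the spirit of SlurmConfig follows. The field names and defaults are assumptions, not the schema used in configs/. The unknown-key check mirrors the validation of SlurmConfig.from_yaml listed among the fixes in this PR. `executor_params` maps the fields onto keyword arguments accepted by submitit's `AutoExecutor.update_parameters`. PyYAML is imported lazily, so the dict-based path works without the [cluster] extra.

```python
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Any, Mapping


@dataclass
class SlurmConfig:
    # Hypothetical fields and defaults for illustration only.
    partition: str = "cpu"
    timeout_min: int = 60
    cpus_per_task: int = 4
    gpus_per_node: int = 0
    mem_gb: int = 16

    @classmethod
    def from_dict(cls, raw: Mapping[str, Any]) -> "SlurmConfig":
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(raw) - known)
        if unknown:
            # Fail loudly on typos such as "partiton" instead of ignoring them.
            raise ValueError(f"unknown SLURM config keys: {unknown}")
        return cls(**raw)

    @classmethod
    def from_yaml(cls, path: Path) -> "SlurmConfig":
        import yaml  # PyYAML, from the optional [cluster] dependency group

        with open(path) as fh:
            return cls.from_dict(yaml.safe_load(fh) or {})

    def executor_params(self) -> dict[str, Any]:
        """Keyword arguments for submitit's AutoExecutor.update_parameters."""
        return {
            "slurm_partition": self.partition,
            "timeout_min": self.timeout_min,
            "cpus_per_task": self.cpus_per_task,
            "gpus_per_node": self.gpus_per_node,
            "mem_gb": self.mem_gb,
        }
```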

Add --slurm-config, --local, and --collect CLI options to all four table
generation scripts for distributing computation across SLURM nodes.
Each script gains a standalone task function suitable for submitit,
result reshaping helpers, and three execution modes (local, submit, collect).
Also updates DATA_DIR paths and adds tables-submit/tables-collect Make targets.
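
The three execution modes could be selected with argparse roughly as follows. Only the flag names come from the description. The selection rules are assumptions for this sketch: --local and --collect are mutually exclusive, and submitting requires --slurm-config.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def parse_mode(argv: Optional[Sequence[str]] = None) -> tuple[str, argparse.Namespace]:
    """Return ("local" | "submit" | "collect") together with the parsed arguments."""
    parser = argparse.ArgumentParser(description="Generate benchmark tables.")
    parser.add_argument("--slurm-config", type=Path, help="YAML file with SLURM settings")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--local", action="store_true", help="run every task in-process")
    group.add_argument("--collect", action="store_true", help="gather results of submitted jobs")
    args = parser.parse_args(argv)

    if args.local:
        return "local", args
    if args.collect:
        return "collect", args
    if args.slurm_config is None:
        # parser.error prints usage and exits with status 2.
        parser.error("submitting requires --slurm-config (or pass --local)")
    return "submit", args
```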

Document the full reproducibility workflow, including data download,
script overview, Make targets, hardware requirements, SLURM cluster
submission, and troubleshooting tips.

Include the LaTeX tables and PDF figures produced by the reproducibility
scripts so reviewers can verify outputs without re-running the computation.

Remove unused imports and extraneous f-string prefixes.

The reproducibility scripts generate these outputs, so ignore the
output directories and track only the code.

Fix two P1 bugs: restore the discarded reference_graphs in the phase plot,
and make reshape_results non-mutating. Extract a shared module, common.py,
to eliminate ~700 lines of duplication across 8 scripts. Hoist
metric instantiation out of loops, fix O(n²) edge rewiring,
replace print() with loguru, move imports to the top of each file, remove
dead code, and add tests for shared utilities.
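
The description does not show the old rewiring code. A common source of quadratic behaviour is testing edge membership against a list, so this sketch assumes that and keeps edges in a set, where each membership check is O(1) on average. It also returns a new set rather than mutating its input, in the same non-mutating spirit as the reshape_results fix. The function name, the edge representation (undirected edges as `(u, v)` tuples with `u < v`), and the rewiring rule (keep `u`, move the other endpoint to a random non-adjacent node) are all assumptions.

```python
import random


def rewire_edges(
    edges: set[tuple[int, int]],
    n_nodes: int,
    fraction: float,
    seed: int = 0,
    max_tries: int = 100,
) -> set[tuple[int, int]]:
    """Return a rewired copy of an undirected edge set; the input is untouched.

    For each sampled edge (u, v), keep u and move the other endpoint to a
    random node w, rejecting self-loops and edges that already exist.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)
    result = set(edges)  # work on a copy, never on the caller's set
    k = round(fraction * len(edges))
    for u, v in rng.sample(sorted(edges), k):
        for _ in range(max_tries):
            w = rng.randrange(n_nodes)
            new_edge = (min(u, w), max(u, w))
            if w != u and new_edge not in result:  # O(1) average set lookup
                result.remove((u, v))
                result.add(new_edge)
                break
        # If every try fails (a very dense graph), the edge is left unchanged.
    return result
```

Each successful step removes one edge and adds one that was not present, so the edge count is preserved.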

Add docs/plans/ to .gitignore and untrack existing plan files.

Set cluster-specific partitions (p.hpcl91 for GPU, p.hpcl94c for CPU)
and add a local job launcher that runs without SLURM/submitit via the
--local flag or `make tables-local`.
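
A local launcher can imitate the small part of submitit's job interface that the scripts need. This sketch assumes that part is `.done()` and `.result()`, both of which submitit jobs provide. `LocalJob` and `submit_local` here are illustrative, not the PR's code. The task runs eagerly when the job is created. Any exception is captured and re-raised only when `.result()` is called, which is the behaviour the LocalJob fix below describes.

```python
from typing import Any, Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LocalJob(Generic[T]):
    """Run a function in-process and expose a submitit-like .result()."""

    def __init__(self, fn: Callable[..., T], *args: Any, **kwargs: Any) -> None:
        self._value: Optional[T] = None
        self._error: Optional[BaseException] = None
        try:
            self._value = fn(*args, **kwargs)
        except Exception as exc:  # deferred: surfaced from .result()
            self._error = exc

    def done(self) -> bool:
        return True  # execution already finished inside __init__

    def result(self) -> T:
        if self._error is not None:
            raise self._error
        return self._value  # type: ignore[return-value]


def submit_local(fn: Callable[..., T], arg_list: list[Any]) -> list[LocalJob[T]]:
    """Create one job per argument, similar to submitit's executor.map_array."""
    return [LocalJob(fn, arg) for arg in arg_list]
```

Deferring the exception means one failing task does not stop the others from running, and collection code treats local and SLURM jobs the same way.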

- Standardize PGS/polyscore naming to PGD throughout all scripts
- Fix LocalJob to capture exceptions and defer raising them until .result() is called
- Add unknown-key validation to SlurmConfig.from_yaml
- Remove duplicated direct execution paths from table scripts
- Remove dead code (setup_plotting, unused TypeVar, make_executor local param)
- Rename the shadowing load_graphs to load_pickle in the model_quality script
- Fix Makefile shell injection by quoting SLURM_CONFIG
- Replace sys.path hack with pytest pythonpath config
- Add tests for LocalJob and SlurmConfig
- Replace placeholder partition names in SLURM configs
- Fix README TabPFN version (2.0.0 → 2.0.9)