Relax dependency constraints #1713

Open
filippsatverily wants to merge 8 commits into cdisc-org:main from filippsatverily:filipps/update_deps_main

filippsatverily commented Apr 29, 2026

Summary: relax the dependency constraints and repurpose requirements.txt as a lockfile. This allows the PyPI version of CORE to be used as a library in a larger project where other dependency constraints exist.

List of changes:

  • Define dependency constraints in pyproject.toml
    • Remove requirements-dev.txt
    • Repurpose requirements.txt as a lockfile
  • Change constraints from exact pinned versions to ranges
  • Dependency range upgrades:
    • Remove CLI option defaults that clash with required, to support a behavior change/fix in click 8.3.0
    • Edit DatasetXPTMetadataReader.read to support a behavior change in pyreadstat 1.2.9
    • Edit USDMDataService.__get_full_path to support a behavior change in jsonpath-ng 1.8.0
    • Remove unnecessary self capture in ContentsDefineVLMDatasetBuilder to support dask 2024.8.1
    • Remove __setitem__ reindexing in DaskDataset to fix errors surfaced in dask 2025.4.0
    • Fix test_dataset_metadata_define_dataset_builder to support a sort behavior fix in pandas 2.2.0
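
For the pyreadstat item, a normalization step along these lines would tolerate both library versions. This is a sketch of one possible shape for such a fix, not the change made to DatasetXPTMetadataReader.read: normalize_original_type is a hypothetical helper, and mapping both sentinels to an empty string is an illustrative choice. The only facts taken from this PR are that the value used to be 'NULL' and is None starting with pyreadstat 1.2.9.

```python
from typing import Optional


def normalize_original_type(raw: Optional[str]) -> str:
    """Hypothetical helper: treat the old 'NULL' sentinel and the new None alike."""
    # Older pyreadstat reported 'NULL'; 1.2.9+ reports None (per this PR).
    if raw is None or raw == "NULL":
        return ""  # illustrative "no type" value, an assumption
    return raw


for value in ("NULL", None, "DATE9"):
    print(repr(value), "->", repr(normalize_original_type(value)))
```

Comparing against both sentinels keeps the reader independent of which pyreadstat version the lockfile resolves to.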

Tested scenarios:

Moves dependency constraints to pyproject.toml.
Makes requirements.txt a lockfile.
Fixes an incompatibility caused by click 8.3.0, which passes the default value as-is.
Fixes an incompatibility caused by pyreadstat 1.2.9, which changed original_variable_type from 'NULL' to None.
Works around a behavior change in jsonpath-ng 1.8.0 where Child.str gets wrapped in parentheses.
Fixes tokenization errors when using dask 2024.8.1+. Starting with this version, dask enforces that tokens remain stable across pickle round-trips (dask/dask#11320). Capturing self in a lambda fails this check because instance objects can have non-deterministic pickle representations. Since calculate_variable_value_length is already a static method, replacing self with the class name is enough to remove the capture.
Dask 2025.4.0 optimizes multiple DataFrames together, which exposes division mismatches and causes dask to raise an error. This change removes a source of repartitioning, preserving the divisions when assigning a pandas Series to a dask DataFrame.
Fixes a unit test to support pandas 2.2.0+. That pandas release fixes a sorting bug (pandas-dev/pandas#54611). This commit changes the expected results accordingly.
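
To illustrate the self-capture point from the dask 2024.8.1 commit, here is a minimal stdlib-only sketch. It is not the CORE code: Builder is a hypothetical stand-in for ContentsDefineVLMDatasetBuilder, and the body of calculate_variable_value_length is invented for the example. It only shows that a lambda calling the static method through self closes over the instance, while calling it through the class name does not.

```python
class Builder:
    """Hypothetical stand-in for a dataset builder; not the CORE class."""

    @staticmethod
    def calculate_variable_value_length(value):
        # Invented body, just so the example has something to compute.
        return len(str(value))

    def make_capturing(self):
        # The lambda refers to `self`, so the instance becomes part of its closure.
        return lambda v: self.calculate_variable_value_length(v)

    def make_free(self):
        # Referring to the class by name keeps the instance out of the closure.
        return lambda v: Builder.calculate_variable_value_length(v)


def captured_names(fn):
    """Return the names of the variables a function closes over."""
    return fn.__code__.co_freevars


builder = Builder()
capturing = builder.make_capturing()
free = builder.make_free()

print("capturing closes over:", captured_names(capturing))
print("free closes over:", captured_names(free))
```

Both lambdas compute the same result, but only the first one carries the instance along. Anything that has to hash or pickle the callable, as dask does when tokenizing, then has to deal with the whole instance as well.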
filippsatverily force-pushed the filipps/update_deps_main branch from 14ffff8 to a646ccf April 29, 2026 01:38
filippsatverily marked this pull request as draft April 29, 2026 01:41
filippsatverily marked this pull request as ready for review April 30, 2026 22:57