Automatic dynamic resolution of pip environments #3671

@nikonikolov

Description

Consider a monorepo with many separate pip environments:

  • Let py_library A depend on numpy and have an associated pip environment (i.e. requirements.txt.lock file)
  • Let py_library B depend on A and on a specific version of numpy (numpy==X), and also have its own pip environment.

For B, whether we end up using the version of numpy from A or from B's pip environment depends entirely on the order of deps in the definition of B.

py_library(
    name = "A",
    deps = [pip_A("numpy")],
)

py_library(
    name = "B",
    deps = [
        # If we accidentally swap the order, we are going to use `pip_A("numpy")`
        pip_B("numpy"),
        "//path/to:A",
    ],
)

When we have a big monorepo with multiple bazel subpackages and multiple pip environments, this very quickly gets out of hand and can cause a lot of problems.

Describe the solution you'd like

It would be extremely nice if the environment could be resolved automatically. For example, instead of the above, we would write

py_library(
    name = "A",
    deps = [pip_dep("numpy")],
)

py_library(
    name = "B",
    deps = [
        pip_dep("numpy"),
        "//path/to:A",
    ],
)

Here pip_dep automatically resolves to the correct environment at build time. For A it resolves to pip_A, for B it resolves to pip_B (this can also be propagated from py_test and py_binary). For this to function correctly the assumption is that pip_B contains all pip_A packages that A requires, but not necessarily the same versions. This can very easily be achieved with proper dependency management across pip environments - either via -r A_requirements.in or by recursive inclusion in pyproject.toml.

The above behavior is theoretically possible today by adding a special macro pip_dep which resolves based on a string_flag and config_setting. However, this is quite far from convenient from a usage perspective, for example:

  • The user needs to select and pass the correct flag for every binary or test that they run
  • Simultaneously running the entire suite of tests for the monorepo is not possible since bazel needs to be invoked with a different flag for every subpackage (this also can make things a lot slower)
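For illustration, the flag-based workaround described above could look roughly like the sketch below. It is only a minimal sketch under several assumptions: the package `//tools/pip`, the flag name `pip_env`, and the hub repositories `@pip_A` and `@pip_B` (each exposing the `requirement` function generated by rules_python) are hypothetical names, not something taken from an existing setup.

```starlark
# tools/pip/BUILD.bazel (hypothetical package and names)
load("@bazel_skylib//rules:common_settings.bzl", "string_flag")
load("@pip_A//:requirements.bzl", pip_A = "requirement")  # assumed hub repo for A
load("@pip_B//:requirements.bzl", pip_B = "requirement")  # assumed hub repo for B

# Command-line flag that names the active pip environment.
string_flag(
    name = "pip_env",
    build_setting_default = "A",
    values = ["A", "B"],
)

config_setting(
    name = "env_A",
    flag_values = {":pip_env": "A"},
)

config_setting(
    name = "env_B",
    flag_values = {":pip_env": "B"},
)

# One alias per package; it points at whichever environment is selected.
alias(
    name = "numpy",
    actual = select({
        ":env_A": pip_A("numpy"),
        ":env_B": pip_B("numpy"),
    }),
    visibility = ["//visibility:public"],
)
```

With something like this in place, `pip_dep(name)` could be a one-line function in a .bzl file that returns `"//tools/pip:" + name`, so `deps = [pip_dep("numpy")]` keeps working as written. The drawback is the one listed above: every invocation has to pick the environment explicitly, e.g. `bazel test //some:target --//tools/pip:pip_env=B`, where `//some:target` is a placeholder.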

Ideally the correct environment can be automatically resolved for a given target based on its path in the repo (if we assume all targets under B use pip_B and all targets under A use pip_A, which is IMO a sensible assumption). Then one can enforce the correct environment via a transition. On the command line, targets can be invoked without any extra configuration arguments (or configuration arguments can override the automatic resolution).
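A rough sketch of what path-based resolution might involve is shown below. It is not the linked gist and not an existing rules_python feature; it assumes a string_flag at the hypothetical label `//tools/pip:pip_env`, and the package prefixes in the table are placeholders. Since a transition implementation only receives the current settings and the rule's attributes, the sketch assumes the environment name is computed from the package path in a macro and handed to the rule as an attribute named `pip_env`.

```starlark
# pip_env.bzl (hypothetical file; all names below are assumptions)

# Placeholder mapping from package path prefix to pip environment name.
_ENV_BY_PREFIX = {
    "path/to/a": "A",
    "path/to/b": "B",
}

def env_for_package(package):
    """Returns the environment whose prefix is the longest match for `package`."""
    best = ""
    env = None
    for prefix, name in _ENV_BY_PREFIX.items():
        matches = package == prefix or package.startswith(prefix + "/")
        if matches and len(prefix) > len(best):
            best = prefix
            env = name
    if env == None:
        fail("no pip environment registered for package '%s'" % package)
    return env

def _pip_env_transition_impl(settings, attr):
    # `attr.pip_env` is expected to be filled in by a wrapping macro,
    # e.g. with env_for_package(native.package_name()).
    return {"//tools/pip:pip_env": attr.pip_env}

pip_env_transition = transition(
    implementation = _pip_env_transition_impl,
    inputs = [],
    outputs = ["//tools/pip:pip_env"],
)
```

A wrapper rule around py_test or py_binary would then set `cfg = pip_env_transition` and declare the `pip_env` attribute; that rule is omitted here because forwarding an executable through a transition takes more code. Letting an explicit command-line flag win over the path-based default would need the transition to read the incoming value via `inputs`, which this sketch does not attempt.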

Example implementation: https://gist.github.com/nikonikolov/3204c93621c86b6f6d10723f65e7c1b7. It is likely very suboptimal and can be improved. I am in no way a bazel expert; I coded it up with the help of Gemini.

Describe alternatives you've considered

  1. Using a single lock file for the entire repo -> most of the time impossible due to conflicting requirements in different subpackages
  2. Make sure that if B depends on A, it always uses the same versions of the pip packages as A. This means users need to pass -c A_requirements.txt.lock when compiling B_requirements.txt.lock. With multiple subpackages and conflicting environments, one would constantly run into version conflicts that have to be resolved manually (if that is possible at all), which is obviously far from desirable (hence the need for tools such as uv pip compile or pip-compile)
