Make data.download not use Docker with DirectRunner and apache_beam>=2.68.0
#182
This PR restores the default behaviour of `basic_pitch.data.download`, which changed after an apache_beam package update. It additionally fixes a few obsolete comments and variables, and an input param typo.

Description

A fairly recent apache_beam update (~Sept 2025) changed the default behaviour of the `basic_pitch.data.download` module. With apache_beam < 2.68, DirectRunner ran the pipeline in a local process. With >= 2.68, the DirectRunner pipeline uses Docker, as specified by the `environment_type=DOCKER` parameter. As `environment_type=DOCKER` is the default for PortableRunner and DataflowRunner, we don't really need to specify it. Removing `environment_type=DOCKER` makes DirectRunner run in a local process again.
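
The idea can be sketched as follows. This is only an illustration, not the actual code from this PR: the helper name `build_pipeline_options`, the job name, and the set of option keys are assumptions made for the example. The point is that the options dictionary simply carries no `environment_type` entry, so each runner is left to pick its own default.

```python
from typing import Any, Dict


def build_pipeline_options(runner: str, job_name: str) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword options for a Beam pipeline.

    There is deliberately no "environment_type" key here. Per the PR
    description, PortableRunner and DataflowRunner already default to
    DOCKER, and leaving the key out lets DirectRunner stay in a local
    process.
    """
    return {
        "runner": runner,
        "job_name": job_name,  # assumed name, for illustration only
        "save_main_session": True,
    }


options = build_pipeline_options("DirectRunner", "download-example")

# With apache_beam installed, the dictionary could then be handed to Beam:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   pipeline_options = PipelineOptions(flags=[], **options)
```

Keeping the options in a plain dictionary makes it easy to check, without starting a pipeline, that no Docker environment is being requested for a local run.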