Skip to content

Conversation

@jennan
Copy link
Collaborator

@jennan jennan commented Sep 13, 2025

This PR implements the following changes:

  • allow pipelines to be named, to make it easier to sub-select a part of a pipeline after its creation,
  • use the | operator to merge pipelines,
  • implement a reverse pipeline property to undo the effect of a pipeline.

Few things need to be done to finish this PR:

  • add tests
  • add a bit of documentation
  • add an example notebook (almost done)
  • detect if 2 pipelines with the same name are merged and issue an error

@jennan jennan self-assigned this Sep 13, 2025
@coveralls
Copy link

coveralls commented Sep 13, 2025

Pull Request Test Coverage Report for Build 18666409006

Details

  • 21 of 42 (50.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.03%) to 61.309%

Changes Missing Coverage Covered Lines Changed/Added Lines %
packages/pipeline/src/pyearthtools/pipeline/controller.py 20 41 48.78%
Totals Coverage Status
Change from base Build 18584796525: -0.03%
Covered Lines: 9554
Relevant Lines: 15165

💛 - Coveralls

@tennlee

This comment was marked as resolved.

@tennlee

This comment was marked as resolved.

@jennan

This comment was marked as resolved.

@jennan

This comment was marked as resolved.

@tennlee
Copy link
Collaborator

tennlee commented Sep 16, 2025

I'll need to do some more diagnosis I guess. I was able to reproduce your pipeline notebook just fine. With the .reversed, I think it might be better to make it .reversed() if it's returning a new reversed pipeline rather than being the same pipeline with a .reversed attribute. It depends on how you expect people to use it. I'm happy with it staying as an attribute if the ideas is that people will use it that way. Like if someone modifies the state of the pipeline somehow, what happens to the reversed one? e.g. if they insert a new step in the middle, what should the reversed pipeline do? I don't think there's any "best" answer, it just needs to be documented. If .reversed is an attribute which will walk the steps applying undo, that's fine. But if you expect to get a copy of the pipeline with separate state, then it should be a method. I don't have an opinion about which one to choose.

This reverts commit d8ca63f.

Save nested pipelines in a dictionary.
return graph, prior_step


class NotReversiblePipeline(RuntimeError):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this "ForwardOnly"? Just a thought

…r that"

This reverts commit d075868.

Trying to reverse a pipeline by reversing each underlying operation is
more complicated than expected, due to operations that loose their state
when transposed using `.T`.  One example is the ToNumpy operation, which
has multiple private members that won't be initialised after transposition,
leading to a non-functional reversed operation.  As multiple operations might
have the save behavior, it seems easier to reverse the whole pipeline as a
black box instead of each step.
Treat the ReversePipeline class as an opaque class, not meant to expose
steps of of the underlying pipeline, etc. The Operation class provides the
right API to capture the minimum operations this class needs (apply and undo)
and makes it easy to integrate the ReversedPipeline in an existing pipeline.
@jennan jennan changed the title [Draft] Named pipelines, | operator and reverse pipeline Named pipelines, | operator and reverse pipeline Oct 20, 2025
@jennan
Copy link
Collaborator Author

jennan commented Oct 20, 2025

@tennlee I think the PR is almost ready. Few things that we could change before merging:

  • Do you want to keep the NamedPipe.ipynb example notebook in addition/instead of the notebook Extras.ipynb?
  • pipeline.reversed is a bit less safe now, as I am not checking that every step is an Operation with an working .T attribute. Would it make sense to add some additional exceptions if/when people use a non-reversible pipeline?

@jennan
Copy link
Collaborator Author

jennan commented Oct 20, 2025

So, it turned out that the Pipeline.undo method does a bit more than I thought, removing the need (and possibility) for a thorough inspection of the pipeline object in the ReversedPipeline class constructor. Indeed, Pipeline.undo will automatically skip the steps that are transforms for example. Additionally, steps themselves might have an undo method that does nothing, this is the case of the TemporalRetrieval step for example.

All in all, it means that a preprocessing pipeline (including data sources) can be reversed without the need to carefully split and reassemble the reversible steps. I have updated the example notebook to show this at the end with a simple persistence-like model.

I think this also means that users would need a good understanding of the undo behavior of each steps to avoid bad surprises.

@jennan
Copy link
Collaborator Author

jennan commented Oct 20, 2025

The API and notebook docs have been updated, @tennlee this PR is now ready :).

@tennlee tennlee merged commit 65da37b into ACCESS-Community-Hub:develop Oct 20, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants