Skip to content

feat!: timeouts#457

Open
tristan-f-r wants to merge 34 commits into
Reed-CompBio:mainfrom
tristan-f-r:timeout-arg
Open

feat!: timeouts#457
tristan-f-r wants to merge 34 commits into
Reed-CompBio:mainfrom
tristan-f-r:timeout-arg

Conversation

@tristan-f-r
Copy link
Copy Markdown
Collaborator

@tristan-f-r tristan-f-r commented Jan 13, 2026

Adds timeout to algorithms as a demonstration of passing through errors, and introduces run settings, breaking the original configuration design by introducing a new params option. Closes #316.

Caveat: ML requires at least one pathway, and failing pathways can break ML-work. How do we want to handle downstream analysis when errors occur (including in the future heuristic errors?)

@tristan-f-r tristan-f-r added the enhancement New feature or request label Jan 13, 2026
@read-the-docs-community
Copy link
Copy Markdown

read-the-docs-community Bot commented Jan 13, 2026

Documentation build overview

📚 spras | 🛠️ Build #32590107 | 📁 Comparing 4a95d79 against latest (87b314c)

  🔍 Preview build  

9 files changed · + 2 added · ± 7 modified

+ Added

± Modified

@tristan-f-r tristan-f-r added the P-high This is a blocker for many PRs/issues/features label Jan 13, 2026
@github-actions github-actions Bot added the merge-conflict This PR has merge conflicts. label Jan 31, 2026
@github-actions github-actions Bot removed the merge-conflict This PR has merge conflicts. label Jan 31, 2026
@ntalluri
Copy link
Copy Markdown
Collaborator

ntalluri commented Feb 5, 2026

This does not work with singularity (singularity has no docker wait equivalent and to implement timeouts in singularity would probably require constant polling of a detached thread)

This is a problem; if this is something we are going to use for the benchmarking study we need this to work with singularity because CHTC only uses singularity/apptainer.

@tristan-f-r
Copy link
Copy Markdown
Collaborator Author

tristan-f-r commented Feb 5, 2026

I originally assumed this was a more esoteric PR to test the error-handling workflow. Though, based on the meeting just now, I'll look into a nice way to get this working with Singularity.

@github-actions github-actions Bot added the merge-conflict This PR has merge conflicts. label Mar 18, 2026
Copy link
Copy Markdown
Collaborator

@ntalluri ntalluri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is my first round review. I like the empty files if an error occurs, and it is good to have an associated log explaining why.
(Adding this comment again here) My only issue with this is the definition of "error." If a parameter combination fails the heuristics check, I do not want the output to be empty. I want the output to reflect what was actually produced, so I do not have to rerun that combination even though it "failed" the heuristics. I should be able to freely update the heuristics and have that output counted if it now passes, without rerunning combinations that previously produced empty output. In short, a parameter combination failing the heuristics should not be classified as an error as defined in this PR.

This PR not working with singularity is a very big problem because of the chtc integration.

"ML requires at least one pathway, and failing pathways can break ML-work. How do we want to handle downstream analysis when errors occur." This seems like a separate problem that I will fix internally in the ML code (I want to still make figures if we only have one pathway or a set of empty pathways).

Comment thread config/config.yaml
Comment thread spras/analysis/cytoscape.py
Comment thread spras/config/config.py Outdated
Comment thread spras/containers.py
Comment thread spras/prm.py
Comment thread Snakefile Outdated
Comment thread Snakefile
Comment thread Snakefile
Comment thread Snakefile
Comment thread Snakefile
tristan-f-r and others added 4 commits April 23, 2026 19:31
This perplexes me but from my tests we do not need --keep-going. I do not know my original intent here
Co-Authored-By: Neha Talluri <78840540+ntalluri@users.noreply.github.com>
@tristan-f-r
Copy link
Copy Markdown
Collaborator Author

tristan-f-r commented Apr 23, 2026

My only issue with this is the definition of "error." If a parameter combination fails the heuristics check, I do not want the output to be empty. I want the output to reflect what was actually produced, so I do not have to rerun that combination even though it "failed" the heuristics.

The small part here that is adaptable is that errors, in this PR, are only made in the reconstruct rule. We can define other errors in some heuristics rule, and we can rewire other rules to depend on the success of heuristics instead of on reconstruct (via that resource_info = rules.reconstruct.output.resource_info input rule above, but instead we would say rules.heuristics.output.resource_info instead.)

@github-actions github-actions Bot removed the merge-conflict This PR has merge conflicts. label Apr 23, 2026
@tristan-f-r
Copy link
Copy Markdown
Collaborator Author

This works with apptainer now, but this now touches an untested part of profiling, so that part needs a review from @jhiemstrawisc.

Copy link
Copy Markdown
Collaborator

@ntalluri ntalluri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is half of a review on this pr

Comment thread docs/design/errors.rst Outdated
Comment thread docs/timeout.rst Outdated
Comment thread docs/timeout.rst Outdated
Comment thread spras/containers.py Outdated
Comment thread spras/containers.py
Co-Authored-By: Neha Talluri <78840540+ntalluri@users.noreply.github.com>
@tristan-f-r
Copy link
Copy Markdown
Collaborator Author

tristan-f-r commented May 4, 2026

I've made a few changes:

  • With your line of thought of considering errors as important objects in documentation, I've pushed a commit that moves out some of the error object handling out from Snakefile into an errors.py, and we further use pydantic to make sure we avoid arbitrary JSON handling (also to give SPRAS automation users a way to easily parse our error objects). This was also motivated by encountering a type error with mark_error, which is why I've opted to make this type-safe.
  • I've also renamed resource -> artifact, as I found the former a little more confusing, though the latter isn't an optimal name either: see the top-level comment on errors.py.
  • timeout is now per-run, with optional specification for an entire algorithm. This refactor was intentionally done for conditionals, though it can also find its use with many other configuration settings. This also introduces a new key, params under all runs. We mark the PR as breaking because of this.

The final bullet point here also means that the conditionals PR will now actually depend on this PR.

@tristan-f-r tristan-f-r changed the title feat: timeouts feat!: timeouts May 4, 2026
@tristan-f-r tristan-f-r added the tuning Workflow-spanning algorithm tuning label May 4, 2026
@ntalluri
Copy link
Copy Markdown
Collaborator

ntalluri commented May 6, 2026

does this need to use #464?

@tristan-f-r
Copy link
Copy Markdown
Collaborator Author

No - I was thinking that the RunSettings here can be more generally applicable to #464.

why did i not do this before???
this is why I didn't do this earlier...
@tristan-f-r
Copy link
Copy Markdown
Collaborator Author

I seem to have messed with imports when I went to pass in the entirety of run_settings instead of just timeout. I'll fix this 👍.

Comment thread config/config.yaml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request P-high This is a blocker for many PRs/issues/features tuning Workflow-spanning algorithm tuning

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Generic spras limit option on containers

2 participants