Skip to content

refactor(bootstrapper): convert recursive bootstrap to iterative proessing#1116

Open
rd4398 wants to merge 1 commit intopython-wheel-build:mainfrom
rd4398:iterative-bootstrap-with-stages
Open

refactor(bootstrapper): convert recursive bootstrap to iterative proessing#1116
rd4398 wants to merge 1 commit intopython-wheel-build:mainfrom
rd4398:iterative-bootstrap-with-stages

Conversation

@rd4398
Copy link
Copy Markdown
Contributor

@rd4398 rd4398 commented May 5, 2026

Replace the recursive DFS in Bootstrapper.bootstrap() with an explicit LIFO stack and a six-phase state machine (START, PREPARE_SOURCE, PREPARE_BUILD, BUILD, PROCESS_INSTALL_DEPS, COMPLETE). This eliminates Python stack overflow on deep/wide dependency graphs, especially with --multiple-versions enabled.

Key changes:

  • Add BootstrapPhase enum and WorkItem dataclass for phase-based processing
  • Add six phase handler methods that replace the monolithic _bootstrap_impl
  • Rewrite bootstrap() to use an iterative DFS loop with explicit work stack
  • Remove _bootstrap_single_version, _bootstrap_impl, _build_from_source
  • Preserve all three error handling modes (normal, test, multiple-versions)
  • Keep _prepare_build_dependencies for git URL resolution path only
  • Update tests to match new internal structure

@rd4398 rd4398 requested a review from a team as a code owner May 5, 2026 19:06
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 5, 2026

Warning

Rate limit exceeded

@rd4398 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 57 minutes and 46 seconds before requesting another review.

To continue reviewing without waiting, purchase usage credits in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 467f3a57-0767-4087-9695-18e1dd953dcc

📥 Commits

Reviewing files that changed from the base of the PR and between 854abb9 and 2e27113.

📒 Files selected for processing (2)
  • src/fromager/bootstrapper.py
  • tests/test_bootstrapper.py
📝 Walkthrough

Walkthrough

The bootstrap() method was rewritten to use an explicit iterative LIFO loop over WorkItem objects carrying phase state, replacing the prior recursive single-version flow. Two new types (BootstrapPhase enum and WorkItem dataclass) model the end-to-end dependency preparation lifecycle. Six phase handlers (_phase_start, _phase_prepare_source, _phase_prepare_build, _phase_build, _phase_process_install_deps, _phase_complete) replace prior monolithic logic, with centralized error handling via _handle_phase_error(). The _handle_build_requirements() method was adjusted to preserve self.why stack integrity during nested bootstraps. Tests were simplified to validate the new phase-build handler and reworked to test error handling at the dispatch level.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: converting recursive bootstrap to iterative processing using a state machine approach.
Description check ✅ Passed The description is well-detailed and directly explains the refactor's purpose, key changes, and benefits related to the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@rd4398
Copy link
Copy Markdown
Contributor Author

rd4398 commented May 5, 2026

We have to choose between #1109 and this PR to proceed forward. Please take a look and let me know what you think.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fromager/bootstrapper.py`:
- Around line 1489-1494: The code currently unions item.build_backend_deps and
item.build_sdist_deps which relabels BUILD_SDIST edges as BUILD_BACKEND and
loses the duplicate-edge semantics; change the logic in the place that builds
dep_items (call _create_dep_work_items) to handle the two dependency sets
separately instead of doing item.build_backend_deps | item.build_sdist_deps:
call _create_dep_work_items(item.build_backend_deps,
RequirementType.BUILD_BACKEND, item.req, item.resolved_version) and separately
_create_dep_work_items(item.build_sdist_deps, RequirementType.BUILD_SDIST,
item.req, item.resolved_version), then combine/extend their results so both
typed edges are preserved (so self.why, _add_to_graph and deduping behavior
remain correct).
- Around line 1667-1677: When removing a failed version in the multiple_versions
branch, also remove the "seen" marker that _phase_start() previously set so
future traversals can retry that version; specifically after calling
self.ctx.dependency_graph.remove_dependency(...) and before
self.ctx.write_to_graph_to_file(), drop the entry from whatever in-memory
seen-tracking structure used by _phase_start() (e.g.
self._seen.discard((pkg_name, str(item.resolved_version))) or call the
corresponding clear/unmark method), and ensure _failed_versions append and
logging remain unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cacc4cdb-4289-4fbf-8845-dab740b0d233

📥 Commits

Reviewing files that changed from the base of the PR and between 1429c1a and 854abb9.

📒 Files selected for processing (2)
  • src/fromager/bootstrapper.py
  • tests/test_bootstrapper.py

Comment thread src/fromager/bootstrapper.py Outdated
Comment thread src/fromager/bootstrapper.py
@rd4398 rd4398 force-pushed the iterative-bootstrap-with-stages branch from 88e0673 to b8e3f29 Compare May 5, 2026 19:20
Comment thread src/fromager/bootstrapper.py
Comment thread src/fromager/bootstrapper.py Outdated
# (the loop modifies self.why for each work item)
saved_why = list(self.why)

# Create initial work items (reversed so first version is on top of stack)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "first" mean in this context? Lowest version number?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"First" here means highest version — resolve_versions returns results sorted highest-first (line 248)
I will add a comment to explicitly state that

Comment thread src/fromager/bootstrapper.py
Comment thread src/fromager/bootstrapper.py
Comment thread src/fromager/bootstrapper.py Outdated
if not new_items:
self.progressbar.update()

# Push items so first in list ends up on top (processed first)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to reverse the list here because we're not treating the list in the phase functions as a stack. I would have expected the phase to return something like [item] + new_items and that resulting list would then need to have new_items processed before trying item again. That way extend() here just adds to the stack in the same natural order.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I agree. I have made changes in latest commit


Called outside _track_why context to match original behavior
where graph addition and seen-check happen before pushing
the current item onto the why stack.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would help to explicitly explain what the return list is for each phase function. It's here in this paragraph, but calling it out as something like this would help make the flow more obvious:

Returns the current item to push it back onto the stack for the prepare-source phase.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I have made changes in latest commit.

Comment thread src/fromager/bootstrapper.py Outdated
)

item.phase = BootstrapPhase.PREPARE_BUILD
return dep_items + [item]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Treat the return value as a stack. We want to process dep_items before looking at item again.

Suggested change
return dep_items + [item]
return [item] + dep_items

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit

return []

def _dispatch_phase(self, item: WorkItem) -> list[WorkItem]:
"""Route a work item to the appropriate phase handler."""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we log the phase anywhere?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were not logging it. I have made changes in latest commit


# Create initial work items (reversed so first version is on top of stack)
stack: list[WorkItem] = []
for source_url, resolved_version in reversed(resolved_versions):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about making resolving the versions a phase, too?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this when claude was making changes. The main issue is that resolution is one-to-many in multiple_versions mode: one unresolved requirement can produce N resolved versions, each needing its own work item. Every other phase is one-to-one (item advances) or one-to-one-plus-deps (item advances and spawns children). A RESOLVE phase would be the only one that replaces itself with multiple peer items — a different pattern that complicates the state machine.

It would also make source_url and resolved_version Optional on WorkItem, adding None-guard noise in every subsequent phase. Right now those fields are always populated at creation time, which is a nice invariant.

The current split feels natural: bootstrap() resolves the top-level entry point, and _create_dep_work_items resolves each dependency inline when the parent discovers it. Resolution stays close to where deps are discovered rather than being deferred to a later loop iteration.

I am inclined to leave it as it is unless you prefer making it a separate state

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would just be moving the creation of the work items that's happening here inside a phase.

What I'm thinking is that a work item would start in the RESOLVE phase and that phase returns a stack containing the original work item and new work items for all of the resolved versions. The original work item would have its next phase set to COMPLETE, because we don't need to put it through all of the other stages but we want it to stay in the stack until its versions have been processed. The individually resolved items would have their next phase set to what is currently called START because those don't need to go through the resolution process. So, all of the existing steps would still receive resolved_version and source_url values.

The benefit of having that logic in one place means that all new dependencies we encounter get "expanded" the same way and that means when we add the logic to filter out versions that already exist in the package index or on disk or whatever, we can do it there one time.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another benefit of doing it on the stack is when we eventually start trying to parallelize some of these things we can have code look at all of the RESOLVE tasks on the stack and run them in threads, feeding the results back to the right spots within the stack.

items: list[WorkItem] = []
for dep in self._sort_requirements(deps):
try:
resolved = self.resolve_versions(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the version resolution a phase would mean we would only have this logic in 1 place.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed. answered in above comment

…essing

Replace the recursive DFS in Bootstrapper.bootstrap() with an explicit
LIFO stack and a six-phase state machine (START, PREPARE_SOURCE,
PREPARE_BUILD, BUILD, PROCESS_INSTALL_DEPS, COMPLETE). This eliminates
Python stack overflow on deep/wide dependency graphs, especially with
--multiple-versions enabled.

Key changes:
- Add BootstrapPhase enum and WorkItem dataclass for phase-based processing
- Add six phase handler methods that replace the monolithic _bootstrap_impl
- Rewrite bootstrap() to use an iterative DFS loop with explicit work stack
- Remove _bootstrap_single_version, _bootstrap_impl, _build_from_source
- Preserve all three error handling modes (normal, test, multiple-versions)
- Keep _prepare_build_dependencies for git URL resolution path only
- Update tests to match new internal structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Rohan Devasthale <rdevasth@redhat.com>
@rd4398 rd4398 force-pushed the iterative-bootstrap-with-stages branch from eb16669 to 2e27113 Compare May 6, 2026 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants