Skip to content

Refactor SCC traversal from recursiveVisit to iterativeVisit#365

Merged
yuri91 merged 3 commits into
masterfrom
non_recursive_partial_executer
May 26, 2026
Merged

Refactor SCC traversal from recursiveVisit to iterativeVisit#365
yuri91 merged 3 commits into
masterfrom
non_recursive_partial_executer

Conversation

@andmadri
Copy link
Copy Markdown

Restructured the algorithm to visit SCCs iteratively using
an explicit stack instead of recursion.

Additionally, blocksStorage now replaces childrenNodes. This
change preserves the BBGNs because they can still be referenced
after being popped from the stack. Instead of each child maintaining
its own local copy of its blocks, the storage is now managed globally.

Also added a isloopCopy flag to translate the BBProgress optimization.

@andmadri andmadri requested review from alexp-sssup and yuri91 and removed request for alexp-sssup April 23, 2026 13:46
@andmadri
Copy link
Copy Markdown
Author

After testing with CheerpX, It seems that I could not remove the cleanUp function as I thought I could. However, I removed some containers related to that function that were redundant, which makes PartialExecuter a bit faster.

Comment thread llvm/lib/CheerpWriter/PartialExecuter.cpp Outdated
@andmadri andmadri force-pushed the non_recursive_partial_executer branch 2 times, most recently from 5cefa7e to 40b806c Compare April 30, 2026 15:30
// Then iff there are any edges going back to start, we add all nodes again as a single SCC to the end
//
// During the actual visit we might discover that we eventually will not loop back to start (so the recursion terminate) or we stop since we reached the maximum iteration number
// We are separating a group of N basic blocks into Strongly Connected Components (SCCs).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the whole comment block uses spaces for indentations, while the rest of the file uses tabs.

stack.push_back(&blocksStorage.back());

uint32_t nextId = 0;
// We delay pushing to the stack by one iteration to intentionally drop the final SCC.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the whole comment block uses spaces for indentations, while the rest of the file uses tabs.

// If there is exactly 1 block, there are no intermediate blocks to isolate into SCCs.
// Furthermore, self-loops (edges from 'start' back to 'start') are evaluated naturally
// while processing the single block itself. Creating a separate 'loop copy' on the stack
// is redundant, so we skip the split entirely.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment says that the loop copy is redundant, but the code below still adds it to the storage and the stack?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I meant it was redundant when the size of blocks was one, for any other case we do need the loop copy. I changed the comment to be more specific about this

stack.pop_back();

llvm::SmallVector<llvm::BasicBlock*, 4> visitNext;
currNode->currIter = data.getVisitCounter(currNode->start);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currNode->start is null if !currNode->isReachable right? if so, here we are adding a value with a null key to the dense map. which is not a problem, but if we move this after the isReachable check we can avoid it

for (BasicBlock* bb : subsets.back())
blocksStorage.emplace_back(&BBNode, subset);
BasicBlockNode* childPtr = &blocksStorage.back();
pendingPush = childPtr;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the last iteration never pushes the pendingPush to the stack, but the blockStorage for it is still emplaced at the back. I guess we could avoid doing that if nobody uses it?

@andmadri andmadri force-pushed the non_recursive_partial_executer branch from 40b806c to a43ec95 Compare May 6, 2026 11:25
@andmadri andmadri requested a review from yuri91 May 6, 2026 11:29
@yuri91 yuri91 added the hydra Run hydra on this PR label May 6, 2026
@andmadri andmadri force-pushed the non_recursive_partial_executer branch from a43ec95 to 3ca4703 Compare May 6, 2026 12:03
yuri91 added 3 commits May 12, 2026 17:48
Replace the recursive BasicBlockGroupNode::recursiveVisit with an
explicit stack-based driver, and split the responsibilities of the
old class in two:

  - BasicBlockNode owns the implicit tree (parentNode pointer, child
    storage, blockToGroupMap) and the per-node visit state (start,
    from, isReachable, isMultiHead, BBProgress, SCCProgress,
    isLoopCopy, visitingAll, ...). It knows how to push its children
    onto an externally-owned stack via splitIntoSCCs. It is a plain
    struct: all fields are accessed directly by the driver.
  - BasicBlockGroupNode is the driver: it owns the work stack, holds
    the FunctionData reference, and runs iterativeVisit until the
    stack drains.

A processed node is not popped immediately after splitIntoSCCs
pushes its children: it sets a clearAllStorage sentinel and stays
on the stack. Because the stack is LIFO, the node only becomes the
top again after every descendant has been drained -- at which point
its child storage can be safely freed without invalidating any
parentNode or blockToGroupMap pointer. This keeps live memory
proportional to the current root-to-leaf path, mirroring what the
recursive version got from the call stack.

splitIntoSCCs is also tightened to avoid push/pop churn: in the N=1
case no extra "self-as-child" node is created (self-loops are
evaluated naturally), and in the N>1 case the final SCC (the one
that contains start) is intentionally never pushed, since the
caller runs start inline immediately after the split.

The isLoopCopy flag replaces the implicit childrenNodes.size()==1
check that previously identified the loop replica. BBProgress is
promoted from a bool& plumbed through the visit helpers to a member
of the node. The redundant subsets / subsetIndex parallel data
structures are removed; cleanUp now looks the owning child up
through blockToGroupMap directly.
Initialize BasicBlockNode iteration state, avoid accidental DenseMap insertion during cleanup, and assert that the delayed SCC split drops only the start component.

Also remove a redundant storage lookup and fix stale comments and whitespace around the iterative SCC visitor.
Replace the node-owned cleanup sentinel with an explicit stack frame phase. This keeps the non-recursive scheduling order unchanged while making the parent-storage cleanup step part of the stack protocol instead of mutable node state.
@yuri91 yuri91 force-pushed the non_recursive_partial_executer branch from 3ca4703 to af7a48d Compare May 22, 2026 15:56
@yuri91 yuri91 merged commit d5404f2 into master May 26, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hydra Run hydra on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants