Refactor SCC traversal from recursiveVisit to iterativeVisit by andmadri · Pull Request #365 · leaningtech/cheerp-compiler

andmadri · 2026-04-23T13:46:32Z

Restructured the algorithm to visit SCCs iteratively using
an explicit stack instead of recursion.

Additionally, blocksStorage now replaces childrenNodes. This
change preserves the BBGNs because they can still be referenced
after being popped from the stack. Instead of each child maintaining
its own local copy of its blocks, the storage is now managed globally.

Also added a isloopCopy flag to translate the BBProgress optimization.

andmadri · 2026-04-24T11:47:29Z

After testing with CheerpX, It seems that I could not remove the cleanUp function as I thought I could. However, I removed some containers related to that function that were redundant, which makes PartialExecuter a bit faster.

yuri91 · 2026-05-01T11:54:28Z

-	// Then iff there are any edges going back to start, we add all nodes again as a single SCC to the end
-	//
-	// During the actual visit we might discover that we eventually will not loop back to start (so the recursion terminate) or we stop since we reached the maximum iteration number
+    // We are separating a group of N basic blocks into Strongly Connected Components (SCCs).


the whole comment block uses spaces for indentations, while the rest of the file uses tabs.

yuri91 · 2026-05-01T11:54:42Z

+	stack.push_back(&blocksStorage.back());

 	uint32_t nextId = 0;
+    // We delay pushing to the stack by one iteration to intentionally drop the final SCC.


the whole comment block uses spaces for indentations, while the rest of the file uses tabs.

yuri91 · 2026-05-01T11:58:50Z

+    // If there is exactly 1 block, there are no intermediate blocks to isolate into SCCs.
+    // Furthermore, self-loops (edges from 'start' back to 'start') are evaluated naturally 
+    // while processing the single block itself. Creating a separate 'loop copy' on the stack 
+    // is redundant, so we skip the split entirely.


the comment says that the loop copy is redundant, but the code below still adds it to the storage and the stack?

Yes, I meant it was redundant when the size of blocks was one, for any other case we do need the loop copy. I changed the comment to be more specific about this

yuri91 · 2026-05-01T12:01:06Z

+			stack.pop_back();

-		llvm::SmallVector<llvm::BasicBlock*, 4> visitNext;
+			currNode->currIter = data.getVisitCounter(currNode->start);


currNode->start is null if !currNode->isReachable right? if so, here we are adding a value with a null key to the dense map. which is not a problem, but if we move this after the isReachable check we can avoid it

yuri91 · 2026-05-01T12:04:25Z

-		for (BasicBlock* bb : subsets.back())
+		blocksStorage.emplace_back(&BBNode, subset);
+		BasicBlockNode* childPtr = &blocksStorage.back();
+		pendingPush = childPtr;


the last iteration never pushes the pendingPush to the stack, but the blockStorage for it is still emplaced at the back. I guess we could avoid doing that if nobody uses it?

Replace the recursive BasicBlockGroupNode::recursiveVisit with an explicit stack-based driver, and split the responsibilities of the old class in two: - BasicBlockNode owns the implicit tree (parentNode pointer, child storage, blockToGroupMap) and the per-node visit state (start, from, isReachable, isMultiHead, BBProgress, SCCProgress, isLoopCopy, visitingAll, ...). It knows how to push its children onto an externally-owned stack via splitIntoSCCs. It is a plain struct: all fields are accessed directly by the driver. - BasicBlockGroupNode is the driver: it owns the work stack, holds the FunctionData reference, and runs iterativeVisit until the stack drains. A processed node is not popped immediately after splitIntoSCCs pushes its children: it sets a clearAllStorage sentinel and stays on the stack. Because the stack is LIFO, the node only becomes the top again after every descendant has been drained -- at which point its child storage can be safely freed without invalidating any parentNode or blockToGroupMap pointer. This keeps live memory proportional to the current root-to-leaf path, mirroring what the recursive version got from the call stack. splitIntoSCCs is also tightened to avoid push/pop churn: in the N=1 case no extra "self-as-child" node is created (self-loops are evaluated naturally), and in the N>1 case the final SCC (the one that contains start) is intentionally never pushed, since the caller runs start inline immediately after the split. The isLoopCopy flag replaces the implicit childrenNodes.size()==1 check that previously identified the loop replica. BBProgress is promoted from a bool& plumbed through the visit helpers to a member of the node. The redundant subsets / subsetIndex parallel data structures are removed; cleanUp now looks the owning child up through blockToGroupMap directly.

Initialize BasicBlockNode iteration state, avoid accidental DenseMap insertion during cleanup, and assert that the delayed SCC split drops only the start component. Also remove a redundant storage lookup and fix stale comments and whitespace around the iterative SCC visitor.

Replace the node-owned cleanup sentinel with an explicit stack frame phase. This keeps the non-recursive scheduling order unchanged while making the parent-storage cleanup step part of the stack protocol instead of mutable node state.

andmadri requested review from alexp-sssup and yuri91 and removed request for alexp-sssup April 23, 2026 13:46

andmadri commented Apr 28, 2026

View reviewed changes

Comment thread llvm/lib/CheerpWriter/PartialExecuter.cpp Outdated

andmadri force-pushed the non_recursive_partial_executer branch 2 times, most recently from 5cefa7e to 40b806c Compare April 30, 2026 15:30

yuri91 requested changes May 1, 2026

View reviewed changes

andmadri force-pushed the non_recursive_partial_executer branch from 40b806c to a43ec95 Compare May 6, 2026 11:25

andmadri requested a review from yuri91 May 6, 2026 11:29

yuri91 added the hydra Run hydra on this PR label May 6, 2026

andmadri force-pushed the non_recursive_partial_executer branch from a43ec95 to 3ca4703 Compare May 6, 2026 12:03

yuri91 added 3 commits May 12, 2026 17:48

yuri91 force-pushed the non_recursive_partial_executer branch from 3ca4703 to af7a48d Compare May 22, 2026 15:56

yuri91 merged commit d5404f2 into master May 26, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor SCC traversal from recursiveVisit to iterativeVisit#365

Refactor SCC traversal from recursiveVisit to iterativeVisit#365
yuri91 merged 3 commits into
masterfrom
non_recursive_partial_executer

andmadri commented Apr 23, 2026

Uh oh!

andmadri commented Apr 24, 2026

Uh oh!

Uh oh!

yuri91 May 1, 2026

Uh oh!

yuri91 May 1, 2026

Uh oh!

yuri91 May 1, 2026

Uh oh!

andmadri May 6, 2026

Uh oh!

yuri91 May 1, 2026

Uh oh!

yuri91 May 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

andmadri commented Apr 23, 2026

Uh oh!

andmadri commented Apr 24, 2026

Uh oh!

Uh oh!

yuri91 May 1, 2026

Choose a reason for hiding this comment

Uh oh!

yuri91 May 1, 2026

Choose a reason for hiding this comment

Uh oh!

yuri91 May 1, 2026

Choose a reason for hiding this comment

Uh oh!

andmadri May 6, 2026

Choose a reason for hiding this comment

Uh oh!

yuri91 May 1, 2026

Choose a reason for hiding this comment

Uh oh!

yuri91 May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants