Refactor SCC traversal from recursiveVisit to iterativeVisit#365
Conversation
|
After testing with CheerpX, It seems that I could not remove the cleanUp function as I thought I could. However, I removed some containers related to that function that were redundant, which makes PartialExecuter a bit faster. |
5cefa7e to
40b806c
Compare
| // Then iff there are any edges going back to start, we add all nodes again as a single SCC to the end | ||
| // | ||
| // During the actual visit we might discover that we eventually will not loop back to start (so the recursion terminate) or we stop since we reached the maximum iteration number | ||
| // We are separating a group of N basic blocks into Strongly Connected Components (SCCs). |
There was a problem hiding this comment.
the whole comment block uses spaces for indentations, while the rest of the file uses tabs.
| stack.push_back(&blocksStorage.back()); | ||
|
|
||
| uint32_t nextId = 0; | ||
| // We delay pushing to the stack by one iteration to intentionally drop the final SCC. |
There was a problem hiding this comment.
the whole comment block uses spaces for indentations, while the rest of the file uses tabs.
| // If there is exactly 1 block, there are no intermediate blocks to isolate into SCCs. | ||
| // Furthermore, self-loops (edges from 'start' back to 'start') are evaluated naturally | ||
| // while processing the single block itself. Creating a separate 'loop copy' on the stack | ||
| // is redundant, so we skip the split entirely. |
There was a problem hiding this comment.
the comment says that the loop copy is redundant, but the code below still adds it to the storage and the stack?
There was a problem hiding this comment.
Yes, I meant it was redundant when the size of blocks was one, for any other case we do need the loop copy. I changed the comment to be more specific about this
| stack.pop_back(); | ||
|
|
||
| llvm::SmallVector<llvm::BasicBlock*, 4> visitNext; | ||
| currNode->currIter = data.getVisitCounter(currNode->start); |
There was a problem hiding this comment.
currNode->start is null if !currNode->isReachable right? if so, here we are adding a value with a null key to the dense map. which is not a problem, but if we move this after the isReachable check we can avoid it
| for (BasicBlock* bb : subsets.back()) | ||
| blocksStorage.emplace_back(&BBNode, subset); | ||
| BasicBlockNode* childPtr = &blocksStorage.back(); | ||
| pendingPush = childPtr; |
There was a problem hiding this comment.
the last iteration never pushes the pendingPush to the stack, but the blockStorage for it is still emplaced at the back. I guess we could avoid doing that if nobody uses it?
40b806c to
a43ec95
Compare
a43ec95 to
3ca4703
Compare
Replace the recursive BasicBlockGroupNode::recursiveVisit with an
explicit stack-based driver, and split the responsibilities of the
old class in two:
- BasicBlockNode owns the implicit tree (parentNode pointer, child
storage, blockToGroupMap) and the per-node visit state (start,
from, isReachable, isMultiHead, BBProgress, SCCProgress,
isLoopCopy, visitingAll, ...). It knows how to push its children
onto an externally-owned stack via splitIntoSCCs. It is a plain
struct: all fields are accessed directly by the driver.
- BasicBlockGroupNode is the driver: it owns the work stack, holds
the FunctionData reference, and runs iterativeVisit until the
stack drains.
A processed node is not popped immediately after splitIntoSCCs
pushes its children: it sets a clearAllStorage sentinel and stays
on the stack. Because the stack is LIFO, the node only becomes the
top again after every descendant has been drained -- at which point
its child storage can be safely freed without invalidating any
parentNode or blockToGroupMap pointer. This keeps live memory
proportional to the current root-to-leaf path, mirroring what the
recursive version got from the call stack.
splitIntoSCCs is also tightened to avoid push/pop churn: in the N=1
case no extra "self-as-child" node is created (self-loops are
evaluated naturally), and in the N>1 case the final SCC (the one
that contains start) is intentionally never pushed, since the
caller runs start inline immediately after the split.
The isLoopCopy flag replaces the implicit childrenNodes.size()==1
check that previously identified the loop replica. BBProgress is
promoted from a bool& plumbed through the visit helpers to a member
of the node. The redundant subsets / subsetIndex parallel data
structures are removed; cleanUp now looks the owning child up
through blockToGroupMap directly.
Initialize BasicBlockNode iteration state, avoid accidental DenseMap insertion during cleanup, and assert that the delayed SCC split drops only the start component. Also remove a redundant storage lookup and fix stale comments and whitespace around the iterative SCC visitor.
Replace the node-owned cleanup sentinel with an explicit stack frame phase. This keeps the non-recursive scheduling order unchanged while making the parent-storage cleanup step part of the stack protocol instead of mutable node state.
3ca4703 to
af7a48d
Compare
Restructured the algorithm to visit SCCs iteratively using
an explicit stack instead of recursion.
Additionally, blocksStorage now replaces childrenNodes. This
change preserves the BBGNs because they can still be referenced
after being popped from the stack. Instead of each child maintaining
its own local copy of its blocks, the storage is now managed globally.
Also added a isloopCopy flag to translate the BBProgress optimization.