
Fix deadlock condition by disabling prefetches on invalid hits #36

Open

colluca wants to merge 2 commits into main from filter-prefetch

@colluca (Collaborator) commented May 7, 2026

Consider the following scenario, encountered in the Snitch cluster.
All cores except core 0 reach a barrier. One would expect all instruction frontends except core 0's to be idle, but this is not the case: the L0 cache issues prefetch requests for whatever address is on its request interface, even when `valid` is low. Under some condition that I did not have time to investigate fully, but which is related to a multi-hit scenario, the prefetched cache lines are flushed immediately after being written back into the cache. Thus, even though the address on the request interface remains stable, the associated prefetch request is never filtered out by the presence of that cache line in the cache, and the same line is prefetched over and over again, indefinitely. As a result, the L1 cache is continuously bombarded with prefetch requests from the L0 caches.

Meanwhile, core 0 is doing useful work but misses in its L0 cache, which then requests a refill from the L1 cache. Since the L1 cache gives prefetch requests fixed priority over miss requests, and the other 8 cores keep flooding it with prefetch requests, core 0's miss is never served and core 0 stalls. Because all other cores are waiting at the barrier for core 0, the dependency cycle closes and the system deadlocks.
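
To make the starvation argument concrete, here is a minimal behavioural sketch in Python. It is not the actual RTL: it assumes a hypothetical L1 arbiter that grants one request per cycle with strict priority for prefetches, and that each core waiting at the barrier issues one new prefetch per cycle. `Request`, `fixed_priority_grant` and `cycles_until_miss_served` are illustrative names, not identifiers from the code base.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Request:
    core: int
    kind: str  # "prefetch" or "miss"


def fixed_priority_grant(requests: List[Request]) -> Optional[Request]:
    """Grant one request per cycle, with prefetches always winning over misses."""
    for kind in ("prefetch", "miss"):
        for req in requests:
            if req.kind == kind:
                return req
    return None


def cycles_until_miss_served(cycles: int, prefetching_cores: Set[int]) -> Optional[int]:
    """Return the cycle in which core 0's pending miss is granted, or None if it
    is still waiting after `cycles` cycles. Every core in `prefetching_cores`
    issues a new prefetch request on every cycle (the runaway re-prefetch loop)."""
    for cycle in range(cycles):
        requests = [Request(core, "prefetch") for core in sorted(prefetching_cores)]
        requests.append(Request(0, "miss"))
        granted = fixed_priority_grant(requests)
        if granted is not None and granted.kind == "miss":
            return cycle
    return None


# Cores 1..8 sit at the barrier but keep re-prefetching the same line.
print(cycles_until_miss_served(1000, set(range(1, 9))))  # None: core 0 starves
# Once the waiting cores stop prefetching, the miss is granted right away.
print(cycles_until_miss_served(1000, set()))  # 0
```

As long as any prefetch is pending in a cycle, the miss loses arbitration, so a steady stream of prefetches from the waiting cores is enough to starve core 0 indefinitely.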

Several parts of this chain could be considered faulty, and fixing any of them would resolve this specific deadlock. For simplicity, I chose to correct the prefetch request filtering logic so that invalid requests no longer trigger prefetches. I don't believe it is good design to process invalid requests, and the expected behaviour is that the L0 cache is idle whenever the core frontend is idle. This fix, perhaps incidentally, also resolves the deadlock: once the other cores reach the barrier and their frontends become idle, they stop issuing prefetches, so core 0's miss can be served and core 0 resumes execution. A more robust fix would probably also address the arbitration priority at the L1 cache, but I don't have time to look into this properly, so I opened #37 to track it.
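
The change to the filtering condition can be sketched as a pair of boolean gates. This is a hedged Python model, not the actual RTL: `prefetch_req_old`, `prefetch_req_new`, `count_prefetches` and the address `0x80` are hypothetical, and the immediate flush after write-back is assumed rather than derived from the real multi-hit condition.

```python
from typing import Callable


def prefetch_req_old(req_valid: bool, line_cached: bool) -> bool:
    # Previous behaviour: req_valid is ignored, so whatever address sits on the
    # request interface triggers a prefetch unless its line is already cached.
    return not line_cached


def prefetch_req_new(req_valid: bool, line_cached: bool) -> bool:
    # Fixed behaviour: invalid requests never trigger a prefetch.
    return req_valid and not line_cached


def count_prefetches(gate: Callable[[bool, bool], bool], cycles: int, req_valid: bool) -> int:
    """Count prefetches issued for a stable address over `cycles` cycles, assuming
    (as in the multi-hit scenario) that each prefetched line is flushed right
    after being written back, so it never shows up as cached."""
    cached = set()
    addr = 0x80  # hypothetical address held stable on the request interface
    issued = 0
    for _ in range(cycles):
        if gate(req_valid, addr in cached):
            issued += 1
            cached.add(addr)  # the refill writes the line into the cache...
        cached.discard(addr)  # ...but it is flushed immediately
    return issued


# Idle frontend (valid low): the old filter prefetches every cycle, the new one never does.
print(count_prefetches(prefetch_req_old, 100, req_valid=False))  # 100
print(count_prefetches(prefetch_req_new, 100, req_valid=False))  # 0
```

The sketch only models the gating: with `req_valid=True`, a line that is flushed right after write-back would still be re-prefetched on every cycle, since the flush itself is not addressed by this change.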

This PR also cleans up a comment that became outdated when the non-resettable FFs were removed.
