Pipelining NEORV32 #1298
-
I would really like to cooperate with you on this if you want.
-
Hi there! The task of pipelining the CPU is not trivial. In my opinion, NEORV32 is perfect for learning how RISC-V microcontrollers work, and with a single in-order thread everything operates in a relatively "simple", easy-to-understand way. By taking a look at the assembly listing of the software you can clearly see how the program executes in the core step by step, one instruction after the other. However, for high-performance contexts, Stephan has added the possibility to instantiate several cores. In my opinion, this is likely the best choice because it keeps the core simple for those who want to learn while also providing an option for those who need a more powerful setup for demanding problems. Beyond that, if you need to deal with a truly high-demand problem, you should consider using a 64-bit or even 128-bit pipelined RISC-V implementation. In conclusion, I would keep the single-threaded version, although I wouldn't rule out a parallel pipelined version. 🤔 Just my opinion. 😃 Cheers!
-
Pipelining is actually at the top of my to-do list. 😅 I've thought about a lot of concepts and have already tried out some of them in hardware. The problem is really the additional hardware overhead. This is manageable for the actual pipeline: in the end, you only need forwarding and a kind of ready/valid handshake between the stages. But then there are all the traps that can occur in different stages (illegal instruction in DECODE, bus access error in MEMORY ACCESS, privilege mode error in WRITE BACK, ...). It gets even worse with multi-cycle operations (a division, or a bus access with wait states). These must be synchronized across all (previous) stages to halt everything until the operation is completed. This is all feasible (and almost standard nowadays), but it costs additional hardware, and at some point we end up with Ibex, VexRiscv and the like with their more classic DLX-style pipelines. In addition, the bus interface would have to be adapted, as it currently needs at least 2 clock cycles to answer a request (ok, we now also have bursts...).

Unfortunately, I can't give you any concrete figures, but my gut feeling is that full pipelining would roughly double the size of the CPU. But then the dual-core configuration becomes interesting again. Since the CPUs need several clock cycles per instruction anyway, they surprisingly don't get in each other's way much when it comes to bus accesses, and you get roughly +100% performance (I made a very simple test for that: https://github.com/stnolting/neorv32/tree/main/sw/example/demo_dual_core_primes).

TL;DR: So yeah, pipelining is something I would find very cool. But due to the high hardware costs, I would limit it to only some parts of the CPU. E.g. faster jumps would be a real performance boost. Maybe there are things that can be improved? 🤔
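To sketch the handshake idea (this is not NEORV32 code; the entity and signal names like `ex_valid_i` / `mem_busy_i` are made up for illustration): each stage register gets a valid flag, the downstream stage provides a ready signal, and a multi-cycle operation simply keeps its busy flag set, which stalls everything upstream.

```vhdl
-- Hypothetical EXECUTE -> MEMORY stage handshake (illustrative only, not NEORV32 code).
-- The downstream stage stalls the upstream one by deasserting 'ready'; a multi-cycle
-- unit (e.g. a divider or a bus access with wait states) keeps 'mem_busy_i' asserted.
library ieee;
use ieee.std_logic_1164.all;

entity pipe_stage_handshake is
  port (
    clk_i       : in  std_ulogic;
    rstn_i      : in  std_ulogic; -- async reset, low-active
    -- from EXECUTE stage --
    ex_valid_i  : in  std_ulogic; -- EXECUTE provides a new result
    ex_data_i   : in  std_ulogic_vector(31 downto 0);
    ex_ready_o  : out std_ulogic; -- MEMORY can accept it
    -- multi-cycle operation in MEMORY stage --
    mem_busy_i  : in  std_ulogic; -- e.g. bus access with wait states
    -- to WRITE-BACK stage --
    mem_valid_o : out std_ulogic;
    mem_data_o  : out std_ulogic_vector(31 downto 0);
    wb_ready_i  : in  std_ulogic
  );
end entity;

architecture rtl of pipe_stage_handshake is
  signal ex_ready : std_ulogic;
  signal valid_q  : std_ulogic;
  signal data_q   : std_ulogic_vector(31 downto 0);
begin
  -- accept a new instruction only if no multi-cycle operation is in flight
  -- and the stage register is free (or gets drained in this very cycle)
  ex_ready    <= (not mem_busy_i) and ((not valid_q) or wb_ready_i);
  ex_ready_o  <= ex_ready;
  mem_valid_o <= valid_q and (not mem_busy_i);
  mem_data_o  <= data_q;

  stage_reg: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      valid_q <= '0';
      data_q  <= (others => '0');
    elsif rising_edge(clk_i) then
      if (ex_valid_i = '1') and (ex_ready = '1') then
        valid_q <= '1'; -- latch new instruction from EXECUTE
        data_q  <= ex_data_i;
      elsif (wb_ready_i = '1') and (mem_busy_i = '0') then
        valid_q <= '0'; -- drained to WRITE-BACK; insert a bubble
      end if;
    end if;
  end process stage_reg;
end architecture;
```

Chaining such registers for every stage gives the stall behavior described above; forwarding and the per-stage trap handling would still have to be layered on top, which is where most of the extra hardware goes.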
-
Hi. See results: *) Notes:
-
Hi. Maybe it's due to the time-multiplexed comparator of the PMP module, which is shared between LSU and instruction accesses? (That scheme assumes instructions take multiple cycles.)
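To illustrate the idea (this is not the actual NEORV32 PMP implementation, just a made-up VHDL sketch with hypothetical names): a single address comparator could be shared in time between the instruction-fetch and LSU ports, which only works because each access takes more than one clock cycle anyway.

```vhdl
-- Hypothetical sketch: one comparator time-multiplexed between instruction
-- fetch (IF) and load/store (LSU) address checks (illustrative only).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pmp_cmp_shared is
  port (
    clk_i       : in  std_ulogic;
    if_addr_i   : in  std_ulogic_vector(31 downto 0); -- instruction fetch address
    lsu_addr_i  : in  std_ulogic_vector(31 downto 0); -- load/store address
    region_hi_i : in  std_ulogic_vector(31 downto 0); -- top of the (single) region
    if_fail_o   : out std_ulogic;
    lsu_fail_o  : out std_ulogic
  );
end entity;

architecture rtl of pmp_cmp_shared is
  signal sel_lsu : std_ulogic := '0'; -- which port owns the comparator this cycle
  signal addr    : std_ulogic_vector(31 downto 0);
  signal fail    : std_ulogic;
begin
  -- one wide comparator, multiplexed between the two ports
  addr <= lsu_addr_i when (sel_lsu = '1') else if_addr_i;
  fail <= '1' when (unsigned(addr) >= unsigned(region_hi_i)) else '0';

  -- ping-pong between IF and LSU every clock cycle; each port only gets a
  -- freshly checked result every second cycle, which is fine as long as
  -- instructions take several cycles anyway
  mux_ctrl: process(clk_i)
  begin
    if rising_edge(clk_i) then
      sel_lsu <= not sel_lsu;
      if (sel_lsu = '1') then
        lsu_fail_o <= fail;
      else
        if_fail_o <= fail;
      end if;
    end if;
  end process mux_ctrl;
end architecture;
```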
-
Hi
Have you ever tried to pipeline the NEORV32 CPU?
Although the area overhead would be higher in this case, pipelining would make this project suitable for high-performance applications (especially FP computations).