There is an ongoing discussion, starting from this conversation, about whether existing MoE models can be converted into BitNet models: microsoft/BitNet#234
Since people are already speedrunning NanoGPT, could BitNet make things even faster? https://github.com/KellerJordan/modded-nanogpt/