https://huggingface.co/fxmeng/TransMLA-llama3-8b-8k/
Hi, thanks for the amazing work. I see that a TransMLA model has been released on Hugging Face. Could you clarify how to reproduce the conversion from Meta-Llama3-8B to TransMLA-llama3-8b-8k? What training data was used? In particular, the experimental setup in the paper seems to focus primarily on SmolLM 1.7B and Llama 2 7B, with little mention of Llama 3 8B.