[refactor] Simplify and unify the TornadoVM layer planner infrastructure #101
Open
orionpapadakis wants to merge 13 commits into main from
This PR is a structural refactoring of the tornadovm package — no behavior changes, no new model capabilities. The goal is to eliminate duplication across the FFN layer and layer planner hierarchies and establish a cleaner, more maintainable package structure.
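The core pattern behind the duplication cleanup, a generic abstract base class that owns the shared setup loop while subclasses bind concrete weight and configuration types, can be sketched roughly as follows. This is a hypothetical illustration: the class and method names mirror the PR, but the fields, signatures, and the String-based task-graph stand-in are assumptions, not the actual TornadoVM code.

```java
import java.util.*;

// Sketch of a generic abstract base: W and C stand in for the Weights and
// Configuration type parameters, so subclasses get typed access without casts.
abstract class AbstractFFNLayers<W, C> {
    protected final W weights;
    protected final C config;

    protected AbstractFFNLayers(W weights, C config) {
        this.weights = weights;
        this.config = config;
    }

    // Shared setup loop lives once in the base; subclasses contribute
    // only the per-layer piece that actually differs between models.
    final List<String> setupFFNLayerTaskGraphs(int layerCount) {
        List<String> graphs = new ArrayList<>();
        for (int i = 0; i < layerCount; i++) {
            graphs.add(buildLayerTaskGraph(i));
        }
        return graphs;
    }

    protected abstract String buildLayerTaskGraph(int layerIndex);
}

// A model-specific subclass binds concrete types (here: plain stand-ins
// for the real weight/configuration classes).
class MistralFP16FFNLayers extends AbstractFFNLayers<float[], Map<String, Integer>> {
    MistralFP16FFNLayers(float[] weights, Map<String, Integer> config) {
        super(weights, config);
    }

    @Override
    protected String buildLayerTaskGraph(int layerIndex) {
        // Typed access to the configuration, no casting required.
        return "ffn-layer-" + layerIndex + "-dim" + config.getOrDefault("dim", 0);
    }

    public static void main(String[] args) {
        var layers = new MistralFP16FFNLayers(new float[0], Map.of("dim", 4096));
        System.out.println(layers.setupFFNLayerTaskGraphs(2));
    }
}
```

The payoff of the generics tightening is that a Mistral subclass cannot accidentally be handed a Llama configuration: the mismatch becomes a compile error instead of a runtime cast failure.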
Changes:
**FFN layer hierarchy (`tornadovm/layers/`)**

- Added `AbstractLogitsLayer` to centralize shared setup logic for all logits layers (`LogitsFP16Layer`, `LogitsQ8_0Layer`, and the `Granite` variants), replacing duplicated task graph and scheduler boilerplate
- Introduced `AbstractFFNLayers<W, C>` with type parameters for `Weights` and `Configuration`, giving typed access to weights and config in subclasses
- Unified `setupFFNLayerTaskGraphs()` across all FFN subclasses; removed redundant fields and methods surfaced during the cleanup
- Added `MistralFP16FFNLayers` and `MistralQ8_0FFNLayers` — Mistral-specific FFN layer implementations using `MistralConfiguration`, required after the generics tightening

**Layer planner hierarchy (`tornadovm/layerplanner/`)**

- Hoisted `createTornadoInferencePlan()`, all shared fields (`activationLayer`, `ffnLayers`, `logitsLayer`, `scheduler`, task graphs), and the `GenericLayerPlanner` interface implementations into `QuantizedLayerPlanner`, eliminating near-total duplication between `FP16LayerPlanner` and `Q8_0LayerPlanner`
- Added `MistralFP16LayerPlanner` and `MistralQ8_0LayerPlanner` with correct generic types (`LlamaState`, `MistralConfiguration`, `LlamaTornadoWeights`), fixing a `ClassCastException` caused by Mistral being incorrectly routed to the Llama planners
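The `ClassCastException` fix amounts to routing planner selection on both the model type and the quantization format, so Mistral resolves to its own planners instead of falling through to the Llama ones. A minimal sketch of that idea, where the planner class names mirror the PR but the switch-based registry shape is an assumption:

```java
// Hypothetical routing sketch: select a planner by (model, quantization).
// Before the fix, MISTRAL fell through to the Llama planners, and the
// planner later cast the model's MistralConfiguration to the wrong type.
public class PlannerRouting {
    static String selectPlanner(String modelType, String quantization) {
        String key = modelType + "/" + quantization;
        switch (key) {
            case "LLAMA/FP16":   return "FP16LayerPlanner";
            case "LLAMA/Q8_0":   return "Q8_0LayerPlanner";
            // New in this PR: dedicated Mistral planners with correct generics
            case "MISTRAL/FP16": return "MistralFP16LayerPlanner";
            case "MISTRAL/Q8_0": return "MistralQ8_0LayerPlanner";
            default: throw new IllegalArgumentException("no planner for " + key);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectPlanner("MISTRAL", "FP16")); // MistralFP16LayerPlanner
    }
}
```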
**Package reorganization**

- Moved `GenericLayerPlanner` and `QuantizationPlannerFactory` to the `layerplanner/` root (they were in a `base/` subpackage)
- Moved `QuantizedLayerPlanner` to the `layerplanner/` root
- Co-located `FP16LayerPlanner` with the FP16 concrete planners in `model/fp16/`
- Co-located `Q8_0LayerPlanner` with the Q8_0 concrete planners in `model/q8_0/`

**DeepSeek-R1-Distill-Qwen fix**

- Added a `DeepSeekR1Qwen` model class (extends `Qwen2`) that correctly overrides `getModelType()` and `shouldAddBeginOfText()`, fixing a repetition loop caused by the missing BOS token and `<think>` prefix injection
- Updated `Qwen3ChatFormat.getBeginOfText()` to fall back to `startHeader` (`<|begin▁of▁sentence|>`) when no BOS alias is registered — DeepSeek reuses its first role-marker token as BOS
- Updated `Qwen3Tokenizer.encode()` to byte-map non-special text parts before BPE encoding, resolving a `NoSuchElementException` when encoding `"\n"` after splitting on special tokens like `<think>`
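The tokenizer fix relies on the standard GPT-2-style byte-level BPE convention: every raw byte is first mapped to a visible Unicode character, so control characters such as `"\n"` always resolve to a known vocab symbol instead of falling off the merge table. A self-contained sketch of that mapping (this shows the generic convention, not the actual `Qwen3Tokenizer` internals):

```java
import java.util.*;

public class ByteLevelMap {
    // Build the GPT-2-style byte-to-unicode table: printable bytes map to
    // themselves, all others are shifted into U+0100 and above so that
    // every byte has a visible, BPE-safe representation.
    static Map<Integer, Character> buildTable() {
        Map<Integer, Character> table = new HashMap<>();
        int n = 0;
        for (int b = 0; b < 256; b++) {
            boolean printable = (b >= '!' && b <= '~')
                    || (b >= 0xA1 && b <= 0xAC)
                    || (b >= 0xAE && b <= 0xFF);
            if (printable) {
                table.put(b, (char) b);
            } else {
                table.put(b, (char) (256 + n++)); // shifted stand-in
            }
        }
        return table;
    }

    // Map a text segment byte-by-byte before handing it to BPE merging.
    static String byteMap(String text, Map<Integer, Character> table) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(java.nio.charset.StandardCharsets.UTF_8)) {
            out.append(table.get(b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Character> table = buildTable();
        System.out.println(byteMap("\n", table));          // prints "Ċ"
        System.out.println(byteMap("hello world", table)); // prints "helloĠworld"
    }
}
```

Without this mapping, a segment like `"\n"` (produced by splitting the input on special tokens such as `<think>`) has no direct vocab entry, which is consistent with the `NoSuchElementException` the PR describes.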