docs: Add HuggingFace field mapping for architecture spec#185
Liuck27 wants to merge 1 commit into modelpack:main
Conversation
Code Review
This is an excellent and thorough analysis mapping HuggingFace config.json fields to the ModelPack architecture specification. The document is well-structured, detailed, and does a great job of identifying inconsistencies between model families and gaps in the current ModelPack spec, particularly regarding the complexities of MLA in DeepSeek-V2-Lite. I have one suggestion to improve the clarity and accuracy of the 'Key Observations' summary section.
Force-pushed from fc60c95 to 856aabd
Map HuggingFace config.json fields to the ModelPack architecture vocabulary defined in PR modelpack#111 (docs/architecture.md) across four model families: Llama 3.1 8B, Mistral 7B v0.3, Mixtral 8x7B, and DeepSeek-V2-Lite. The document classifies each field mapping as direct, renamed, derived, inferred, or model-specific, and includes the raw config.json values for traceability.

Key findings:

- Core fields (hidden_size, num_attention_heads, num_key_value_heads) are consistent across all four models
- Several spec fields (use_gated_activation, is_causal, norm type) cannot be determined from config.json alone and require model family knowledge
- DeepSeek MLA attention uses a fundamentally different field set (q_lora_rank, kv_lora_rank) that does not map cleanly to the current spec vocabulary
- sliding_window has no ModelPack equivalent (spec gap)

This mapping is intended to guide the auto-generation tooling and JSON Schema work planned for issue modelpack#164.

Ref: modelpack#164
Signed-off-by: Luca Secchieri <luca.secchieri@gmail.com>
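The five-way classification described in this commit could be captured programmatically; here is a minimal sketch, where the table entries and the ModelPack-side field names are illustrative assumptions rather than the document's actual contents:

```python
# Hypothetical sketch of the field-mapping classification described above.
# Keys are HuggingFace config.json fields; values record the assumed
# ModelPack field and how the mapping is obtained. Illustrative only.
HF_TO_MODELPACK = {
    "hidden_size":          {"field": "hidden_size",          "kind": "direct"},
    "num_attention_heads":  {"field": "num_attention_heads",  "kind": "direct"},
    "num_key_value_heads":  {"field": "num_key_value_heads",  "kind": "direct"},
    # derived: typically hidden_size // num_attention_heads when absent
    "head_dim":             {"field": "head_dim",             "kind": "derived"},
    # inferred: requires model-family knowledge, not present in config.json
    "hidden_act":           {"field": "use_gated_activation", "kind": "inferred"},
    # model-specific: DeepSeek MLA only
    "kv_lora_rank":         {"field": None,                   "kind": "model-specific"},
    # no ModelPack equivalent (spec gap noted in the document)
    "sliding_window":       {"field": None,                   "kind": "spec-gap"},
}

def classify(hf_field: str) -> str:
    """Return the mapping classification for an HF config field."""
    entry = HF_TO_MODELPACK.get(hf_field)
    return entry["kind"] if entry else "unknown"
```

A table like this is one plausible input format for the auto-generation tooling mentioned for issue modelpack#164, since the `kind` tag tells a generator whether a field can be copied, computed, or must be flagged for review.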
Force-pushed from 856aabd to 6387a36
…modelpack#164)

Adds hf_parser.py that converts HuggingFace config.json into the ModelPack transformer spec format (PR modelpack#111 vocabulary). Supports Mistral, Mixtral, Qwen2, GPT-2, DeepSeek-V2 (MLA + mixed layers), and unknown models with a NEEDS_REVIEW fallback. Includes 26 unit tests.

Improvements over PR modelpack#185's field mapping research:

- MLA attention fields (kv_lora_rank, q_lora_rank, qk_nope/rope_head_dim)
- DeepSeek MoE routing params (routed_scaling_factor, topk_method)
- Mixed layers support (first_k_dense_replace, moe_layer_freq)
- Correct learned position embedding for GPT-2/GPT-Neo

Signed-off-by: pradhyum6144 <pradhyum314@gmail.com>
Summary

Maps HuggingFace `config.json` fields to the ModelPack architecture vocabulary defined in PR #111 (`docs/architecture.md`) across four model families: Llama 3.1 8B, Mistral 7B v0.3, Mixtral 8x7B, and DeepSeek-V2-Lite.

Each mapping is classified as direct, renamed, derived, inferred, or model-specific, with raw `config.json` values included for traceability.

Open questions for reviewers

- `has_qkv_bias` vs `has_output_bias`: HuggingFace uses a single `attention_bias` flag covering all projections (Q/K/V/O), while the ModelPack spec separates these into two fields. Should the spec maintain this distinction even when most HF models don't differentiate?
- `sliding_window` gap: Mistral 7B v0.1 uses `sliding_window: 4096`, but the architecture spec has no field for this attention pattern. Flagged in the document.
- DeepSeek MLA fields: MLA uses `q_lora_rank`, `kv_lora_rank`, `qk_rope_head_dim`, `qk_nope_head_dim`, and `v_head_dim` instead of the standard `num_key_value_heads`/`head_dim` pattern. These don't map cleanly to the current spec vocabulary; worth discussing how the spec should handle MLA's compressed KV representation.
- Inferred fields: Several ModelPack fields (`use_gated_activation`, `is_causal`, `attention.type`, norm type) cannot be determined from `config.json` alone and require model family knowledge. This has implications for auto-generation tooling.
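The derived/inferred distinction raised above can be made concrete with a small sketch. The family table and function names are hypothetical illustrations, assuming the common convention that `head_dim` defaults to `hidden_size // num_attention_heads` when absent:

```python
# Illustrative sketch: some spec fields are derived arithmetically from
# config.json, others must be inferred from model-family knowledge that
# config.json does not carry. The family set below is a hypothetical table.
GATED_ACTIVATION_FAMILIES = {"llama", "mistral", "mixtral", "deepseek_v2"}

def derive_head_dim(config: dict) -> int:
    # derived: not always present, but computable from other fields
    explicit = config.get("head_dim")
    if explicit is not None:
        return explicit
    return config["hidden_size"] // config["num_attention_heads"]

def infer_use_gated_activation(config: dict) -> bool:
    # inferred: config.json has no explicit flag, so tooling must rely on
    # knowing which families use SwiGLU-style gated MLPs
    return config.get("model_type") in GATED_ACTIVATION_FAMILIES
```

This is the core implication for auto-generation: derived fields are safe to compute mechanically, while inferred fields need a curated per-family table that must be maintained as new architectures appear.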
Ref: #164
Builds on: #111