
docs: Add HuggingFace field mapping for architecture spec#185

Open
Liuck27 wants to merge 1 commit into modelpack:main from Liuck27:hf-field-mapping

Conversation


@Liuck27 Liuck27 commented Mar 14, 2026

Summary

Maps HuggingFace config.json fields to the ModelPack architecture vocabulary
defined in PR #111 (docs/architecture.md) across four model families:

  • Llama 3.1 8B — GQA, dense MLP, RoPE
  • Mistral 7B v0.3 — GQA, sliding window (disabled), RoPE
  • Mixtral 8x7B — GQA, MoE (8 experts, top-2 routing)
  • DeepSeek-V2-Lite — MLA attention, MoE with shared experts, mixed layers

Each mapping is classified as direct, renamed, derived, inferred, or model-specific,
with raw config.json values included for traceability.
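The classification scheme above can be sketched as a small lookup table. This is a hypothetical illustration, not code from the PR: the HuggingFace keys on the left are real `config.json` fields, but the ModelPack names and the exact labels follow my reading of the scheme described here.

```python
# Hypothetical sketch of the five mapping types (direct, renamed,
# derived, inferred, model-specific) using a few Llama-style fields.
# hf_field -> (assumed ModelPack field, mapping type)
LLAMA_MAPPING = {
    "hidden_size":         ("hidden_size",          "direct"),
    "num_attention_heads": ("num_attention_heads",  "direct"),
    "num_hidden_layers":   ("num_layers",           "renamed"),
    # head_dim is often absent from config.json and must be computed
    # as hidden_size // num_attention_heads:
    "head_dim":            ("attention.head_dim",   "derived"),
    # Not in config.json at all; known only from the model family:
    "use_gated_activation": ("use_gated_activation", "inferred"),
}

def classify(hf_field: str) -> str:
    """Return the mapping type for a HuggingFace field, or 'unknown'."""
    entry = LLAMA_MAPPING.get(hf_field)
    return entry[1] if entry else "unknown"
```

A table like this is also the natural input for the auto-generation tooling mentioned later in the thread.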

Open questions for reviewers

  1. has_qkv_bias vs has_output_bias: HuggingFace uses a single attention_bias
    flag covering all projections (Q/K/V/O). The ModelPack spec separates these into
    two fields. Should the spec maintain this distinction even when most HF models
    don't differentiate?
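If the spec keeps the two-field distinction, a converter has little choice but to fan one HF flag out into both fields. A minimal sketch, assuming the ModelPack field names as given above:

```python
# Sketch: expand HF's single attention_bias flag into the two
# ModelPack fields. HuggingFace cannot distinguish Q/K/V bias from
# output-projection bias, so both fields receive the same value.
def map_attention_bias(hf_config: dict) -> dict:
    bias = bool(hf_config.get("attention_bias", False))
    return {
        "has_qkv_bias": bias,     # Q/K/V projections
        "has_output_bias": bias,  # output projection; same as above for HF
    }
```

The information loss only bites in the other direction: a ModelPack spec with differing values could not round-trip back to a single `attention_bias` flag.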

  2. sliding_window gap: Mistral 7B v0.1 uses sliding_window: 4096, but the
    architecture spec has no field for this attention pattern. Flagged in the document.

  3. DeepSeek MLA fields: MLA uses q_lora_rank, kv_lora_rank, qk_rope_head_dim,
    qk_nope_head_dim, v_head_dim instead of the standard num_key_value_heads /
    head_dim pattern. These don't map cleanly to the current spec vocabulary. Worth
    discussing how the spec should handle MLA's compressed KV representation.
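To make the mismatch concrete, here is a sketch using values that mirror DeepSeek-V2-Lite's published `config.json` (worth double-checking against the model card); the derivation of the effective query/key head dimension is the sum of the RoPE and non-RoPE parts:

```python
# MLA fields as they appear in DeepSeek-V2-Lite's config.json
# (values reproduced from memory; verify against the model card).
deepseek_v2_lite = {
    "q_lora_rank": None,        # V2-Lite omits query compression
    "kv_lora_rank": 512,        # compressed KV latent dimension
    "qk_rope_head_dim": 64,     # rotary part of the Q/K head
    "qk_nope_head_dim": 128,    # non-rotary part of the Q/K head
    "v_head_dim": 128,          # value head dimension (differs from Q/K)
}

def mla_query_head_dim(cfg: dict) -> int:
    # Effective per-head Q/K dim = RoPE part + non-RoPE part;
    # there is no single "head_dim" the current spec could store.
    return cfg["qk_rope_head_dim"] + cfg["qk_nope_head_dim"]
```

Note that the Q/K dimension (192 here) differs from `v_head_dim`, which is exactly why the standard `num_key_value_heads` / `head_dim` pair cannot represent MLA.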

  4. Inferred fields: Several ModelPack fields (use_gated_activation, is_causal,
    attention.type, norm type) cannot be determined from config.json alone and
    require model family knowledge. This has implications for auto-generation tooling.
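In practice that "model family knowledge" would likely live in a per-family defaults table keyed on `model_type`. The table contents below are assumptions for illustration only, and the `NEEDS_REVIEW` marker echoes the fallback used by the parser in the follow-up commit:

```python
# Sketch: per-family defaults for fields config.json cannot supply.
# Field names follow the ModelPack vocabulary; values are assumptions.
FAMILY_DEFAULTS = {
    "llama":   {"use_gated_activation": True,  "is_causal": True, "norm": "rmsnorm"},
    "mistral": {"use_gated_activation": True,  "is_causal": True, "norm": "rmsnorm"},
    "gpt2":    {"use_gated_activation": False, "is_causal": True, "norm": "layernorm"},
}

def infer_fields(hf_config: dict) -> dict:
    family = hf_config.get("model_type", "")
    # Unknown families get an explicit marker rather than silent defaults
    return FAMILY_DEFAULTS.get(family, {"NEEDS_REVIEW": True})
```

Any auto-generation tooling built on this would need the table maintained alongside the spec, which is the implication flagged above.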

Ref: #164
Builds on: #111

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive mapping document that translates HuggingFace config.json fields into the ModelPack architecture specification. This effort is crucial for validating the ModelPack vocabulary against real-world models, guiding the development of automated tooling, and clearly documenting where fields require direct mapping, renaming, derivation, or inference. The analysis covers four diverse model architectures, highlighting both consistencies and inconsistencies in field definitions across different model families, and identifies areas where the ModelPack spec may need further refinement.

Highlights

  • HuggingFace Field Mapping: Mapped HuggingFace config.json fields to the ModelPack architecture vocabulary, providing a comprehensive translation guide.
  • Model Family Analysis: Analyzed four distinct model families (Llama 3.1 8B, Mistral 7B v0.3, Mixtral 8x7B, and DeepSeek-V2-Lite) to validate the ModelPack architecture vocabulary against real-world configurations.
  • Mapping Type Classification: Categorized field mappings as direct, renamed, derived, inferred, or model-specific, enhancing clarity and guiding future tooling development.
  • Traceability and Reference: Included raw config.json values for each analyzed model to ensure full traceability and provide immediate reference.
Changelog
  • docs/hf-field-mapping.md
    • Added a new document detailing the mapping of HuggingFace config.json fields to the ModelPack architecture specification.

@gemini-code-assist bot left a comment

Code Review

This is an excellent and thorough analysis mapping HuggingFace config.json fields to the ModelPack architecture specification. The document is well-structured, detailed, and does a great job of identifying inconsistencies between model families and gaps in the current ModelPack spec, particularly regarding the complexities of MLA in DeepSeek-V2-Lite. I have one suggestion to improve the clarity and accuracy of the 'Key Observations' summary section.

Map HuggingFace config.json fields to the ModelPack architecture
vocabulary defined in PR modelpack#111 (docs/architecture.md) across four
model families: Llama 3.1 8B, Mistral 7B v0.3, Mixtral 8x7B, and
DeepSeek-V2-Lite.

The document classifies each field mapping as direct, renamed,
derived, inferred, or model-specific, and includes the raw
config.json values for traceability. Key findings:

- Core fields (hidden_size, num_attention_heads, num_key_value_heads)
  are consistent across all four models
- Several spec fields (use_gated_activation, is_causal, norm type)
  cannot be determined from config.json alone and require model
  family knowledge
- DeepSeek MLA attention uses a fundamentally different field set
  (q_lora_rank, kv_lora_rank) that does not map cleanly to the
  current spec vocabulary
- sliding_window has no ModelPack equivalent (spec gap)

This mapping is intended to guide the auto-generation tooling and
JSON Schema work planned for issue modelpack#164.

Ref: modelpack#164

Signed-off-by: Luca Secchieri <luca.secchieri@gmail.com>
pradhyum6144 added a commit to pradhyum6144/model-spec that referenced this pull request Mar 22, 2026
…modelpack#164)

Adds hf_parser.py that converts HuggingFace config.json into ModelPack
transformer spec format (PR modelpack#111 vocabulary). Supports Mistral, Mixtral,
Qwen2, GPT-2, DeepSeek-V2 (MLA + mixed layers), and unknown models with
NEEDS_REVIEW fallback. Includes 26 unit tests.

Improvements over PR modelpack#185's field mapping research:
- MLA attention fields (kv_lora_rank, q_lora_rank, qk_nope/rope_head_dim)
- DeepSeek MoE routing params (routed_scaling_factor, topk_method)
- Mixed layers support (first_k_dense_replace, moe_layer_freq)
- Correct learned position embedding for GPT-2/GPT-Neo

Signed-off-by: pradhyum6144 <pradhyum314@gmail.com>