docs: Add HuggingFace field mapping for architecture spec#185
Liuck27 wants to merge 1 commit into modelpack:main
Conversation
Code Review
This is an excellent and thorough analysis mapping HuggingFace config.json fields to the ModelPack architecture specification. The document is well-structured, detailed, and does a great job of identifying inconsistencies between model families and gaps in the current ModelPack spec, particularly regarding the complexities of MLA in DeepSeek-V2-Lite. I have one suggestion to improve the clarity and accuracy of the 'Key Observations' summary section.
Force-pushed from fc60c95 to 856aabd
Map HuggingFace config.json fields to the ModelPack architecture vocabulary defined in PR modelpack#111 (docs/architecture.md) across four model families: Llama 3.1 8B, Mistral 7B v0.3, Mixtral 8x7B, and DeepSeek-V2-Lite. The document classifies each field mapping as direct, renamed, derived, inferred, or model-specific, and includes the raw config.json values for traceability.

Key findings:

- Core fields (hidden_size, num_attention_heads, num_key_value_heads) are consistent across all four models
- Several spec fields (use_gated_activation, is_causal, norm type) cannot be determined from config.json alone and require model family knowledge
- DeepSeek MLA attention uses a fundamentally different field set (q_lora_rank, kv_lora_rank) that does not map cleanly to the current spec vocabulary
- sliding_window has no ModelPack equivalent (spec gap)

This mapping is intended to guide the auto-generation tooling and JSON Schema work planned for issue modelpack#164.

Ref: modelpack#164
Signed-off-by: Luca Secchieri <luca.secchieri@gmail.com>
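The five-way classification described in this commit could be captured programmatically; here is a minimal sketch, where the table entries and the ModelPack-side field names are illustrative assumptions rather than the document's actual contents:

```python
# Hypothetical sketch of the field-mapping classification described above.
# Keys are HuggingFace config.json fields; values record the assumed
# ModelPack field and how the mapping is obtained. Illustrative only.
HF_TO_MODELPACK = {
    "hidden_size":          {"field": "hidden_size",          "kind": "direct"},
    "num_attention_heads":  {"field": "num_attention_heads",  "kind": "direct"},
    "num_key_value_heads":  {"field": "num_key_value_heads",  "kind": "direct"},
    # derived: typically hidden_size // num_attention_heads when absent
    "head_dim":             {"field": "head_dim",             "kind": "derived"},
    # inferred: requires model-family knowledge, not present in config.json
    "hidden_act":           {"field": "use_gated_activation", "kind": "inferred"},
    # model-specific: DeepSeek MLA only
    "kv_lora_rank":         {"field": None,                   "kind": "model-specific"},
    # no ModelPack equivalent (spec gap noted in the document)
    "sliding_window":       {"field": None,                   "kind": "spec-gap"},
}

def classify(hf_field: str) -> str:
    """Return the mapping classification for an HF config field."""
    entry = HF_TO_MODELPACK.get(hf_field)
    return entry["kind"] if entry else "unknown"
```

A table like this is one plausible input format for the auto-generation tooling mentioned for issue modelpack#164, since the `kind` tag tells a generator whether a field can be copied, computed, or must be flagged for review.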
Force-pushed from 856aabd to 6387a36
…modelpack#164)

Adds hf_parser.py that converts HuggingFace config.json into the ModelPack transformer spec format (PR modelpack#111 vocabulary). Supports Mistral, Mixtral, Qwen2, GPT-2, DeepSeek-V2 (MLA + mixed layers), and unknown models with a NEEDS_REVIEW fallback. Includes 26 unit tests.

Improvements over PR modelpack#185's field mapping research:

- MLA attention fields (kv_lora_rank, q_lora_rank, qk_nope/rope_head_dim)
- DeepSeek MoE routing params (routed_scaling_factor, topk_method)
- Mixed layers support (first_k_dense_replace, moe_layer_freq)
- Correct learned position embedding for GPT-2/GPT-Neo

Signed-off-by: pradhyum6144 <pradhyum314@gmail.com>
Summary

Maps HuggingFace `config.json` fields to the ModelPack architecture vocabulary defined in PR #111 (`docs/architecture.md`) across four model families: Llama 3.1 8B, Mistral 7B v0.3, Mixtral 8x7B, and DeepSeek-V2-Lite.

Each mapping is classified as direct, renamed, derived, inferred, or model-specific, with raw `config.json` values included for traceability.

Open questions for reviewers

- `has_qkv_bias` vs `has_output_bias`: HuggingFace uses a single `attention_bias` flag covering all projections (Q/K/V/O), while the ModelPack spec separates these into two fields. Should the spec maintain this distinction even when most HF models don't differentiate?
- `sliding_window` gap: Mistral 7B v0.1 uses `sliding_window: 4096`, but the architecture spec has no field for this attention pattern. Flagged in the document.
- DeepSeek MLA fields: MLA uses `q_lora_rank`, `kv_lora_rank`, `qk_rope_head_dim`, `qk_nope_head_dim`, and `v_head_dim` instead of the standard `num_key_value_heads`/`head_dim` pattern. These don't map cleanly to the current spec vocabulary; worth discussing how the spec should handle MLA's compressed KV representation.
- Inferred fields: Several ModelPack fields (`use_gated_activation`, `is_causal`, `attention.type`, norm type) cannot be determined from `config.json` alone and require model family knowledge. This has implications for auto-generation tooling.
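The derived/inferred distinction raised above can be made concrete with a small sketch. The family table and function names are hypothetical illustrations, assuming the common convention that `head_dim` defaults to `hidden_size // num_attention_heads` when absent:

```python
# Illustrative sketch: some spec fields are derived arithmetically from
# config.json, others must be inferred from model-family knowledge that
# config.json does not carry. The family set below is a hypothetical table.
GATED_ACTIVATION_FAMILIES = {"llama", "mistral", "mixtral", "deepseek_v2"}

def derive_head_dim(config: dict) -> int:
    # derived: not always present, but computable from other fields
    explicit = config.get("head_dim")
    if explicit is not None:
        return explicit
    return config["hidden_size"] // config["num_attention_heads"]

def infer_use_gated_activation(config: dict) -> bool:
    # inferred: config.json has no explicit flag, so tooling must rely on
    # knowing which families use SwiGLU-style gated MLPs
    return config.get("model_type") in GATED_ACTIVATION_FAMILIES
```

This is the core implication for auto-generation: derived fields are safe to compute mechanically, while inferred fields need a curated per-family table that must be maintained as new architectures appear.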
Ref: #164
Builds on: #111