
feat(tools): HuggingFace config parser for transformer spec #193

Open

pradhyum6144 wants to merge 1 commit into modelpack:main from pradhyum6144:feat/hf-config-parser

Conversation

@pradhyum6144
Contributor

@pradhyum6144 pradhyum6144 commented Mar 22, 2026

Summary

Features

Feature               Details
-------               -------
Attention types       MHA, GQA, MLA
FFN types             MLP (gated/ungated), MoE (with shared experts)
Normalization         RMSNorm, LayerNorm
Position embeddings   RoPE (with theta/scaling), learned
MLA fields            kv_lora_rank, q_lora_rank, qk_nope_head_dim, qk_rope_head_dim, v_head_dim
MoE routing           routed_scaling_factor, topk_method, norm_topk_prob
Mixed layers          first_k_dense_replace, moe_layer_freq
Unknown models        Fields marked NEEDS_REVIEW for human verification
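The unknown-model fallback in the table can be sketched as a lookup with a review-marker default. This is an illustrative sketch only: the names `NEEDS_REVIEW`, `KNOWN_ATTENTION`, and `parse_attention` are hypothetical, not the PR's actual API.

```python
# Hypothetical sketch of the NEEDS_REVIEW fallback pattern; the real parser's
# names and dispatch logic may differ.
NEEDS_REVIEW = "NEEDS_REVIEW"

# Example mapping from HF model_type to attention type (illustrative subset).
KNOWN_ATTENTION = {
    "mistral": "GQA",
    "mixtral": "GQA",
    "gpt2": "MHA",
    "deepseek_v2": "MLA",
}

def parse_attention(config: dict) -> str:
    """Return the attention type, or a review marker for unknown models."""
    return KNOWN_ATTENTION.get(config.get("model_type", ""), NEEDS_REVIEW)
```

With this shape, an unrecognized architecture still produces a complete spec, just one flagged for human verification instead of crashing.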

Usage

# From HuggingFace model ID
python tools/hf_parser.py meta-llama/Meta-Llama-3-8B

# From local config.json
python tools/hf_parser.py --file path/to/config.json

# JSON output
python tools/hf_parser.py mistralai/Mistral-7B-v0.3 --format json

Test plan

  • 26 unit tests pass (pytest tools/hf_parser_test.py -v)
  • Tested against real HF configs: Mistral-7B, Mixtral-8x7B, Qwen2-7B, GPT-2, DeepSeek-V2-Lite
  • Unknown model types produce NEEDS_REVIEW markers instead of crashing
  • Head dim correctly derived when not explicit in config
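The last test-plan item (head dim derived when not explicit) presumably follows the common HuggingFace convention of `hidden_size / num_attention_heads`. A minimal sketch, assuming that convention and a hypothetical `derive_head_dim` helper:

```python
# Assumed derivation logic; the PR's actual helper name and edge-case
# handling may differ.
def derive_head_dim(config: dict) -> int:
    """Use an explicit head_dim if present, otherwise derive it."""
    if "head_dim" in config:
        return config["head_dim"]
    return config["hidden_size"] // config["num_attention_heads"]
```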

Relates to #164

…modelpack#164)

Adds hf_parser.py that converts HuggingFace config.json into ModelPack
transformer spec format (PR modelpack#111 vocabulary). Supports Mistral, Mixtral,
Qwen2, GPT-2, DeepSeek-V2 (MLA + mixed layers), and unknown models with
NEEDS_REVIEW fallback. Includes 26 unit tests.

Improvements over PR modelpack#185's field mapping research:
- MLA attention fields (kv_lora_rank, q_lora_rank, qk_nope/rope_head_dim)
- DeepSeek MoE routing params (routed_scaling_factor, topk_method)
- Mixed layers support (first_k_dense_replace, moe_layer_freq)
- Correct learned position embedding for GPT-2/GPT-Neo

Signed-off-by: pradhyum6144 <pradhyum314@gmail.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a utility for standardizing the representation of HuggingFace transformer model configurations. By parsing diverse config.json structures into a unified ModelPack specification, it streamlines integrating and analyzing language models and reduces manual effort in model definition.

Highlights

  • HuggingFace Config Parser: Introduced tools/hf_parser.py to convert HuggingFace config.json files into the ModelPack transformer specification format, using the vocabulary established in PR #111 ([WIP] feat: add model architecture configuration).
  • Broad Model Support: Implemented support for six distinct model families, including Mistral (GQA), Mixtral (MoE), Qwen2 (attention bias), GPT-2 (MHA/LayerNorm), and DeepSeek-V2 (MLA + mixed layers), with a NEEDS_REVIEW fallback for unknown models.
  • Comprehensive Testing: Included tools/hf_parser_test.py with 26 unit tests to ensure accurate parsing across all supported architectures and edge cases.
  • Advanced Field Parsing: Enhanced parsing capabilities to correctly extract MLA attention fields (kv_lora_rank, q_lora_rank, qk_nope_head_dim), DeepSeek MoE routing parameters, and mixed layer detection, building on prior research.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable tool for converting HuggingFace model configurations into the ModelPack transformer specification format. The implementation is well-structured and covers a good range of model architectures and their specific parameters. The inclusion of a comprehensive test suite is also a great practice. I've identified a bug in the YAML output generation that could lead to invalid output, a minor documentation inconsistency, and an opportunity for a small refactoring to improve maintainability. Overall, this is a solid contribution.

if value == NEEDS_REVIEW:
    lines.append(f"{prefix}{key}: {value} # requires human review")
else:
    lines.append(f'{prefix}{key}: "{value}"')
Severity: high

The current string formatting for YAML output does not escape special characters like double quotes within the string value. This can result in invalid YAML. For example, a value of `a "b" c` would be rendered as `key: "a "b" c"`, which is not valid. You can fix this by using `json.dumps()`, which correctly handles string escaping for JSON, and the output is also valid YAML.

Suggested change
lines.append(f'{prefix}{key}: "{value}"')
lines.append(f'{prefix}{key}: {json.dumps(value)}')
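The reviewer's point can be demonstrated in isolation with just the standard library, independent of the PR's code:

```python
import json

value = 'a "b" c'

# Naive f-string quoting leaves the inner quotes unescaped -> invalid YAML.
naive = f'key: "{value}"'           # key: "a "b" c"

# json.dumps escapes the inner quotes; JSON strings are also valid YAML scalars.
safe = f"key: {json.dumps(value)}"  # key: "a \"b\" c"
```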

Usage:
python tools/hf_parser.py meta-llama/Meta-Llama-3-8B
python tools/hf_parser.py mistralai/Mistral-7B-v0.3
python tools/hf_parser.py --file path/to/config.json
Severity: medium

The usage example in the docstring mentions a --file flag, but the script's argument parser is implemented to accept a positional argument for both model IDs and file paths. This example should be updated to reflect the actual implementation for clarity.

Suggested change
python tools/hf_parser.py --file path/to/config.json
python tools/hf_parser.py path/to/config.json
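The positional-argument interface the reviewer describes could look roughly like the following argparse sketch. The argument names (`source`, `--format`) are assumptions for illustration; the real CLI may differ.

```python
import argparse

# Hypothetical sketch of a CLI accepting one positional argument that is
# either a HuggingFace model ID or a local config.json path.
parser = argparse.ArgumentParser(description="Parse an HF config into a spec")
parser.add_argument("source", help="HuggingFace model ID or path to config.json")
parser.add_argument("--format", choices=["yaml", "json"], default="yaml")

# A local path and a model ID are parsed identically; disambiguation
# (e.g. checking whether the path exists on disk) happens later.
args = parser.parse_args(["path/to/config.json", "--format", "json"])
```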

Comment on lines +188 to +191
use_gated = model_type in (
    "llama", "mistral", "mixtral", "qwen2", "qwen2_moe", "phi3",
    "gemma", "gemma2", "deepseek_v2", "deepseek_v3",
)
Severity: medium

For checking membership, using a set is more idiomatic and performant than a tuple, especially as the list of models grows. I suggest converting this tuple to a set. For even better maintainability, you could define this as a module-level constant.

Suggested change
use_gated = model_type in (
    "llama", "mistral", "mixtral", "qwen2", "qwen2_moe", "phi3",
    "gemma", "gemma2", "deepseek_v2", "deepseek_v3",
)
use_gated = model_type in {
    "llama", "mistral", "mixtral", "qwen2", "qwen2_moe", "phi3",
    "gemma", "gemma2", "deepseek_v2", "deepseek_v3",
}
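Taking the reviewer's second suggestion as well, the set could be hoisted into a module-level constant. The constant name `GATED_FFN_MODELS` and the helper are illustrative, not from the PR:

```python
# Module-level frozenset, per the review suggestion; the model list is
# copied verbatim from the diff, the names here are hypothetical.
GATED_FFN_MODELS = frozenset({
    "llama", "mistral", "mixtral", "qwen2", "qwen2_moe", "phi3",
    "gemma", "gemma2", "deepseek_v2", "deepseek_v3",
})

def uses_gated_ffn(model_type: str) -> bool:
    """True if the model family uses a gated MLP (SwiGLU-style) FFN."""
    return model_type in GATED_FFN_MODELS
```

A frozenset also signals that the collection is immutable, and membership tests are O(1) rather than O(n).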
