
feat: add Llama-3 renderer for Llama-3.2-1B/3B-Instruct#9

Open
hallerite wants to merge 1 commit into main from
feat/llama-3-renderer

Conversation

@hallerite
Member

Summary

A hand-coded Llama3Renderer for Meta's Llama-3.x chat template, plus a matching parse_llama_3 parser. Initial scope: Llama-3.2-1B-Instruct and Llama-3.2-3B-Instruct (auto-routed via MODEL_RENDERER_MAP). No version bump.

How tests work without a Meta-license HF token

MODEL_RENDERER_MAP registers the canonical meta-llama/... paths so production callers auto-route. Tests load the tokenizer via the unrestricted unsloth/Llama-3.2-{1B,3B}-Instruct mirror — the chat-template SHA matches Meta's bit-for-bit and the underlying tiktoken-BPE files are identical. CI doesn't need an HF_TOKEN with Meta license access.
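The routing described above can be sketched roughly as follows. This is an illustrative stand-in only: the real MODEL_RENDERER_MAP and Llama3Renderer live in this PR, and the registry shape and factory function here are assumptions, not the actual code.

```python
class Llama3Renderer:
    """Hypothetical stand-in for the hand-coded renderer in this PR."""

    def __init__(self, model_name: str):
        self.model_name = model_name


# Canonical meta-llama paths are what production callers route on; tests
# load tokenizers from the unsloth mirrors instead, whose chat template
# is byte-identical, so no Meta-licensed HF_TOKEN is needed in CI.
MODEL_RENDERER_MAP = {
    "meta-llama/Llama-3.2-1B-Instruct": Llama3Renderer,
    "meta-llama/Llama-3.2-3B-Instruct": Llama3Renderer,
}


def get_renderer(model_name: str) -> Llama3Renderer:
    """Factory lookup: unknown models fail loudly rather than silently."""
    try:
        return MODEL_RENDERER_MAP[model_name](model_name)
    except KeyError:
        raise ValueError(f"no renderer registered for {model_name!r}")
```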

Implementation notes

  • No <think> / reasoning channel — Llama-3 doesn't ship one. preserve_*_thinking constructor flags raise NotImplementedError if set (matches DefaultRenderer's contract for the same case).
  • <|begin_of_text|> (BOS) is emitted at the start of every render; system block is always emitted with the fixed Cutting Knowledge Date / Today Date preamble even when no system message is supplied.
  • date_string is a constructor kwarg, defaulting to "26 Jul 2024" (the chat template's strftime fallback) so output stays deterministic. Override per-instance for production runs that want today's date.
  • tools_in_user_message defaults to True (matches chat template). Tools + JSON signatures inject into the first user message; pass False to flip to system-block mode. Both modes parity-tested.
  • Single tool call per assistant message (chat template raises otherwise). Tool calls render as a JSON blob {"name": "...", "parameters": ...} inside the assistant body. Tool responses render under role ipython regardless of source role; mirrors the chat template's content | tojson branch — including the Jinja quirk that strings are iterable, so plain-string tool content gets JSON-quoted.
  • parse_llama_3 detects the JSON tool-call body shape with a strict starts-with-{ + parses-as-dict-with-name check; malformed JSON falls through to content rather than dropping silently.
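A minimal sketch of the tool-call detection in the last bullet. The function name and return shape below are assumptions for illustration; only the check itself (starts with "{", parses as JSON, is a dict carrying a "name" key, else fall through to content) comes from the PR description.

```python
import json


def parse_tool_call(body: str) -> dict:
    """Strict tool-call shape check: the body must start with '{',
    parse as JSON, be a dict, and carry a 'name' key. Anything else
    falls through to plain content instead of being dropped silently."""
    text = body.strip()
    if not text.startswith("{"):
        return {"content": body}
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        # Malformed JSON: keep the raw body as content.
        return {"content": body}
    if isinstance(obj, dict) and "name" in obj:
        # e.g. {"name": "...", "parameters": ...}
        return {"tool_call": obj}
    return {"content": body}
```

For example, a well-formed blob is surfaced as a tool call, while a truncated one like `{"oops"` survives as ordinary content.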

Tests

47 dedicated tests in tests/test_llama_3.py:

  • MODEL_RENDERER_MAP shape + factory routing
  • Constructor contract (default date, preserve_*_thinking rejection, tools_in_user_message toggle)
  • Byte parity vs apply_chat_template across 11 conversation shapes (system + user, user-only, multi-turn, gen prompt, whitespace trimming, custom date, tools-in-user, tools-in-system, tool call round-trip, dict tool response, multiple-tool-calls rejection)
  • parse_response (plain, tool call, malformed JSON fallthrough)
  • Bridge contract (extends prev verbatim, matches fresh render, rejects assistant in extension, synthesises close on truncation)
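For reference, the target of the byte-parity tests is the standard Llama-3 prompt shape. The special tokens below are Meta's published chat-template tokens; the render function itself is a simplified sketch (no tools, no ipython role), not the PR's implementation, and the "Cutting Knowledge Date: December 2023" line is assumed from the public template.

```python
def render_sketch(messages, date_string="26 Jul 2024", add_generation_prompt=True):
    """Sketch of the Llama-3 prompt shape: BOS, then one
    header/body/eot block per message; the system block is always
    emitted with the date preamble, even with no system message."""

    def block(role: str, body: str) -> str:
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{body}<|eot_id|>"

    system = next((m["content"] for m in messages if m["role"] == "system"), "")
    preamble = f"Cutting Knowledge Date: December 2023\nToday Date: {date_string}\n\n"
    out = "<|begin_of_text|>" + block("system", preamble + system)
    for m in messages:
        if m["role"] != "system":
            out += block(m["role"], m["content"])
    if add_generation_prompt:
        # Open assistant header so the model continues from here.
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out
```

Pinning date_string keeps this deterministic, which is what makes byte-for-byte comparison against apply_chat_template feasible in tests.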

Test plan

  • pytest tests/test_llama_3.py — 47 cases pass on both 1B and 3B mirrors
  • Full suite (pytest tests/ --ignore=tests/test_client.py) — 947 pass, 48 skipped, 1 xfailed (no regressions)
  • Pre-commit hooks (ruff check + format) clean
  • Maintainer with a Meta-license HF_TOKEN can verify meta-llama/Llama-3.2-1B-Instruct parity directly (the unsloth mirror has been bit-verified, but a one-off canonical run is good defense in depth)

🤖 Generated with Claude Code

Hand-coded Llama3Renderer mirroring Meta's Llama-3.x chat template.
Initial scope: Llama-3.2-1B-Instruct and Llama-3.2-3B-Instruct (and the
unrestricted unsloth/... mirrors with byte-identical chat templates).
MODEL_RENDERER_MAP routes the canonical meta-llama paths; tests load
via the unsloth mirrors so CI doesn't need an HF_TOKEN with Meta
license access.

Implementation notes:

* No <think> / reasoning channel — preserve_*_thinking constructor
  flags raise NotImplementedError if set (matches DefaultRenderer's
  contract for the same case).

* <|begin_of_text|> (BOS) is emitted at the start of every render. The
  system block is emitted UNCONDITIONALLY with a fixed
  "Cutting Knowledge Date / Today Date" preamble even when no system
  message is supplied. date_string is a constructor kwarg pinned at
  "26 Jul 2024" by default (matches the chat template's strftime
  fallback); override per instance for production runs that want
  today's date.

* tools_in_user_message defaults to True. Tools + JSON signatures
  inject into the first user message; pass False at construction to
  flip to system-block mode. Both modes parity-tested.

* Single tool call per assistant message (chat template raises
  otherwise). Tool calls render as a JSON blob inside the assistant
  body. Tool responses render under role ipython regardless of source
  role; mirrors the chat template's content|tojson branch including
  the Jinja quirk that strings are iterable so plain-string tool
  content gets JSON-quoted.

* parse_llama_3 detects the JSON tool-call body shape with a strict
  check; malformed JSON falls through to content.

47 dedicated tests covering map shape, constructor contract, byte
parity across 11 conversation shapes (including tool calls, multi-turn,
custom date, tools-in-system mode), parse_response, and bridge
contract. Full suite: 947 passed, 48 skipped, 1 xfailed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
