feat: add Llama-3 renderer for Llama-3.2-1B/3B-Instruct #9
Open
Hand-coded Llama3Renderer mirroring Meta's Llama-3.x chat template. Initial scope: Llama-3.2-1B-Instruct and Llama-3.2-3B-Instruct (and the unrestricted unsloth/... mirrors with byte-identical chat templates). MODEL_RENDERER_MAP routes the canonical meta-llama paths; tests load via the unsloth mirrors so CI doesn't need an HF_TOKEN with Meta license access.

Implementation notes:

* No <think> / reasoning channel. preserve_*_thinking constructor flags raise NotImplementedError if set (matches DefaultRenderer's contract for the same case).
* <|begin_of_text|> (BOS) is emitted at the start of every render. The system block is emitted UNCONDITIONALLY with a fixed "Cutting Knowledge Date / Today Date" preamble, even when no system message is supplied. date_string is a constructor kwarg pinned at "26 Jul 2024" by default (matches the chat template's strftime fallback); override per instance for production runs that want today's date.
* tools_in_user_message defaults to True. Tools + JSON signatures inject into the first user message; pass False at construction to flip to system-block mode. Both modes parity-tested.
* Single tool call per assistant message (chat template raises otherwise). Tool calls render as a JSON blob inside the assistant body. Tool responses render under role ipython regardless of source role; this mirrors the chat template's content|tojson branch, including the Jinja quirk that strings are iterable, so plain-string tool content gets JSON-quoted.
* parse_llama_3 detects the JSON tool-call body shape with a strict check; malformed JSON falls through to content.

47 dedicated tests covering map shape, constructor contract, byte parity across 11 conversation shapes (including tool calls, multi-turn, custom date, tools-in-system mode), parse_response, and the bridge contract. Full suite: 947 passed, 48 skipped, 1 xfailed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
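The Jinja tojson quirk noted above can be demonstrated with a minimal stand-in for the template's branch. render_tool_content is a hypothetical helper for illustration, not code from this PR; the real renderer follows the chat template's own branching.

```python
import json
from collections.abc import Iterable, Mapping

def render_tool_content(content):
    """Mirror the template's mapping-or-iterable -> tojson branch."""
    # Python (and Jinja) strings are iterable, so plain-string tool
    # output takes the tojson branch too and comes out JSON-quoted.
    if isinstance(content, (Mapping, Iterable)):
        return json.dumps(content)
    return str(content)

print(render_tool_content("72F"))               # "72F"  (JSON-quoted string)
print(render_tool_content({"temperature": 72})) # {"temperature": 72}
print(render_tool_content(72))                  # 72     (falls to plain str)
```

A dict serializes as expected, but a bare string like "72F" is emitted with surrounding JSON quotes, which is exactly the quirk the renderer has to reproduce for byte parity.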
Summary
Hand-coded `Llama3Renderer` for Meta's Llama-3.x chat template, plus matching `parse_llama_3` parser. Initial scope: Llama-3.2-1B-Instruct and Llama-3.2-3B-Instruct (auto-routed via `MODEL_RENDERER_MAP`). No version bump.
How tests work without a Meta-license HF token

`MODEL_RENDERER_MAP` registers the canonical `meta-llama/...` paths so production callers auto-route. Tests load the tokenizer via the unrestricted `unsloth/Llama-3.2-{1B,3B}-Instruct` mirrors: the chat-template SHA matches Meta's bit-for-bit and the underlying tiktoken-BPE files are identical, so CI doesn't need an HF_TOKEN with Meta license access.
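The routing described above can be sketched as a plain path-to-class map. This is an illustrative reconstruction under assumptions: `renderer_for` and the `Llama3Renderer` stub are hypothetical names; only the `meta-llama` model paths come from the PR itself.

```python
class Llama3Renderer:
    """Stand-in for the hand-coded Llama-3 renderer."""

# Canonical gated paths route to the renderer; tests fetch the tokenizer
# from the byte-identical unsloth mirrors instead of these paths.
MODEL_RENDERER_MAP = {
    "meta-llama/Llama-3.2-1B-Instruct": Llama3Renderer,
    "meta-llama/Llama-3.2-3B-Instruct": Llama3Renderer,
}

def renderer_for(model_path: str):
    """Look up the renderer class registered for a model path."""
    try:
        return MODEL_RENDERER_MAP[model_path]
    except KeyError:
        raise ValueError(f"no renderer registered for {model_path!r}")
```

A dict keyed on the exact HF path keeps routing explicit and makes the "map shape" test below a simple equality check.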
Implementation notes

* No `<think>` / reasoning channel; Llama-3 doesn't ship one. `preserve_*_thinking` constructor flags raise `NotImplementedError` if set (matches `DefaultRenderer`'s contract for the same case).
* `<|begin_of_text|>` (BOS) is emitted at the start of every render; the system block is always emitted with the fixed Cutting Knowledge Date / Today Date preamble, even when no system message is supplied.
* `date_string` is a constructor kwarg defaulting to `"26 Jul 2024"` (the chat template's `strftime` fallback) so output stays deterministic. Override per instance for production runs that want today's date.
* `tools_in_user_message` defaults to `True` (matches the chat template). Tools + JSON signatures inject into the first user message; pass `False` to flip to system-block mode. Both modes parity-tested.
* Single tool call per assistant message. Tool calls render as `{"name": "...", "parameters": ...}` inside the assistant body. Tool responses render under role `ipython` regardless of source role; this mirrors the chat template's `content | tojson` branch, including the Jinja quirk that strings are iterable, so plain-string tool content gets JSON-quoted.
* `parse_llama_3` detects the JSON tool-call body shape with a strict starts-with-`{` + parses-as-dict-with-`name` check; malformed JSON falls through to `content` rather than dropping silently.
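The strict detection described in the last note can be sketched as follows. This is a hedged reconstruction: `detect_tool_call` is a hypothetical name, and `parse_llama_3`'s real signature and return types may differ.

```python
import json

def detect_tool_call(body: str):
    """Return (name, parameters) if body looks like a tool-call JSON blob,
    else None so the caller treats the body as plain content."""
    text = body.strip()
    if not text.startswith("{"):
        return None  # cheap pre-filter before attempting a full parse
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None  # malformed JSON falls through to plain content
    if not isinstance(obj, dict) or "name" not in obj:
        return None  # valid JSON, but not the tool-call shape
    return obj["name"], obj.get("parameters")
```

For example, `detect_tool_call('{"name": "get_weather", "parameters": {"city": "Paris"}}')` yields a call, while a truncated blob like `'{"name": broken'` yields `None`, so nothing is dropped silently.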
Tests

47 dedicated tests in `tests/test_llama_3.py`:

* `MODEL_RENDERER_MAP` shape + factory routing
* Constructor contract (`preserve_*_thinking` rejection, `tools_in_user_message` toggle)
* Byte parity against `apply_chat_template` across 11 conversation shapes (system + user, user-only, multi-turn, gen prompt, whitespace trimming, custom date, tools-in-user, tools-in-system, tool call round-trip, dict tool response, multiple-tool-calls rejection)
* `parse_response` (plain, tool call, malformed JSON fallthrough)
Test plan

* `pytest tests/test_llama_3.py`: 47 cases pass on both 1B and 3B mirrors
* Full suite (`pytest tests/ --ignore=tests/test_client.py`): 947 pass, 48 skipped, 1 xfailed (no regressions)
* Follow-up: check `meta-llama/Llama-3.2-1B-Instruct` parity directly (the unsloth mirror has been bit-verified, but a once-off canonical run is good defense in depth)

🤖 Generated with Claude Code