Fix #94: vendor llama.cpp symlinks so SPM consumers can build#96
Conversation
The Sources/LocalLLMClientLlamaC/ directory contained 25 symlinks
pointing into the exclude/llama.cpp/ git submodule. Because SwiftPM
does not recurse submodules when resolving package dependencies,
every downstream consumer using `.package(url:..., from:)` (or
`branch:` / `revision:`) ended up with dangling symlinks and a build
failure. There is no SPM hook for downstream Package.swift to
request submodule recursion.
This change replaces every symlink with a regular file copy of its
target content from llama.cpp at submodule-pinned commit 9a3ea685,
and drops the submodule entirely:
- 22 file symlinks → vendored copies (clip*, mtmd*, common/* …)
- 2 directory symlinks → vendored directories
(vendor/minja → minja/, vendor/nlohmann → nlohmann/)
- 1 nested symlink → vendored copy
(miniaudio/miniaudio.h)
- common/json-partial.h was a forwarding header reaching into
../exclude/. Inlined the upstream content plus the existing
`using json = nlohmann::ordered_json;` line.
- stb/stb_image.h had a Linux branch including ../exclude/.
Vendored upstream stb_image.h as stb/stb_image_full.h and
pointed the Linux branch at it. Mac/iOS path (stb_image.swift)
unchanged.
- .gitmodules removed (the only entry was for exclude/llama.cpp).
- exclude/ directory removed.
The seven include/ forwarding headers (ggml.h, ggml-alloc.h,
ggml-backend.h, ggml-cpu.h, ggml-opt.h, gguf.h, llama.h) are left
alone: they use `#if __has_include(<llama/...>)` and the
LocalLLMClientLlamaFramework xcframework satisfies the primary path.
The dead `#else` branches still mention exclude/ but never compile
with the xcframework on the include path.
CI: adds a new build-spm-consumer-no-submodules job that does
actions/checkout@v4 without submodules:recursive and runs
`swift build --target LocalLLMClientLlama` (and MLX). This is the
standalone regression test that today's main would fail.
Local verification: `swift build --target LocalLLMClientLlama` from
a working tree with the submodule removed builds clean (170/170
compiled).
Trade-offs (maintainer to weigh):
- Repo grows by ~1.7 MB of vendored source (vendor/nlohmann/
json.hpp alone is 953 KB).
- scripts/update_dependencies.sh and the update-dependencies
workflow currently `git submodule update --remote`. After this
PR that step is a no-op; bumping llama.cpp now means re-running
a (to-be-added) vendoring script. Out of scope for this PR —
flagged for follow-up.
- This is one of four options outlined in tattn#94 (vendor / build-tool
plugin / binary target / document workaround). The PR
demonstrates the vendor option; if you prefer another shape,
this branch can be closed and the fix taken from a different
angle. The CI job in this PR is approach-agnostic and worth
keeping in any case.
Closes tattn#94.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request integrates core llama.cpp components directly into the local source tree, replacing the submodule dependency with local implementations for chat parsing, multimodal support, and audio preprocessing. The review identifies several technical improvements: replacing std::rand() with modern for better thread safety and distribution, substituting std::localtime with thread-safe variants (localtime_r or localtime_s) to avoid race conditions in multi-threaded environments, and moving away from deprecated std::wstring_convert and std::codecvt_utf8 utilities to ensure long-term maintainability.
I am having trouble creating individual review comments. Click here to see my feedback.
Sources/LocalLLMClientLlamaC/common/chat-parser.cpp (22)
The use of std::rand() without a prior call to std::srand() results in a deterministic sequence of numbers, which may lead to predictable healing markers across different runs of the application. Additionally, std::rand() is not thread-safe. Consider using the modern C++ <random> header, specifically std::random_device and std::mt19937, for better distribution and thread safety.
Sources/LocalLLMClientLlamaC/common/common.cpp (428)
std::localtime is not thread-safe as it returns a pointer to a static internal buffer that is shared across all threads. In a multi-threaded environment, this can lead to race conditions where the returned data is overwritten by another thread. Use localtime_r on POSIX systems or localtime_s on Windows to ensure thread safety.
Sources/LocalLLMClientLlamaC/common/common.cpp (716)
std::wstring_convert and std::codecvt_utf8 are deprecated since C++17. While they are still functional in many compilers, they are slated for removal in future standards. Consider using a more modern library or platform-specific APIs for UTF-8/UTF-16 conversions to ensure long-term maintainability.
Sources/LocalLLMClientLlamaC/minja/chat-template.hpp (509)
Similar to the issue in common.cpp, std::localtime is not thread-safe. Since this is a header-only component that might be used in various contexts, using the thread-safe variants (localtime_r or localtime_s) is highly recommended to avoid potential race conditions.
|
In this library, I use the XCFramework provided by llama.cpp whenever possible. The reason is that once I start copying files, I would eventually end up maintaining a separate version of llama.cpp in this repository. Could you avoid copying files as much as possible? |
|
Thanks @tattn — that's a fair call, and a well-stated rationale. Avoiding the "separate fork of llama.cpp" maintenance burden is exactly the right priority for this library. Closing this PR. Capturing the architectural decision here so anyone hitting #94 in the future has context:
For OCCTSwiftPartsAgent we're consuming Thanks again for the quick review and the clear feedback. |
Closes #94.
Problem
Sources/LocalLLMClientLlamaC/contains 25 symlinks pointing into theexclude/llama.cpp/git submodule. Because SwiftPM does not recurse submodules during package resolution (and there is no Package.swift hook for downstream to request it), every consumer using.package(url:..., from:)/branch:/revision:ends up with dangling symlinks and a build failure:(
<memory>not found, originally reported, is the same root cause hit at a slightly later point in the include chain.)Fix
This PR implements option 1 from the issue — vendoring the symlinked content directly into the package — picked because it gives the cleanest end-state for SPM consumers (zero runtime/build-time dependence on submodule state).
Concretely, against submodule pin
9a3ea685:vendor/minja/→minja/(chat-template.hpp, minja.hpp ≈ 150 KB)vendor/nlohmann/→nlohmann/(json.hpp 953 KB, json_fwd.hpp 6.6 KB)miniaudio/miniaudio.hcommon/json-partial.hwas a forwarding header reaching into../exclude/. Inlined the upstream content + the existingusing json = nlohmann::ordered_json;line.stb/stb_image.hhad a Linux branch including../exclude/. Vendored upstreamstb_image.hasstb/stb_image_full.hand pointed the Linux branch at it. Mac/iOS path (stb_image.swift) unchanged..gitmodulesremoved (only entry was forexclude/llama.cpp).exclude/directory removed.The seven
include/forwarding headers (ggml.h,ggml-alloc.h,ggml-backend.h,ggml-cpu.h,ggml-opt.h,gguf.h,llama.h) are left alone: each uses#if __has_include(<llama/...>)and theLocalLLMClientLlamaFrameworkxcframework satisfies the primary path. The#elsebranches still mentionexclude/but never compile when the xcframework is on the include path.Total vendored: ~1.7 MB.
Standalone regression test
Added a new CI job
build-spm-consumer-no-submodulesthat doesactions/checkout@v4withoutsubmodules: recursiveand runsswift build --target LocalLLMClientLlama(andLocalLLMClientMLX). Today'smainwould fail this check; with this PR it should pass. The test is approach-agnostic — even if you'd rather adopt a different fix (binary target, build-tool plugin, etc.), the CI job is worth keeping so this regression can't recur silently.Local verification
And via a downstream consumer package (closer to the real reproduction in #94):
Trade-offs (please weigh before merging)
Repo size: grows ~1.7 MB. Bulk of that is
nlohmann/json.hpp(953 KB) — small relative to the existing binaryxcframeworkartefact pulled at resolve time, but it is a real diff.scripts/update_dependencies.sh+update-dependencies.ymlworkflow: today theygit submodule update --remoteto bump llama.cpp. After this PR that step is a no-op — bumping llama.cpp now requires re-running a (to-be-added) vendoring script that overwrites the vendored files from a target llama.cpp SHA. Out of scope for this PR; flagging here as a follow-up. Sketch:Happy to follow up with this script in a separate PR if you'd like to merge this one first.
Alternative architectures considered (full enumeration in LocalLLMClientLlama: 'memory' file not found — C wrapper needs C++ stdlib #94 comment):
If you'd prefer a different shape, this branch can be closed and the work redone — the CI job here is still worth keeping in either case.
PR Support Gemma 4 #92 (Gemma 4) is open and uses the submodule pattern. Rebasing that on top of this PR would require running the vendoring step against the new llama.cpp SHA Gemma 4 needs.
Cc / context: filed by a downstream of
OCCTSwiftPartsAgent, where this is blocking the in-process LLM path on iPad. Pairs with #95 (the MLX backendTokenizersfix); together both backends now build cleanly under SPM consumption.