Skip to content

Fix #94: vendor llama.cpp symlinks so SPM consumers can build#96

Closed
gsdali wants to merge 1 commit into
tattn:mainfrom
gsdali:fix/94-vendor-llama-symlinks
Closed

Fix #94: vendor llama.cpp symlinks so SPM consumers can build#96
gsdali wants to merge 1 commit into
tattn:mainfrom
gsdali:fix/94-vendor-llama-symlinks

Conversation

@gsdali
Copy link
Copy Markdown
Contributor

@gsdali gsdali commented Apr 25, 2026

Closes #94.

Problem

Sources/LocalLLMClientLlamaC/ contains 25 symlinks pointing into the exclude/llama.cpp/ git submodule. Because SwiftPM does not recurse submodules during package resolution (and there is no Package.swift hook for downstream to request it), every consumer using .package(url:..., from:) / branch: / revision: ends up with dangling symlinks and a build failure:

warning: ignoring broken symlink .../Sources/LocalLLMClientLlamaC/clip.cpp
…
.build/checkouts/LocalLLMClient/Sources/LocalLLMClientLlamaC/include/utils.h:3:10:
fatal error: '../common/chat.h' file not found

(<memory> not found, originally reported, is the same root cause hit at a slightly later point in the include chain.)

Fix

This PR implements option 1 from the issue — vendoring the symlinked content directly into the package — picked because it gives the cleanest end-state for SPM consumers (zero runtime/build-time dependence on submodule state).

Concretely, against submodule pin 9a3ea685:

  • 22 file symlinks → real file copies (clip*, mtmd*, common/*, …)
  • 2 directory symlinks → vendored directories
    • vendor/minja/minja/ (chat-template.hpp, minja.hpp ≈ 150 KB)
    • vendor/nlohmann/nlohmann/ (json.hpp 953 KB, json_fwd.hpp 6.6 KB)
  • 1 nested symlinkminiaudio/miniaudio.h
  • common/json-partial.h was a forwarding header reaching into ../exclude/. Inlined the upstream content + the existing using json = nlohmann::ordered_json; line.
  • stb/stb_image.h had a Linux branch including ../exclude/. Vendored upstream stb_image.h as stb/stb_image_full.h and pointed the Linux branch at it. Mac/iOS path (stb_image.swift) unchanged.
  • .gitmodules removed (only entry was for exclude/llama.cpp).
  • exclude/ directory removed.

The seven include/ forwarding headers (ggml.h, ggml-alloc.h, ggml-backend.h, ggml-cpu.h, ggml-opt.h, gguf.h, llama.h) are left alone: each uses #if __has_include(<llama/...>) and the LocalLLMClientLlamaFramework xcframework satisfies the primary path. The #else branches still mention exclude/ but never compile when the xcframework is on the include path.

Total vendored: ~1.7 MB.

Standalone regression test

Added a new CI job build-spm-consumer-no-submodules that does actions/checkout@v4 without submodules: recursive and runs swift build --target LocalLLMClientLlama (and LocalLLMClientMLX). Today's main would fail this check; with this PR it should pass. The test is approach-agnostic — even if you'd rather adopt a different fix (binary target, build-tool plugin, etc.), the CI job is worth keeping so this regression can't recur silently.

Local verification

$ git clone --no-recurse-submodules https://github.com/gsdali/LocalLLMClient.git
$ cd LocalLLMClient && git checkout fix/94-vendor-llama-symlinks
$ swift build --target LocalLLMClientLlama
…
[170/170] Compiling LocalLLMClientLlama resource_bundle_accessor.swift
Build of target: 'LocalLLMClientLlama' complete! (44.89s)

And via a downstream consumer package (closer to the real reproduction in #94):

// Package.swift
.package(
    url: "https://github.com/gsdali/LocalLLMClient.git",
    revision: "67388722c8621dfcb7c7ec3dd01cf1bfac5e5784"
),
// target deps:
.product(name: "LocalLLMClient",      package: "LocalLLMClient"),
.product(name: "LocalLLMClientLlama", package: "LocalLLMClient"),
$ swift package resolve     # no submodules fetched
$ swift build
…
Build complete! (26.96s)

Trade-offs (please weigh before merging)

  1. Repo size: grows ~1.7 MB. Bulk of that is nlohmann/json.hpp (953 KB) — small relative to the existing binary xcframework artefact pulled at resolve time, but it is a real diff.

  2. scripts/update_dependencies.sh + update-dependencies.yml workflow: today they git submodule update --remote to bump llama.cpp. After this PR that step is a no-op — bumping llama.cpp now requires re-running a (to-be-added) vendoring script that overwrites the vendored files from a target llama.cpp SHA. Out of scope for this PR; flagging here as a follow-up. Sketch:

    # bump_llama.sh <new-sha>
    git clone https://github.com/ggml-org/llama.cpp /tmp/llama && cd /tmp/llama && git checkout $1
    # for each vendored path, copy from /tmp/llama back into Sources/LocalLLMClientLlamaC/

    Happy to follow up with this script in a separate PR if you'd like to merge this one first.

  3. Alternative architectures considered (full enumeration in LocalLLMClientLlama: 'memory' file not found — C wrapper needs C++ stdlib #94 comment):

    • Vendor (this PR)
    • SPM build-tool plugin that materialises symlinks at build time → blocked: plugins can't fetch git content
    • Binary target wrapping LocalLLMClientLlamaC pre-compiled → cleanest consumer experience but heavy release infrastructure
    • Document workaround → poor DX, doesn't help SPM-via-Xcode users

    If you'd prefer a different shape, this branch can be closed and the work redone — the CI job here is still worth keeping in either case.

  4. PR Support Gemma 4 #92 (Gemma 4) is open and uses the submodule pattern. Rebasing that on top of this PR would require running the vendoring step against the new llama.cpp SHA Gemma 4 needs.

Cc / context: filed by a downstream of OCCTSwiftPartsAgent, where this is blocking the in-process LLM path on iPad. Pairs with #95 (the MLX backend Tokenizers fix); together both backends now build cleanly under SPM consumption.

The Sources/LocalLLMClientLlamaC/ directory contained 25 symlinks
pointing into the exclude/llama.cpp/ git submodule. Because SwiftPM
does not recurse submodules when resolving package dependencies,
every downstream consumer using `.package(url:..., from:)` (or
`branch:` / `revision:`) ended up with dangling symlinks and a build
failure. There is no SPM hook for downstream Package.swift to
request submodule recursion.

This change replaces every symlink with a regular file copy of its
target content from llama.cpp at submodule-pinned commit 9a3ea685,
and drops the submodule entirely:

  - 22 file symlinks → vendored copies (clip*, mtmd*, common/* …)
  - 2 directory symlinks → vendored directories
      (vendor/minja → minja/, vendor/nlohmann → nlohmann/)
  - 1 nested symlink → vendored copy
      (miniaudio/miniaudio.h)
  - common/json-partial.h was a forwarding header reaching into
      ../exclude/. Inlined the upstream content plus the existing
      `using json = nlohmann::ordered_json;` line.
  - stb/stb_image.h had a Linux branch including ../exclude/.
      Vendored upstream stb_image.h as stb/stb_image_full.h and
      pointed the Linux branch at it. Mac/iOS path (stb_image.swift)
      unchanged.
  - .gitmodules removed (the only entry was for exclude/llama.cpp).
  - exclude/ directory removed.

The seven include/ forwarding headers (ggml.h, ggml-alloc.h,
ggml-backend.h, ggml-cpu.h, ggml-opt.h, gguf.h, llama.h) are left
alone: they use `#if __has_include(<llama/...>)` and the
LocalLLMClientLlamaFramework xcframework satisfies the primary path.
The dead `#else` branches still mention exclude/ but never compile
with the xcframework on the include path.

CI: adds a new build-spm-consumer-no-submodules job that does
actions/checkout@v4 without submodules:recursive and runs
`swift build --target LocalLLMClientLlama` (and MLX). This is the
standalone regression test that today's main would fail.

Local verification: `swift build --target LocalLLMClientLlama` from
a working tree with the submodule removed builds clean (170/170
compiled).

Trade-offs (maintainer to weigh):

  - Repo grows by ~1.7 MB of vendored source (vendor/nlohmann/
    json.hpp alone is 953 KB).
  - scripts/update_dependencies.sh and the update-dependencies
    workflow currently `git submodule update --remote`. After this
    PR that step is a no-op; bumping llama.cpp now means re-running
    a (to-be-added) vendoring script. Out of scope for this PR —
    flagged for follow-up.
  - This is one of four options outlined in tattn#94 (vendor / build-tool
    plugin / binary target / document workaround). The PR
    demonstrates the vendor option; if you prefer another shape,
    this branch can be closed and the fix taken from a different
    angle. The CI job in this PR is approach-agnostic and worth
    keeping in any case.

Closes tattn#94.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request integrates core llama.cpp components directly into the local source tree, replacing the submodule dependency with local implementations for chat parsing, multimodal support, and audio preprocessing. The review identifies several technical improvements: replacing std::rand() with modern for better thread safety and distribution, substituting std::localtime with thread-safe variants (localtime_r or localtime_s) to avoid race conditions in multi-threaded environments, and moving away from deprecated std::wstring_convert and std::codecvt_utf8 utilities to ensure long-term maintainability.

I am having trouble creating individual review comments. Click here to see my feedback.

Sources/LocalLLMClientLlamaC/common/chat-parser.cpp (22)

medium

The use of std::rand() without a prior call to std::srand() results in a deterministic sequence of numbers, which may lead to predictable healing markers across different runs of the application. Additionally, std::rand() is not thread-safe. Consider using the modern C++ <random> header, specifically std::random_device and std::mt19937, for better distribution and thread safety.

Sources/LocalLLMClientLlamaC/common/common.cpp (428)

medium

std::localtime is not thread-safe as it returns a pointer to a static internal buffer that is shared across all threads. In a multi-threaded environment, this can lead to race conditions where the returned data is overwritten by another thread. Use localtime_r on POSIX systems or localtime_s on Windows to ensure thread safety.

Sources/LocalLLMClientLlamaC/common/common.cpp (716)

medium

std::wstring_convert and std::codecvt_utf8 are deprecated since C++17. While they are still functional in many compilers, they are slated for removal in future standards. Consider using a more modern library or platform-specific APIs for UTF-8/UTF-16 conversions to ensure long-term maintainability.

Sources/LocalLLMClientLlamaC/minja/chat-template.hpp (509)

medium

Similar to the issue in common.cpp, std::localtime is not thread-safe. Since this is a header-only component that might be used in various contexts, using the thread-safe variants (localtime_r or localtime_s) is highly recommended to avoid potential race conditions.

@tattn
Copy link
Copy Markdown
Owner

tattn commented Apr 25, 2026

In this library, I use the XCFramework provided by llama.cpp whenever possible.
When the interface is insufficient, I add only the minimum necessary symbolic links or header implementations.

The reason is that once I start copying files, I would eventually end up maintaining a separate version of llama.cpp in this repository.

Could you avoid copying files as much as possible?

@gsdali
Copy link
Copy Markdown
Contributor Author

gsdali commented Apr 25, 2026

Thanks @tattn — that's a fair call, and a well-stated rationale. Avoiding the "separate fork of llama.cpp" maintenance burden is exactly the right priority for this library.

Closing this PR. Capturing the architectural decision here so anyone hitting #94 in the future has context:

  • Vendoring is off the table by design — keeps llama.cpp as a tracked submodule, not a forked copy.
  • Symlink + xcframework is the chosen architecture — the xcframework provides the bulk of the surface; symlinks fill the gaps.
  • SPM-consumer breakage (this PR's reason for existing) remains a real downstream problem when consumers pull via .package(url:..., from:) without --recurse-submodules. Two remaining viable shapes for a future fix:
    1. Binary target wrapping LocalLLMClientLlamaC — pre-build the C wrapper into its own xcframework alongside the existing one, ship as .binaryTarget. Cleanest consumer experience but heavier release infrastructure.
    2. A SwiftPM build-tool plugin that materialises the symlinks at build time — currently blocked by SPM not exposing a hook to fetch git submodules during package resolution; could become viable if SPM grows that capability.
  • Downstream workaround for now: consumers who need the Llama backend over SPM can pin a fork revision that vendors the symlinks (this PR's branch is one such snapshot, though it won't track future llama.cpp bumps), or use the local-development path of cloning the repo with --recurse-submodules and depending on a path package.

For OCCTSwiftPartsAgent we're consuming LocalLLMClient + LocalLLMClientMLX only, so the symlink issue doesn't affect us — happy that #95 unblocks the MLX path. Will revisit Llama integration in a future phase if/when the binary-target shape (or another approach) lands.

Thanks again for the quick review and the clear feedback.

@gsdali gsdali closed this Apr 25, 2026
@gsdali gsdali deleted the fix/94-vendor-llama-symlinks branch April 25, 2026 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LocalLLMClientLlama: 'memory' file not found — C wrapper needs C++ stdlib

2 participants