Skip to content

Phase out hardening flags hurting startup; add three perf wins#2

Merged
Sam Gammon (sgammon) merged 6 commits intomainfrom
claude/improve-compiler-flags-2sKV8
May 6, 2026
Merged

Phase out hardening flags hurting startup; add three perf wins#2
Sam Gammon (sgammon) merged 6 commits intomainfrom
claude/improve-compiler-flags-2sKV8

Conversation

@sgammon
Copy link
Copy Markdown
Member

@sgammon Sam Gammon (sgammon) commented May 5, 2026

Summary

We're prioritising startup time over defence-in-depth for now. Six hardening flags move to a tracked, non-consumed labs.disabled.txt parking lot; three pure perf/size wins land in the live profiles. -fasynchronous-unwind-tables (profiling) and -fbasic-block-sections=all (Propeller-ready) are kept on per discussion.

Parked in labs.disabled.txt (each with its own re-enable criteria):

  • base.txt: -fstack-protector-strong, -fzero-call-used-regs=used-gpr
  • linux.txt + darwin.txt: -U_FORTIFY_SOURCE, -D_FORTIFY_SOURCE=2, -D_LIBCPP_HARDENING_MODE_FAST
  • linux-amd64.txt: -fcf-protection=full

labs.disabled.txt is not loaded by the rollup or by cli/cflags.{sh,ts} — the resolver matches on a closed set of {base, $os, $os-$arch, $os-bin, $os-$arch-bin}.txt and silently ignores everything else. It's a tracked, reviewable parking lot, not a profile. Re-enabling any flag is a literal cut-and-paste back into the named source file.

Added (small, free startup wins):

  • base.txt: -fmerge-all-constants — dedupe equal string/numeric constant pool entries → smaller binary, fewer relocs at load. Mild ISO violation we don't depend on.
  • linux.txt: -Wl,--hash-style=gnu — drop legacy sysv-hash; ~2× faster dyn-symbol lookup at load + smaller .dynsym.
  • linux.txt: -Wl,-O3 (replaces -Wl,-O2) — more aggressive linker-side string/section merging.

Kept on:

  • -fasynchronous-unwind-tables — better profiling > the table size.
  • -fbasic-block-sections=all (linux-amd64) — Propeller is on the roadmap; the per-BB sections are a hard prerequisite for the symbol-ordering file. Carrying it now avoids a rebuild-the-world later.

README features matrix updated with a new "Phasing in" subsection that lists parked flags and points at labs.disabled.txt.

Test plan

  • All eight (os × arch × {compile, --bin}) outputs from cli/cflags.sh byte-match cli/cflags.ts
  • Six dropped flags absent from resolved output; three new flags present
  • labs.disabled.txt correctly ignored by the resolver (verified via grep)
  • linux-amd64 compile-flag count: 69 → 65 (net 4 dropped)

Generated by Claude Code

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restructures the flag profiles to separate “compile-time” flags from “final-link-only” flags, while temporarily parking several startup-costly hardening options in a tracked labs.disabled.txt file and adding a few startup/size optimizations to the live Linux/base profiles.

Changes:

  • Introduces *-bin “final-link-only” profiles (linux-bin.txt, darwin-bin.txt, plus per-arch placeholders) and updates the resolver docs accordingly.
  • Parks selected hardening flags in labs.disabled.txt and updates README to document the phased re-enable plan.
  • Adds/adjusts optimization and ELF/Mach-O related flags and adds two equivalent CLI resolvers (cli/cflags.sh and cli/cflags.ts).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
README.md Expands documentation: profile taxonomy, deployment floor, feature matrix, and phased hardening notes.
base.txt Updates layering description; adds -fmerge-all-constants; centralizes a clang-compatible warning set; keeps unwind settings.
linux.txt Adds ELF performance/hardening-related compile/link flags and clarifies final-link-only flag placement.
linux-bin.txt New: final-link-only Linux flags (gc-sections, z defs, debug compression, fatal warnings).
linux-amd64.txt New: amd64 ISA baseline/tuning plus retained BB section emission for Propeller readiness.
linux-amd64-bin.txt New: placeholder for amd64 final-link-only flags.
linux-arm64.txt New: arm64 ISA baseline and BTI/PAC-related link flags.
linux-arm64-bin.txt New: placeholder for arm64 final-link-only flags.
darwin.txt New: macOS compile/link flags (lld + chained-fixups-related options).
darwin-bin.txt New: final-link-only Mach-O dead-strip settings.
darwin-amd64.txt New: Intel macOS ISA baseline and rationale for excluding CET.
darwin-amd64-bin.txt New: placeholder for amd64 Darwin final-link-only flags.
darwin-arm64.txt New: Apple Silicon baseline and branch protection settings.
darwin-arm64-bin.txt New: placeholder for arm64 Darwin final-link-only flags.
labs.disabled.txt New: tracked “parking lot” for temporarily disabled hardening flags and re-enable notes.
cli/cflags.ts New: Bun/TypeScript resolver for closed-set profile rollup (+ --bin support).
cli/cflags.sh New: Bash resolver for closed-set profile rollup (+ --bin support).
Comments suppressed due to low confidence (1)

linux.txt:38

  • PR description lists three newly added flags, but this change also introduces -fno-semantic-interposition (not mentioned in the PR description's "Added" section). Please either update the PR description/test plan to include it, or clarify why this additional behavior change is intended.
# Skip the dynamic-symbol interposition contract for our own definitions:
# under ELF the linker assumes any global symbol in a shared library can
# be replaced via LD_PRELOAD, which forces every intra-library call through
# the GOT/PLT and disables a wide range of inlining + IPO. Disabling it
# lets clang resolve calls and constants directly inside the same DSO.
# Combined with -fno-plt below, the call-site overhead approaches a static
# build's. Linux/ELF only — Mach-O has different visibility semantics and
# clang ignores this flag there.
-fno-semantic-interposition

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread README.md Outdated
Comment thread linux.txt Outdated
Comment thread darwin.txt Outdated
Comment thread linux.txt
Sam Gammon (sgammon) pushed a commit that referenced this pull request May 6, 2026
CI:
- New tests/probe.c — minimal C TU exercising libc calls, function-
  pointer dispatch, a small loop, and stack-allocated string formatting.
  Designed to stay warning-clean under the live -Wall -Wextra -Wpedantic
  -Wformat=2 -Wstrict-prototypes -Wshadow -Wundef set so any codegen
  regression breaks CI early.
- New .github/workflows/ci.yml — matrix probe on three GitHub-hosted
  runners (ubuntu-24.04 / ubuntu-24.04-arm / macos-14). For each
  target it installs clang-22 + lld-22 (apt.llvm.org on Linux,
  Homebrew on macOS), asserts cli/cflags.sh and cli/cflags.ts emit
  byte-identical output for the compile and binary profiles, then
  compiles tests/probe.c with the compile profile (-c) and
  compile+links+runs it with the binary profile.
- README CI section documents coverage and notes that darwin-amd64
  is exercised locally pre-tag because GitHub no longer offers an
  Intel-Mac runner.

Copilot review on PR #2 (3 valid comments addressed; 1 false positive
skipped):

- darwin.txt + linux.txt + README.md: the parked libc++ flag was
  written as the shorthand -D_LIBCPP_HARDENING_MODE_FAST in three
  comment blocks, but the real flag in labs.disabled.txt is
  -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST. Re-enabling
  via the shorthand is a silent no-op (libc++ checks the value of
  the macro, not its mere definition). All three call sites now use
  the exact spelling, and the README's parked-flag list is reformatted
  as a bullet list with -U_FORTIFY_SOURCE included for completeness.
- linux.txt:45 (false positive): Copilot flagged -fno-plt as a
  newly-introduced undocumented flag. -fno-plt was added in PR #1
  (commit 6d0e14c) and is not in this PR's diff; Copilot compared
  the PR description's "Added" section against the file content
  rather than against the actual diff. Skipped.
We're prioritising startup time over defence-in-depth for now. Six
hardening flags move to a new tracked, non-consumed parking-lot file
labs.disabled.txt; three pure perf/size wins land in the live profiles.

Parked in labs.disabled.txt (each with its own re-enable criteria):

  base.txt:
    -fstack-protector-strong               (canary prologue/epilogue)
    -fzero-call-used-regs=used-gpr         (per-return reg-clearing)

  linux.txt:
    -U_FORTIFY_SOURCE
    -D_FORTIFY_SOURCE=2                    (per-call libc bounds checks)
    -D_LIBCPP_HARDENING_MODE_FAST          (per-call libc++ bounds checks)

  darwin.txt:
    -U_FORTIFY_SOURCE
    -D_FORTIFY_SOURCE=2
    -D_LIBCPP_HARDENING_MODE_FAST

  linux-amd64.txt:
    -fcf-protection=full                   (ENDBR64 on every indirect target)

labs.disabled.txt is not loaded by the rollup or by cli/cflags.{sh,ts}
— the resolver only matches a closed set of {base, $os, $os-$arch,
$os-bin, $os-$arch-bin}.txt. Verified with a smoke test.

Kept on:
  -fasynchronous-unwind-tables             (better profiling > table size)
  -fbasic-block-sections=all (linux-amd64) (Propeller is on the roadmap;
                                            keeping the per-BB sections
                                            avoids a rebuild-the-world
                                            once we wire in the symbol-
                                            ordering file)

Added (small, free startup wins):
  base.txt:
    -fmerge-all-constants                  (dedupe equal string/numeric
                                            constant pool entries → smaller
                                            binary, fewer relocs at load;
                                            mild ISO violation we don't
                                            depend on)

  linux.txt:
    -Wl,--hash-style=gnu                   (drop legacy sysv-hash; ~2x
                                            faster dyn-symbol lookup at
                                            load + smaller .dynsym)
    -Wl,-O3 (replaces -Wl,-O2)             (more aggressive linker-side
                                            string/section merging; pure
                                            link-time cost)

README features matrix updated; new "Phasing in" subsection lists the
parked flags and points at labs.disabled.txt. cli/cflags.{sh,ts}
output verified byte-identical across all eight (os × arch × mode)
combinations after the change.
CI:
- New tests/probe.c — minimal C TU exercising libc calls, function-
  pointer dispatch, a small loop, and stack-allocated string formatting.
  Designed to stay warning-clean under the live -Wall -Wextra -Wpedantic
  -Wformat=2 -Wstrict-prototypes -Wshadow -Wundef set so any codegen
  regression breaks CI early.
- New .github/workflows/ci.yml — matrix probe on three GitHub-hosted
  runners (ubuntu-24.04 / ubuntu-24.04-arm / macos-14). For each
  target it installs clang-22 + lld-22 (apt.llvm.org on Linux,
  Homebrew on macOS), asserts cli/cflags.sh and cli/cflags.ts emit
  byte-identical output for the compile and binary profiles, then
  compiles tests/probe.c with the compile profile (-c) and
  compile+links+runs it with the binary profile.
- README CI section documents coverage and notes that darwin-amd64
  is exercised locally pre-tag because GitHub no longer offers an
  Intel-Mac runner.

Copilot review on PR #2 (3 valid comments addressed; 1 false positive
skipped):

- darwin.txt + linux.txt + README.md: the parked libc++ flag was
  written as the shorthand -D_LIBCPP_HARDENING_MODE_FAST in three
  comment blocks, but the real flag in labs.disabled.txt is
  -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST. Re-enabling
  via the shorthand is a silent no-op (libc++ checks the value of
  the macro, not its mere definition). All three call sites now use
  the exact spelling, and the README's parked-flag list is reformatted
  as a bullet list with -U_FORTIFY_SOURCE included for completeness.
- linux.txt:45 (false positive): Copilot flagged -fno-plt as a
  newly-introduced undocumented flag. -fno-plt was added in PR #1
  (commit 6d0e14c) and is not in this PR's diff; Copilot compared
  the PR description's "Added" section against the file content
  rather than against the actual diff. Skipped.
The first run of ci.yml passed on linux-amd64 but failed in ~30s on
linux-arm64 and darwin-arm64, both during the toolchain-install step.
I don't have direct access to the GHA logs from this session, so this
commit makes the install path more robust along the most likely
failure modes:

- Linux: replace the hand-rolled gpg + apt source setup with
  apt.llvm.org's llvm.sh, which is the canonical installer and
  handles distro/codename detection across noble/noble-arm64.
  Drop --update-alternatives in favour of explicit symlinks in
  /usr/local/bin (which precedes /usr/bin in PATH on ubuntu-24.04),
  so `clang` and `ld.lld` resolve to the v22 binaries even if the
  base image already has an unversioned `clang` from a different
  source.

- Darwin: keep the Homebrew install but capture brew --prefix llvm
  into a CC env var passed via $GITHUB_ENV so subsequent steps
  invoke that clang explicitly instead of relying on PATH order.
  Also prepend $(brew --prefix llvm)/bin so -fuse-ld=lld finds
  ld64.lld.

- New "Toolchain sanity" step prints `which clang`, `clang --version`,
  and `ld.lld` / `ld64.lld --version` so any future failure surfaces
  the actual install state in the log.

Compile / link steps now invoke "$CC" explicitly rather than relying
on whichever `clang` is first in PATH.
@sgammon Sam Gammon (sgammon) force-pushed the claude/improve-compiler-flags-2sKV8 branch from eb1daec to aba1744 Compare May 6, 2026 00:11
…tions

Two unrelated CI failures were chasing different root causes:

linux-arm64
  ld.lld errored hard at link time:
    ld.lld: error: /lib/aarch64-linux-gnu/Scrt1.o: -z force-bti:
      file does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI property
  …and the same for crti.o / crtn.o under -z pac-plt. lld treats
  these two -z flags as strict — every input object must carry the
  matching GNU property note or the link aborts. Stock Ubuntu noble
  glibc CRT objects ship without those notes, so the link can never
  succeed on a vanilla distro toolchain.

  Drop -Wl,-z,force-bti and -Wl,-z,pac-plt from linux-arm64.txt.
  Keep -mbranch-protection=standard so our own .o files still emit
  the PAC/BTI sequences and per-object property notes — when the
  rest of the link surface (CRT, libc) catches up, we can re-enable
  the linker-side enforcement. Comment in linux-arm64.txt records
  the rationale and the verbatim error message so the next person
  who sees it knows what changed and what would need to be true to
  flip it back on.

darwin-arm64
  clang: error: invalid linker name in argument '-fuse-ld=lld'
  This message is Apple-clang's signature for "I don't know what
  'lld' is" — meaning Apple's /usr/bin/clang got picked up instead
  of the brew-installed mainline clang. Two defensive changes to
  ci.yml so the next failure surfaces the actual install state:

  - Install step now hard-asserts that $(brew --prefix llvm)/bin/clang
    AND $(brew --prefix llvm)/bin/ld64.lld both exist. If brew shipped
    llvm without ld64.lld (rare but possible on some images), the step
    fails immediately with the missing path printed instead of letting
    the failure cascade into a confusing clang error two steps later.
  - Toolchain sanity step now invokes "$CC" instead of bare `clang`,
    explicitly rejects an Apple-clang $CC, and uses
    `--print-prog-name=ld.lld` / `--print-prog-name=ld64.lld` to show
    the lld path the driver would actually pick under -fuse-ld=lld.
Homebrew's `llvm` formula on macOS does not ship lld — confirmed by
listing /opt/homebrew/opt/llvm/bin on macos-14 (clang, MLIR tools,
FileCheck, ~150 binaries, zero lld variants). On Mac, `lld` is a
standalone Homebrew formula; the asymmetry vs Linux (where
apt.llvm.org bundles them in the llvm-toolchain-N package set) is
why my "install lld with the llvm formula" assumption broke darwin
CI even though linux CI worked.

Fix: `brew install llvm lld zstd` (add lld to the install list),
then symlink $(brew --prefix lld)/bin/ld64.lld into
$(brew --prefix llvm)/bin/ld64.lld so clang's `-fuse-ld=lld` driver
finds it in its own bin dir (clang searches there before falling
through to PATH). Hard-assert both binaries exist before continuing
so the failure mode is "MISSING: <path>" not the cryptic
"invalid linker name in argument '-fuse-ld=lld'" that surfaces two
steps later when Apple's clang inadvertently gets picked up.
The toolchain-sanity guard that's supposed to reject Apple's clang
fired a false positive on macos-14: \$CC was correctly set to
/opt/homebrew/opt/llvm/bin/clang (Homebrew clang 22.1.4), but the
naive `grep -qiE 'apple|xcode'` matched on the target triple line:

    Target: arm64-apple-darwin23.6.0

…which every clang on macOS prints, Homebrew or otherwise. Restrict
the match to the *version banner* (first line) and require the exact
"Apple clang" string Apple's binary uses there. Homebrew's banner is
"Homebrew clang version 22.1.4"; Xcode/Apple's is "Apple clang
version 15.x.y …". The discriminator is the leading word, not the
case-insensitive presence of "apple" anywhere in the multi-line
output.

Net: lld install succeeded, brew clang is the one running, and CI
should now get past the sanity step.
@sgammon Sam Gammon (sgammon) merged commit fd5c31e into main May 6, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants