132 changes: 132 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,132 @@
name: ci

on:
push:
branches: [main]
pull_request:
workflow_dispatch:

permissions:
contents: read

jobs:
probe:
name: probe (${{ matrix.target.os }}-${{ matrix.target.arch }})
runs-on: ${{ matrix.target.runner }}
strategy:
fail-fast: false
matrix:
target:
- { os: linux, arch: amd64, runner: ubuntu-24.04 }
- { os: linux, arch: arm64, runner: ubuntu-24.04-arm }
- { os: darwin, arch: arm64, runner: macos-14 }

steps:
- uses: actions/checkout@v4

# ── toolchain ───────────────────────────────────────────────────
# The flag set assumes mainline clang-22 / lld-22 (zstd debug
# compression in lld, recent CET / PAC / BTI codegen, modern
# libc++ hardening macros, …). We install via apt.llvm.org's
# llvm.sh on Linux and Homebrew on macOS, then pin a stable name
# for the rest of the job via $CC and a /usr/local/bin symlink
# for ld.lld / ld64.lld.
- name: Install clang 22 (Linux)
if: matrix.target.os == 'linux'
run: |
set -eux
wget -qO /tmp/llvm.sh https://apt.llvm.org/llvm.sh
chmod +x /tmp/llvm.sh
# `all` pulls clang + lld + libc++ + lldb + tools.
sudo /tmp/llvm.sh 22 all
sudo apt-get install -y --no-install-recommends libzstd-dev
# clang -fuse-ld=lld looks up an unversioned `ld.lld`. Without the
# symlink that lookup can fail or resolve to a different lld on
# PATH, which would defeat our linker pinning.
sudo ln -sf /usr/bin/clang-22 /usr/local/bin/clang
sudo ln -sf /usr/bin/clang-22 /usr/local/bin/clang++
sudo ln -sf /usr/bin/ld.lld-22 /usr/local/bin/ld.lld
echo "CC=/usr/local/bin/clang" >> "$GITHUB_ENV"
echo "CXX=/usr/local/bin/clang++" >> "$GITHUB_ENV"

- name: Install clang via Homebrew (macOS)
if: matrix.target.os == 'darwin'
run: |
set -eux
brew update
# On macOS, Homebrew's `llvm` formula intentionally does NOT
# ship lld (Apple's ld64 is the platform default and the
# llvm bottle stops at clang + libs). lld lives in its own
# `lld` formula. Install both, then symlink ld64.lld next
# to clang so `-fuse-ld=lld` resolves without us having to
# depend on PATH ordering inside clang's lookup.
brew install llvm lld zstd
llvm_prefix="$(brew --prefix llvm)"
lld_prefix="$(brew --prefix lld)"
test -x "$llvm_prefix/bin/clang" || { echo "MISSING: $llvm_prefix/bin/clang"; ls -la "$llvm_prefix/bin" || true; exit 1; }
test -x "$lld_prefix/bin/ld64.lld" || { echo "MISSING: $lld_prefix/bin/ld64.lld"; ls -la "$lld_prefix/bin" || true; exit 1; }
ln -sf "$lld_prefix/bin/ld64.lld" "$llvm_prefix/bin/ld64.lld"
echo "$llvm_prefix/bin" >> "$GITHUB_PATH"
echo "CC=$llvm_prefix/bin/clang" >> "$GITHUB_ENV"
echo "CXX=$llvm_prefix/bin/clang++" >> "$GITHUB_ENV"

- name: Toolchain sanity
run: |
set -eux
echo "PATH=$PATH"
echo "CC=$CC"
echo "which clang: $(command -v clang || true)"
"$CC" --version
# Reject Apple's clang outright — it doesn't accept
# -fuse-ld=lld and the live profile depends on lld.
# Match only the *version banner* (first line); the target
# triple printed below it always contains "apple-darwin"
# even on Homebrew clang and would false-positive a naive
# case-insensitive `apple` match.
if "$CC" --version | head -1 | grep -qi '^Apple clang'; then
echo "BAD: \$CC resolves to Apple clang"
exit 1
fi
# Probe the linker driver for the lld variant clang would
# pick under -fuse-ld=lld. On linux it's ld.lld, on darwin
# ld64.lld. `--print-prog-name` returns the bare name when
# not found, so a real path means lld is reachable.
"$CC" -fuse-ld=lld --print-prog-name=ld.lld 2>/dev/null || true
"$CC" -fuse-ld=lld --print-prog-name=ld64.lld 2>/dev/null || true
(command -v ld.lld && ld.lld --version) || true
(command -v ld64.lld && ld64.lld --version) || true

- uses: oven-sh/setup-bun@v2

# ── tests ───────────────────────────────────────────────────────
- name: Verify cflags.sh / cflags.ts emit identical output
run: |
set -eux
os=${{ matrix.target.os }}
arch=${{ matrix.target.arch }}
for mode in "" "--bin"; do
sh=$(bash ./cli/cflags.sh "$os" "$arch" $mode)
ts=$(bun run ./cli/cflags.ts "$os" "$arch" $mode)
if [ "$sh" != "$ts" ]; then
echo "parity mismatch (mode='${mode:-compile}')"
diff <(printf '%s\n' "$sh") <(printf '%s\n' "$ts")
exit 1
fi
done

- name: Compile probe with the compile profile
run: |
set -eux
CFLAGS=$(bash ./cli/cflags.sh ${{ matrix.target.os }} ${{ matrix.target.arch }})
echo "CC=$CC"
echo "CFLAGS:" $CFLAGS
"$CC" $CFLAGS -c tests/probe.c -o /tmp/probe.o

- name: Compile + link + run probe with the binary profile
run: |
set -eux
BFLAGS=$(bash ./cli/cflags.sh ${{ matrix.target.os }} ${{ matrix.target.arch }} --bin)
echo "CC=$CC"
echo "BFLAGS:" $BFLAGS
"$CC" $BFLAGS tests/probe.c -o /tmp/probe
/tmp/probe
42 changes: 36 additions & 6 deletions README.md
@@ -34,23 +34,40 @@ Compiler features activated or facilitated by these flags:
| ----------------------------- | ------------- | ----------------------------------------------------------------- |
| [Thin LTO][9] | All | Emits and optimizes as LLVM bitcode, program-wide |
| Function/Data Sections | All | `-f{data,function}-sections` (collected at *final* link, see below) |
| Stack Hardening | All | `-fstack-protector-strong`, `-fstack-clash-protection` (Linux) |
| Register Zeroing | All | `-fzero-call-used-regs=used-gpr` shrinks ROP/JOP gadget surface |
| Constant merging | All | `-fmerge-all-constants` for smaller binary + fewer relocs at load |
| Stack Hardening | Linux | `-fstack-clash-protection` (canaries are parked, see "Phasing in") |
| Async Unwind Tables | All | `-fasynchronous-unwind-tables` for profilers + crash collectors |
| FORTIFY_SOURCE=2 | Linux + Darwin| musl-compatible level 2; glibc consumers may override to `=3` |
| libc++ hardening (FAST) | Linux + Darwin| `_LIBCPP_HARDENING_MODE_FAST` for vendored / Apple libc++ |
| [LLD][10] as linker | All | `-fuse-ld=lld` required for Thin LTO |
| Full RELRO + `-fno-plt` | Linux | `-z relro -z now` plus PLT-less call lowering |
| No semantic interposition | Linux | `-fno-semantic-interposition` for intra-DSO inlining |
| DT_RELR relative relocations | Linux | `-z pack-relative-relocs` shrinks PIE relocation tables |
| LLD ICF (safe) | Linux | `--icf=safe` folds identical address-not-taken functions |
| GNU-hash dyn-symbol table | Linux | `--hash-style=gnu` (smaller, ~2× faster lookup at load) |
| LLD link optimization | Linux | `-Wl,-O3` aggressive string/section merging at link time |
| Relative C++ ABI vtables | Linux | Enables relative vtable references program-wide |
| Linker-driven Reordering | Linux amd64 | Emits `-fbasic-block-sections=all` for reordering |
| Intel CET (IBT + shadow stk) | Linux amd64 | `-fcf-protection=full` |
| Propeller-ready BB sections | Linux amd64 | `-fbasic-block-sections=all` (awaiting symbol-ordering file) |
| Microarch tuning | Linux amd64 | `-mtune=znver3` (mirrors Rust `-Ztune-cpu=znver3`) |
| PAC + BTI | arm64 (both) | `-mbranch-protection=standard`; `-z force-bti` on Linux |
| Mach-O fixup chains | Darwin | `-bind_at_load`, `-fixup_chains`, `-no_data_in_code_info` |

### Phasing in

Some hardening flags are temporarily disabled in the live profiles to
prioritise startup time. They are tracked verbatim in
[`labs.disabled.txt`][19] with a re-enable plan per flag. Currently
parked, with the exact spelling needed when cut-and-pasting back into
the live profile:

- `-fstack-protector-strong`
- `-fzero-call-used-regs=used-gpr`
- `-fcf-protection=full`
- `-U_FORTIFY_SOURCE`
- `-D_FORTIFY_SOURCE=2`
- `-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST`

The file is **not** consumed by the rollup or by `cli/cflags.{sh,ts}`
— it is a parking lot, not a profile.
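
One way to confirm that nothing in the parking lot leaks into a live profile is to compare the resolver output against the non-comment lines of `labs.disabled.txt`. The following is a minimal sketch, not part of the repository: the helper name `parked_flags_in` is hypothetical, and it assumes the file holds one flag per line with `#` comments, as in the version added by this change.

```sh
# Hypothetical helper (not part of this repo): print every flag from a
# parking-lot file that also appears in a resolved flag string.
# Assumes one flag per line, with `#` comment lines and blank lines.
parked_flags_in() {
  parked_file=$1
  # Normalise newlines and tabs to spaces so whole-word matching works
  # whether the resolver prints one flag per line or a single line.
  resolved=$(printf '%s' "$2" | tr '\n\t' '  ')
  # Drop comments and blank lines, then look for each remaining flag as
  # a complete, space-delimited word in the resolved output.
  sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' "$parked_file" |
    while IFS= read -r flag; do
      case " $resolved " in
        *" $flag "*) printf '%s\n' "$flag" ;;
      esac
    done
}

# Usage from the repository root (kept as a comment so the helper can be
# sourced on its own):
#   leaks=$(parked_flags_in ./labs.disabled.txt "$(bash ./cli/cflags.sh linux amd64)")
#   [ -z "$leaks" ] || { echo "parked flags still live: $leaks"; exit 1; }
```

The match is on whole flags, so a parked `-D_FORTIFY_SOURCE=2` would not be reported if a profile carried `-D_FORTIFY_SOURCE=3` instead; a stricter check would need to compare macro names rather than full spellings.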

> [!IMPORTANT]
> **Section-stripping and similar "no further link" flags live in `*-bin`,
> not in the per-OS or per-arch compile profiles.** The compile profiles
@@ -132,6 +149,18 @@ export CFLAGS="$(bun run ./cli/cflags.ts linux arm64)"
export LDFLAGS="$(bun run ./cli/cflags.ts linux arm64 --bin)"
```

## CI

[`.github/workflows/ci.yml`](.github/workflows/ci.yml) runs on pushes to `main`, on pull requests, and on manual dispatch. It resolves the flag set with both `cli/cflags.sh` and `cli/cflags.ts`, asserts they emit identical output, compiles the [`tests/probe.c`](./tests/probe.c) translation unit with the compile profile, and then compiles, links, and runs it with the binary profile. The matrix covers:

| Target | GitHub runner |
| ------ | ------------- |
| `linux-amd64` | `ubuntu-24.04` |
| `linux-arm64` | `ubuntu-24.04-arm` |
| `darwin-arm64` | `macos-14` |

(`darwin-amd64` is not covered: GitHub no longer offers an Intel-Mac runner. The profile is exercised locally before tagged releases.)
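
The parity step can be reproduced locally before pushing. The sketch below mirrors the loop in the workflow; the `check_parity` name is a placeholder rather than something the repository ships, and the commented usage assumes `bash` and `bun` are on `PATH` and that it is run from the repository root.

```sh
# Placeholder helper mirroring the CI parity step: succeed when the two
# resolver outputs are equal, otherwise report both on stderr and fail.
check_parity() {
  label=$1
  sh_out=$2
  ts_out=$3
  if [ "$sh_out" = "$ts_out" ]; then
    return 0
  fi
  echo "parity mismatch ($label)" >&2
  printf 'cflags.sh: %s\ncflags.ts: %s\n' "$sh_out" "$ts_out" >&2
  return 1
}

# Usage from the repository root (kept as a comment so the helper can be
# sourced on its own). $target and $mode are left unquoted on purpose so
# that "linux amd64" splits into two arguments and an empty mode vanishes.
#   for target in "linux amd64" "linux arm64" "darwin arm64"; do
#     for mode in "" "--bin"; do
#       check_parity "$target ${mode:-compile}" \
#         "$(bash ./cli/cflags.sh $target $mode)" \
#         "$(bun run ./cli/cflags.ts $target $mode)" || exit 1
#     done
#   done
```

Because command substitution strips trailing newlines, this compares the two outputs modulo trailing newlines, which is also how the workflow step compares them.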

## Downstream Projects

So far, the following projects consume these flags:
@@ -158,3 +187,4 @@ So far, the following projects consume these flags:
[16]: ./darwin-bin.txt
[17]: ./cli/cflags.sh
[18]: ./cli/cflags.ts
[19]: ./labs.disabled.txt
24 changes: 13 additions & 11 deletions base.txt
@@ -54,26 +54,28 @@
# Prefer optimizing at level 3.
-O3

# Aggressive constant merging: dedupe equal string and numeric constant
# pool entries across translation units, and also merge identical
# constant variables and arrays. Smaller binary, fewer relocations to
# apply at load, so a small but free startup win. Mild ISO C/C++
# violation: the standard requires distinct objects to have distinct
# addresses even when they are const and compare equal (string literals
# may already be shared). Our code does not depend on that guarantee.
-fmerge-all-constants

# Position-independent code/executables. PIC is required for objects that
# may end up in shared libraries; PIE is required for the final executable
# to participate in ASLR. Including both covers objects destined for either.
-fPIC
-fPIE

# Hardening and unwind/debuggability flags:
# stack-protector-strong - insert canaries in functions with arrays or
# address-taken locals; balanced cost vs. -all.
# zero-call-used-regs=used-gpr- zero general-purpose registers that the function
# actually used before returning, shrinking the ROP/JOP
# gadget surface for spilled callee-saves and reducing
# secret leakage across call boundaries (clang 15+).
# Hardening, unwind, and debuggability flags:
# exceptions - emit unwind tables for all code so stack unwinding
# (C++ exceptions, Rust panics, foreign runtimes)
# can traverse C frames cleanly.
# asynchronous-unwind-tables - emit unwind info usable from arbitrary signal/
# sample points (profilers, perf, crash collectors),
# not just at call boundaries. Stronger than the
# plain `-funwind-tables` implied by -fexceptions.
# Kept on for now — better profiling > the table size.
# no-omit-frame-pointer - preserve %rbp/x29 for reliable backtraces in
# profilers, debuggers, and crash handlers.
# no-strict-aliasing - disable type-based alias analysis; matches the
@@ -83,10 +85,10 @@
# no-delete-null-pointer-checks - keep explicit null checks even after a prior deref;
# prevents silent removal of defensive code.
#
# NOTE: -fstack-clash-protection is intentionally NOT here — Apple's clang
# rejects it on darwin (no codegen support). It lives in linux.txt instead.
-fstack-protector-strong
-fzero-call-used-regs=used-gpr
# NOTE: -fstack-protector-strong and -fzero-call-used-regs=used-gpr are
# parked in labs.disabled.txt while we prioritise startup time.
# -fstack-clash-protection lives in linux.txt because Apple's clang
# rejects it on darwin (no codegen support).
-fexceptions
-fasynchronous-unwind-tables
-fno-omit-frame-pointer
21 changes: 7 additions & 14 deletions darwin.txt
@@ -2,20 +2,13 @@
# targets darwin (darwin / darwin-amd64 / darwin-arm64 profiles).
# Toolchain assumption: mainline LLVM clang 22 (not Apple-clang).

# Glibc-style fortified-source equivalent. Apple's libc honors
# _FORTIFY_SOURCE up to level 2 (level 3's __builtin_dynamic_object_size
# fortification depends on glibc-only chk variants). Level 2 still
# upgrades str*/mem*/sprintf into bounds-checked __*_chk forms when the
# destination size is constant-foldable.
-U_FORTIFY_SOURCE
-D_FORTIFY_SOURCE=2

# libc++ hardened mode: turns on bounds checks for std::vector, std::span,
# std::string_view, iterator validity, etc. The "fast" mode keeps the
# extension safety net cheap (mostly inlined comparisons) and matches what
# we get from -D_GLIBCXX_ASSERTIONS on Linux. libc++ is the default C++
# stdlib in Apple's SDK and the BoringSSL test binaries pick it up there.
-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST
# NOTE: -U_FORTIFY_SOURCE, -D_FORTIFY_SOURCE=2, and
# -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST are parked in
# labs.disabled.txt while we prioritise startup time across both
# platforms. They will be re-enabled in lock-step with the Linux
# profile once we have a perf bench that catches the per-call cost on
# the str/mem/iterator hot paths. Cut-and-paste the *exact* spellings
# from labs.disabled.txt back here when re-enabling.

# Force LLD as the linker. Apple's ld64 / ld-prime is the system default,
# but mainline clang-22 with ThinLTO produces bitcode archive members that
53 changes: 53 additions & 0 deletions labs.disabled.txt
@@ -0,0 +1,53 @@
# Security and hardening flags we have *temporarily* disabled across the
# live profiles to optimise startup time. This file is NOT loaded by the
# rollup or by cli/cflags.{sh,ts}; the resolver matches on a closed set
# of filenames (base / $os / $os-$arch / $os-bin / $os-$arch-bin) and
# silently ignores everything else.
#
# Treat this file as a tracked, reviewable parking lot — every flag
# here is something we intend to phase back in as our perf budget grows.
# Re-enabling is a literal cut-and-paste back into the named source
# file plus a regression run on the BoringSSL TLS-handshake and Netty
# cold-startup benches.
#
# ── from base.txt ────────────────────────────────────────────────────
#
# Stack canaries on functions with arrays / address-taken locals.
# Cost: ~3-5% codesize growth + per-call canary load + epilogue check
# on every protected function. Re-enable when we have a benchmark
# harness that catches the cost on the BoringSSL handshake hot path.
-fstack-protector-strong
#
# Zero general-purpose registers on function return; shrinks the
# ROP/JOP gadget surface for spilled callee-saves and reduces secret
# leakage across call boundaries. Cost: per-return reg-clearing
# instructions, larger code. Re-enable alongside -fcf-protection=full
# (below) once we deploy on a CET-capable floor and want defense-in-
# depth on indirect-call edges.
-fzero-call-used-regs=used-gpr
#
# ── from linux.txt ───────────────────────────────────────────────────
#
# Glibc/musl fortified-source level 2: replaces selected libc calls
# (str*, mem*, sprintf family, read/write) with __*_chk variants that
# bounds-check at runtime when the destination size is constant-
# foldable. Cost: per-call bounds checks on libc hot paths. Re-enable
# after we have a perf bench that catches the cost on JNI/Netty I/O
# and on BoringSSL's str/mem-heavy routines.
-U_FORTIFY_SOURCE
-D_FORTIFY_SOURCE=2
#
# libc++ hardened-fast mode: bounds checks on std::vector, std::span,
# std::string_view, iterator validity. Cost: per-access checks on the
# libc++ hot paths the BoringSSL test binaries exercise. Re-enable
# alongside _FORTIFY_SOURCE above.
-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST
#
# ── from linux-amd64.txt ─────────────────────────────────────────────
#
# Intel CET: Indirect Branch Tracking (IBT) + shadow stack (SHSTK).
# Cost: ENDBR64 prologue on every indirect-call target, plus shadow-
# stack push/pop on call/return when the kernel honors it. Re-enable
# when the linux-amd64 deployment floor moves to Tiger Lake+ / Zen 4+
# and the userspace runtime contract is ready to require CET support.
-fcf-protection=full
28 changes: 15 additions & 13 deletions linux-amd64.txt
@@ -39,18 +39,20 @@
# Mirrors the Rust profile's -Ztune-cpu=znver3.
-mtune=znver3

# Intel CET: Indirect Branch Tracking + shadow stack. `=full` enables
# both the branch (IBT) and return (shadow stack) protections. Cheap on
# hardware that supports it (Tiger Lake+ / Zen 4+); transparent on older
# CPUs because ENDBR64 decodes as a NOP, and the shadow stack is only
# active when the kernel enables it for the process. Kernel-side IBT
# landed in Linux 5.18; userspace shadow stack is wired up via
# arch_prctl from Linux 6.6. Pairs with the assembler emitting
# NT_GNU_PROPERTY_TYPE_0 notes so the loader can verify the binary
# opted in.
-fcf-protection=full
# NOTE: -fcf-protection=full (Intel CET / IBT + SHSTK) is parked in
# labs.disabled.txt while we prioritise startup time. Re-enable when the
# linux-amd64 deployment floor moves to Tiger Lake+ / Zen 4+ and the
# userspace runtime contract is ready to require CET support.

# Emit per-basic-block sections so the linker can reorder/discard at finer
# granularity than per-function. clang's `=all` form is gated to x86 ELF;
# AArch64 clang on Alpine rejects `all` ('invalid value all') even though
# it accepts `=labels`. Darwin's clang doesn't implement BB sections at
# all. Pinning this to linux-amd64 avoids both failure modes.
# Emit per-basic-block sections so a future Propeller pass can feed a
# symbol-ordering file into the linker and reorder hot blocks across
# function boundaries. Without that ordering file the linker keeps
# blocks in source order and the flag is overhead, but the per-BB
# sections are a prerequisite for Propeller, so we eat the cost now to
# avoid a rebuild-the-world later.
#
# clang's `=all` form is gated to x86 ELF; AArch64 clang on Alpine
# rejects `all` ('invalid value all') even though it accepts `=labels`.
# Darwin's clang doesn't implement BB sections at all. Pinning this to
# linux-amd64 avoids both failure modes.
-fbasic-block-sections=all