AvdLee · Mar 23, 2026
diff --git a/README.md b/README.md
@@ -154,7 +154,6 @@ xcode-build-optimization-agent-skill/
     build-benchmark.schema.json
   scripts/
     benchmark_builds.py
-    check_spm_pins.py
     diagnose_compilation.py
     generate_optimization_report.py
     render_recommendations.py
@@ -217,8 +216,8 @@ Real-world improvements reported by developers who used these skills. Add your o
 
 The `xcode-build-orchestrator` generates your table row at the end of every optimization run, so contributing is a single copy-paste.
 
-| App | Incremental Before | Incremental After | Clean Before | Clean After |
-|-----|-------------------:|------------------:|-------------:|------------:|
+| App | Incremental Before | Incremental After | Clean Before | Clean After | Cached Clean Before | Cached Clean After |
+|-----|-------------------:|------------------:|-------------:|------------:|--------------------:|-------------------:|
 
 ## Contributing
 

diff --git a/references/benchmark-artifacts.md b/references/benchmark-artifacts.md
@@ -22,6 +22,9 @@ Recommended outputs:
 - `.build-benchmark/<timestamp>-<scheme>-clean-1.log`
 - `.build-benchmark/<timestamp>-<scheme>-clean-2.log`
 - `.build-benchmark/<timestamp>-<scheme>-clean-3.log`
+- `.build-benchmark/<timestamp>-<scheme>-cached-clean-1.log` (cached clean logs are written only when `COMPILATION_CACHING = YES` is detected)
+- `.build-benchmark/<timestamp>-<scheme>-cached-clean-2.log`
+- `.build-benchmark/<timestamp>-<scheme>-cached-clean-3.log`
 - `.build-benchmark/<timestamp>-<scheme>-incremental-1.log`
 - `.build-benchmark/<timestamp>-<scheme>-incremental-2.log`
 - `.build-benchmark/<timestamp>-<scheme>-incremental-3.log`
@@ -42,12 +45,13 @@ Each JSON artifact should include:
 - parsed timing-summary categories
 - free-form notes for caveats or noise
 
-## Clean And Incremental Separation
+## Clean, Cached Clean, And Incremental Separation
 
-Do not merge clean and incremental measurements into a single list. They answer different questions:
+Do not merge measurements from different build types into a single list. They answer different questions:
 
-- Clean builds show full build-system, package, and module setup cost.
-- Incremental builds show edit-loop productivity and script or cache invalidation problems.
+- **Clean builds** show full build-system, package, and module setup cost with a cold compilation cache.
+- **Cached clean builds** show clean build cost when the compilation cache is warm. This is the realistic scenario for branch switching, pulling changes, or Clean Build Folder. Recorded only when `COMPILATION_CACHING = YES` is detected and `--no-cached-clean` is not passed.
+- **Incremental builds** show edit-loop productivity and script or cache invalidation problems.
 
 ## Raw Logs
 
@@ -62,14 +66,17 @@ Store raw `xcodebuild` output beside the JSON artifact whenever possible. That a
 
 ### COMPILATION_CACHING
 
-`COMPILATION_CACHING = YES` stores compiled artifacts so that repeated compilations of identical inputs are served from cache. The standard benchmark methodology (clean + build) clears derived data before each clean run, which invalidates the compilation cache. As a result, the benchmark script does not capture the benefit of compilation caching.
+`COMPILATION_CACHING = YES` stores compiled artifacts in a system-managed cache outside DerivedData so that repeated compilations of identical inputs are served from cache. The standard clean-build benchmark (`xcodebuild clean` between runs) may add overhead from cache population without showing the corresponding cache-hit benefit.
 
-The real benefit of compilation caching appears during:
+The benchmark script automatically detects `COMPILATION_CACHING = YES` and runs a **cached clean** benchmark phase. This phase:
 
-- Repeat clean builds where source files have not changed (e.g., after switching branches and switching back).
-- CI builds that share a persistent derived-data directory across runs.
+1. Builds once to warm the compilation cache.
+2. Deletes DerivedData (but not the compilation cache) before each measured run.
+3. Rebuilds, measuring the cache-hit clean build time.
 
-When reporting on COMPILATION_CACHING, note that the standard clean-build benchmark cannot measure its impact. Recommend enabling it based on the well-documented benefit rather than requiring a measurable delta from the benchmark script.
+The cached clean metric captures the realistic developer experience: branch switching, pulling changes, and Clean Build Folder. Use the cached clean median as the primary comparison metric when evaluating `COMPILATION_CACHING` impact.
+
+To skip this phase, pass `--no-cached-clean`.
 
 ### First-Run Variance
 

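Review note: the three-step cached clean phase described above can be sketched without Xcode by injecting the build step. This is a minimal illustration, not the script's implementation; `measure_cached_clean` and `fake_build` are hypothetical names, and `fake_build` stands in for an `xcodebuild build -derivedDataPath <dir>` call.

```python
import shutil
import tempfile
import time
from pathlib import Path
from typing import Callable, List


def measure_cached_clean(build: Callable[[Path], None], repeats: int = 3) -> List[float]:
    """Warm the cache once, then time `repeats` rebuilds from an empty DerivedData."""
    derived_data = Path(tempfile.mkdtemp(prefix="bench-dd-"))
    durations: List[float] = []
    try:
        build(derived_data)  # step 1: warm-up build, not measured
        for _ in range(repeats):
            # step 2: delete build products only; the compilation cache is
            # assumed to live elsewhere and is left untouched
            shutil.rmtree(derived_data, ignore_errors=True)
            started = time.perf_counter()
            build(derived_data)  # step 3: measured rebuild
            durations.append(time.perf_counter() - started)
    finally:
        # This directory was created here, so it is safe to remove.
        shutil.rmtree(derived_data, ignore_errors=True)
    return durations


def fake_build(derived_data: Path) -> None:
    """Hypothetical stand-in for the real build command."""
    derived_data.mkdir(parents=True, exist_ok=True)
    (derived_data / "Products.marker").write_text("built")


timings = measure_cached_clean(fake_build, repeats=3)
print(len(timings))
```

Injecting the build callable keeps the ordering (warm, delete, rebuild) easy to test separately from the actual `xcodebuild` invocation.
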
diff --git a/references/build-settings-best-practices.md b/references/build-settings-best-practices.md
@@ -131,6 +131,7 @@ These settings optimize for production builds.
 - **Key:** `COMPILATION_CACHING`
 - **Recommended:** `YES`
 - **Why:** Caches compilation results for Swift and C-family sources so repeated compilations of the same inputs are served from cache. The biggest wins come from branch switching and clean builds where source files are recompiled unchanged. This is an opt-in feature. The umbrella setting controls both `SWIFT_ENABLE_COMPILE_CACHE` and `CLANG_ENABLE_COMPILE_CACHE` under the hood; those can be toggled independently if needed.
+- **Measurement:** The benchmark script auto-detects this setting and runs a **cached clean** phase that measures clean builds with a warm compilation cache. Standard clean builds may show overhead from cache population; the cached clean metric captures the realistic developer benefit.
 - **Risk:** Low -- can also be enabled via per-user project settings so it does not need to be committed to the shared project file.
 
 ### Integrated Swift Driver

diff --git a/schemas/build-benchmark.schema.json b/schemas/build-benchmark.schema.json
@@ -12,7 +12,7 @@
   "properties": {
     "schema_version": {
       "type": "string",
-      "enum": ["1.0.0", "1.1.0"]
+      "enum": ["1.0.0", "1.1.0", "1.2.0"]
     },
     "created_at": {
       "type": "string",
@@ -84,6 +84,12 @@
             "$ref": "#/definitions/run"
           }
         },
+        "cached_clean": {
+          "type": "array",
+          "items": {
+            "$ref": "#/definitions/run"
+          }
+        },
         "incremental": {
           "type": "array",
           "items": {
@@ -103,6 +109,9 @@
         "clean": {
           "$ref": "#/definitions/stats"
         },
+        "cached_clean": {
+          "$ref": "#/definitions/stats"
+        },
         "incremental": {
           "$ref": "#/definitions/stats"
         }
@@ -134,6 +143,7 @@
           "type": "string",
           "enum": [
             "clean",
+            "cached-clean",
             "incremental"
           ]
         },
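
Review note: the schema now allows `cached_clean` under both `runs` and `summary`, and the benchmark script writes `schema_version` `1.2.0` only when that key is present. A consumer could sanity-check an artifact along these lines. This is a sketch under those assumptions; `check_cached_clean_consistency` is a hypothetical helper, and the sample artifacts are trimmed to the keys being checked.

```python
from typing import Any, Dict, List


def check_cached_clean_consistency(artifact: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the artifact is consistent."""
    problems: List[str] = []
    in_runs = "cached_clean" in artifact.get("runs", {})
    in_summary = "cached_clean" in artifact.get("summary", {})
    if in_runs != in_summary:
        problems.append("cached_clean must appear in both runs and summary, or in neither")
    if in_runs and artifact.get("schema_version") != "1.2.0":
        problems.append("artifacts with cached_clean runs should declare schema_version 1.2.0")
    return problems


# Trimmed, hypothetical artifacts for illustration.
good = {
    "schema_version": "1.2.0",
    "runs": {"clean": [], "cached_clean": [], "incremental": []},
    "summary": {"clean": {}, "cached_clean": {}, "incremental": {}},
}
bad = {
    "schema_version": "1.1.0",
    "runs": {"clean": [], "cached_clean": [], "incremental": []},
    "summary": {"clean": {}, "incremental": {}},
}

print(check_cached_clean_consistency(good))
print(len(check_cached_clean_consistency(bad)))
```

Note that the `runs` and `summary` keys use an underscore (`cached_clean`), while the per-run build type enum and the log file names use a hyphen (`cached-clean`).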

diff --git a/scripts/benchmark_builds.py b/scripts/benchmark_builds.py
@@ -5,9 +5,11 @@
 import os
 import platform
 import re
+import shutil
 import statistics
 import subprocess
 import sys
+import tempfile
 import time
 from datetime import datetime, timezone
 from pathlib import Path
@@ -31,6 +33,11 @@ def parse_args() -> argparse.Namespace:
         help="Path to a source file to touch before each incremental build. "
         "When provided, measures a real edit-rebuild loop instead of a zero-change build.",
     )
+    parser.add_argument(
+        "--no-cached-clean",
+        action="store_true",
+        help="Skip cached clean builds even when COMPILATION_CACHING is detected.",
+    )
     parser.add_argument(
         "--extra-arg",
         action="append",
@@ -134,6 +141,19 @@ def xcode_version() -> str:
     return result.stdout.strip() if result.returncode == 0 else "unknown"
 
 
+def detect_compilation_caching(base_command: List[str]) -> bool:
+    """Check whether COMPILATION_CACHING is enabled in the resolved build settings."""
+    result = run_command([*base_command, "-showBuildSettings"])
+    if result.returncode != 0:
+        return False
+    for line in result.stdout.splitlines():
+        # Match the key exactly so settings that merely share the prefix are ignored.
+        key, sep, value = line.partition("=")
+        if sep and key.strip() == "COMPILATION_CACHING":
+            return value.strip() == "YES"
+    return False
+
+
 def measure_build(
     base_command: List[str],
     artifact_stem: str,
@@ -173,8 +193,6 @@ def main() -> int:
         if warmup.returncode != 0:
             sys.stderr.write(warmup.stdout + warmup.stderr)
             return warmup.returncode
-        # Warmup clean+build cycle primes OS-level caches (disk, dyld, etc.)
-        # so the first measured clean run is not penalised by cold caches.
         warmup_clean = run_command([*base_command, "clean"])
         if warmup_clean.returncode != 0:
             sys.stderr.write(warmup_clean.stdout + warmup_clean.stderr)
@@ -184,7 +202,7 @@ def main() -> int:
             sys.stderr.write(warmup_rebuild.stdout + warmup_rebuild.stderr)
             return warmup_rebuild.returncode
 
-    runs = {"clean": [], "incremental": []}
+    runs: Dict[str, list] = {"clean": [], "incremental": []}
 
     for index in range(1, args.repeats + 1):
         clean_result = run_command([*base_command, "clean"])
@@ -195,6 +213,38 @@ def main() -> int:
             return clean_result.returncode
         runs["clean"].append(measure_build(base_command, artifact_stem, output_dir, "clean", index))
 
+    # --- Cached clean builds ---------------------------------------------------
+    # When COMPILATION_CACHING is enabled, the compilation cache lives outside
+    # DerivedData and survives product deletion.  We measure "cached clean"
+    # builds by pointing DerivedData at a temp directory, warming the cache with
+    # one build, then deleting the DerivedData directory (but not the cache)
+    # before each measured rebuild.  This captures the realistic scenario:
+    # branch switching, pulling changes, or Clean Build Folder.
+    should_cached_clean = not args.no_cached_clean and detect_compilation_caching(base_command)
+    if should_cached_clean:
+        dd_path = Path(args.derived_data_path) if args.derived_data_path else Path(
+            tempfile.mkdtemp(prefix="xcode-bench-dd-")
+        )
+        cached_cmd = list(base_command)
+        if not args.derived_data_path:
+            cached_cmd.extend(["-derivedDataPath", str(dd_path)])
+
+        cache_warmup = run_command([*cached_cmd, "build"])
+        if cache_warmup.returncode != 0:
+            sys.stderr.write("Warning: cached clean warmup build failed, skipping cached clean benchmarks.\n")
+            sys.stderr.write(cache_warmup.stdout + cache_warmup.stderr)
+            should_cached_clean = False
+
+    if should_cached_clean:
+        runs["cached_clean"] = []
+        for index in range(1, args.repeats + 1):
+            shutil.rmtree(dd_path, ignore_errors=True)
+            runs["cached_clean"].append(measure_build(cached_cmd, artifact_stem, output_dir, "cached-clean", index))
+        # Remove only the temp directory created above; a user-supplied path keeps its products for the incremental phase.
+        if not args.derived_data_path:
+            shutil.rmtree(dd_path, ignore_errors=True)
+
+    # --- Incremental / zero-change builds --------------------------------------
     incremental_label = "incremental"
     if args.touch_file:
         touch_path = Path(args.touch_file)
@@ -212,8 +262,15 @@ def main() -> int:
             measure_build(base_command, artifact_stem, output_dir, incremental_label, index)
         )
 
+    summary: Dict[str, object] = {
+        "clean": stats_for(runs["clean"]),
+        "incremental": stats_for(runs["incremental"]),
+    }
+    if "cached_clean" in runs:
+        summary["cached_clean"] = stats_for(runs["cached_clean"])
+
     artifact = {
-        "schema_version": "1.1.0",
+        "schema_version": "1.2.0" if "cached_clean" in runs else "1.1.0",
         "created_at": datetime.now(timezone.utc).isoformat(),
         "build": {
             "entrypoint": "workspace" if args.workspace else "project",
@@ -231,10 +288,7 @@ def main() -> int:
             "cwd": os.getcwd(),
         },
         "runs": runs,
-        "summary": {
-            "clean": stats_for(runs["clean"]),
-            "incremental": stats_for(runs["incremental"]),
-        },
+        "summary": summary,
         "notes": [f"touch-file: {args.touch_file}"] if args.touch_file else [],
     }
 
@@ -243,6 +297,8 @@ def main() -> int:
 
     print(f"Saved benchmark artifact: {artifact_path}")
     print(f"Clean median: {artifact['summary']['clean']['median_seconds']}s")
+    if "cached_clean" in artifact["summary"]:
+        print(f"Cached clean median: {artifact['summary']['cached_clean']['median_seconds']}s")
     inc_label = "Incremental" if args.touch_file else "Zero-change"
     print(f"{inc_label} median: {artifact['summary']['incremental']['median_seconds']}s")
     return 0
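
Review note: detection depends on reading `KEY = value` lines from `xcodebuild -showBuildSettings`. The parsing can be exercised without Xcode by feeding it captured text. The sketch below is not the script's code; `read_build_setting` is a hypothetical helper, and `SAMPLE` is made-up output in the usual shape of that command.

```python
from typing import Optional


def read_build_setting(show_build_settings_output: str, key: str) -> Optional[str]:
    """Return the value of the first `KEY = value` line whose key matches exactly."""
    for line in show_build_settings_output.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


# Hypothetical sample shaped like `xcodebuild -showBuildSettings` output.
SAMPLE = """\
Build settings for action build and target DemoApp:
    ARCHS = arm64
    COMPILATION_CACHING = YES
    SWIFT_VERSION = 5.0
"""

caching_enabled = read_build_setting(SAMPLE, "COMPILATION_CACHING") == "YES"
print(caching_enabled)
```

Comparing the whole key, rather than a prefix, avoids matching any other setting whose name happens to begin with `COMPILATION_CACHING`. Like the function in the diff, this returns the first match only, so a multi-target project reports the setting of whichever target is listed first.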

diff --git a/scripts/generate_optimization_report.py b/scripts/generate_optimization_report.py
@@ -268,18 +268,40 @@ def _section_context(benchmark: Dict[str, Any]) -> str:
 def _section_baseline(benchmark: Dict[str, Any]) -> str:
     summary = benchmark.get("summary", {})
     clean = summary.get("clean", {})
+    cached_clean = summary.get("cached_clean", {})
     incremental = summary.get("incremental", {})
-    lines = [
-        "## Baseline Benchmarks\n",
-        f"| Metric | Clean | Incremental |",
-        f"|--------|-------|-------------|",
-        f"| Median | {clean.get('median_seconds', 0):.3f}s | {incremental.get('median_seconds', 0):.3f}s |",
-        f"| Min | {clean.get('min_seconds', 0):.3f}s | {incremental.get('min_seconds', 0):.3f}s |",
-        f"| Max | {clean.get('max_seconds', 0):.3f}s | {incremental.get('max_seconds', 0):.3f}s |",
-        f"| Runs | {clean.get('count', 0)} | {incremental.get('count', 0)} |",
-    ]
-
-    for build_type in ("clean", "incremental"):
+    has_cached = bool(cached_clean and cached_clean.get("count", 0) > 0)
+
+    if has_cached:
+        lines = [
+            "## Baseline Benchmarks\n",
+            "| Metric | Clean | Cached Clean | Incremental |",
+            "|--------|-------|-------------|-------------|",
+            f"| Median | {clean.get('median_seconds', 0):.3f}s | {cached_clean.get('median_seconds', 0):.3f}s | {incremental.get('median_seconds', 0):.3f}s |",
+            f"| Min | {clean.get('min_seconds', 0):.3f}s | {cached_clean.get('min_seconds', 0):.3f}s | {incremental.get('min_seconds', 0):.3f}s |",
+            f"| Max | {clean.get('max_seconds', 0):.3f}s | {cached_clean.get('max_seconds', 0):.3f}s | {incremental.get('max_seconds', 0):.3f}s |",
+            f"| Runs | {clean.get('count', 0)} | {cached_clean.get('count', 0)} | {incremental.get('count', 0)} |",
+        ]
+        lines.append(
+            "\n> **Cached Clean** = clean build with a warm compilation cache. "
+            "This is the realistic scenario for branch switching, pulling changes, or "
+            "Clean Build Folder. The compilation cache lives outside DerivedData and "
+            "survives product deletion.\n"
+        )
+    else:
+        lines = [
+            "## Baseline Benchmarks\n",
+            "| Metric | Clean | Incremental |",
+            "|--------|-------|-------------|",
+            f"| Median | {clean.get('median_seconds', 0):.3f}s | {incremental.get('median_seconds', 0):.3f}s |",
+            f"| Min | {clean.get('min_seconds', 0):.3f}s | {incremental.get('min_seconds', 0):.3f}s |",
+            f"| Max | {clean.get('max_seconds', 0):.3f}s | {incremental.get('max_seconds', 0):.3f}s |",
+            f"| Runs | {clean.get('count', 0)} | {incremental.get('count', 0)} |",
+        ]
+
+    build_types = ["clean", "cached_clean", "incremental"] if has_cached else ["clean", "incremental"]
+    label_map = {"clean": "Clean", "cached_clean": "Cached Clean", "incremental": "Incremental"}
+    for build_type in build_types:
         runs = benchmark.get("runs", {}).get(build_type, [])
         all_cats: Dict[str, Dict] = {}
         for run in runs:
@@ -292,7 +314,8 @@ def _section_baseline(benchmark: Dict[str, Any]) -> str:
         if all_cats:
             count = len(runs) or 1
             ranked = sorted(all_cats.items(), key=lambda x: x[1]["seconds"], reverse=True)
-            lines.append(f"\n### {build_type.title()} Build Timing Summary\n")
+            label = label_map.get(build_type, build_type.title())
+            lines.append(f"\n### {label} Build Timing Summary\n")
             lines.append(
                 "> **Note:** These are aggregated task times across all CPU cores. "
                 "Because Xcode runs many tasks in parallel, these totals typically exceed "
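
Review note: the two table branches in `_section_baseline` differ only in whether a Cached Clean column exists. One possible way to avoid the duplication, shown as a sketch rather than a requested change, is to render from a list of `(label, stats)` columns. `render_baseline_table` is a hypothetical helper, and the numbers are invented for illustration.

```python
from typing import Any, Dict, List, Tuple


def render_baseline_table(columns: List[Tuple[str, Dict[str, Any]]]) -> List[str]:
    """Render the baseline table for any number of (label, stats) columns."""
    labels = [label for label, _ in columns]
    lines = [
        "| Metric | " + " | ".join(labels) + " |",
        "|--------|" + "|".join("-" * (len(label) + 2) for label in labels) + "|",
    ]
    rows = (("Median", "median_seconds"), ("Min", "min_seconds"), ("Max", "max_seconds"))
    for title, key in rows:
        cells = [f"{stats.get(key, 0):.3f}s" for _, stats in columns]
        lines.append(f"| {title} | " + " | ".join(cells) + " |")
    counts = [str(stats.get("count", 0)) for _, stats in columns]
    lines.append("| Runs | " + " | ".join(counts) + " |")
    return lines


# Hypothetical numbers, for illustration only.
summary = {
    "clean": {"median_seconds": 90.0, "min_seconds": 88.5, "max_seconds": 93.25, "count": 3},
    "cached_clean": {"median_seconds": 40.0, "min_seconds": 39.5, "max_seconds": 41.0, "count": 3},
    "incremental": {"median_seconds": 5.0, "min_seconds": 4.5, "max_seconds": 6.0, "count": 3},
}

columns = [("Clean", summary["clean"])]
if summary.get("cached_clean", {}).get("count", 0) > 0:
    columns.append(("Cached Clean", summary["cached_clean"]))
columns.append(("Incremental", summary["incremental"]))

table = render_baseline_table(columns)
print("\n".join(table))
```

With this shape, adding another build type later means appending one more column instead of writing a third copy of the table.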

diff --git a/skills/xcode-build-benchmark/SKILL.md b/skills/xcode-build-benchmark/SKILL.md
@@ -37,10 +37,11 @@ When benchmarking inside a git worktree, SPM packages with `exclude:` paths that
 1. Normalize the build command and note every flag that affects caching or module reuse.
 2. Run one warm-up build if needed to validate that the command succeeds.
 3. Run 3 clean builds.
-4. Run 3 zero-change builds (build immediately after a successful build with no edits). This measures the fixed overhead floor: dependency computation, project description transfer, build description creation, script phases, codesigning, and validation. A zero-change build that takes more than a few seconds indicates avoidable per-build overhead. Use the default `benchmark_builds.py` invocation (no `--touch-file` flag).
-5. Optionally run 3 incremental builds with a file touch to measure a real edit-rebuild loop. Use `--touch-file path/to/SomeFile.swift` to touch a representative source file before each build.
-6. Save the raw results and summary into `.build-benchmark/`.
-7. Report medians and spread, not just the single fastest run.
+4. If `COMPILATION_CACHING = YES` is detected, run 3 cached clean builds. These measure clean build time with a warm compilation cache -- the realistic scenario for branch switching, pulling changes, or Clean Build Folder. The script handles this automatically by building once to warm the cache, then deleting DerivedData (but not the compilation cache) before each measured run. Pass `--no-cached-clean` to skip.
+5. Run 3 zero-change builds (build immediately after a successful build with no edits). This measures the fixed overhead floor: dependency computation, project description transfer, build description creation, script phases, codesigning, and validation. A zero-change build that takes more than a few seconds indicates avoidable per-build overhead. Use the default `benchmark_builds.py` invocation (no `--touch-file` flag).
+6. Optionally run 3 incremental builds with a file touch to measure a real edit-rebuild loop. Use `--touch-file path/to/SomeFile.swift` to touch a representative source file before each build.
+7. Save the raw results and summary into `.build-benchmark/`.
+8. Report medians and spread, not just the single fastest run.
 
 ## Preferred Command Path
 
@@ -62,6 +63,7 @@ If you cannot use the helper script, run equivalent `xcodebuild` commands with `
 Return:
 
 - clean build median, min, max
+- cached clean build median, min, max (when COMPILATION_CACHING is enabled)
 - zero-change build median, min, max (fixed overhead floor)
 - incremental build median, min, max (if `--touch-file` was used)
 - biggest timing-summary categories
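
Review note: for readers unfamiliar with the "median, min, max" reporting above, here is a minimal sketch of how those figures could be derived from a list of measured durations. The field names mirror those read by `generate_optimization_report.py` in this diff; `summarize` is a hypothetical helper, not the script's `stats_for`, and the durations are invented.

```python
import statistics
from typing import Dict, List


def summarize(durations: List[float]) -> Dict[str, float]:
    """Summarize one build type: median for the headline, min and max for spread."""
    return {
        "count": len(durations),
        "median_seconds": round(statistics.median(durations), 3),
        "min_seconds": round(min(durations), 3),
        "max_seconds": round(max(durations), 3),
    }


# Hypothetical durations in seconds for three cached clean runs.
cached_clean_stats = summarize([12.0, 10.5, 11.0])
print(cached_clean_stats)
```

The median is less sensitive to a single noisy run than the mean, which matters when only three runs are taken per build type.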

diff --git a/skills/xcode-build-fixer/SKILL.md b/skills/xcode-build-fixer/SKILL.md
@@ -122,7 +122,7 @@ If a fix produced no measurable wall-time improvement, note `No measurable wall-
 
 For changes valuable for non-benchmark reasons (deterministic package resolution, branch-switch caching), label them: "No wait-time improvement expected from this change. The benefit is [deterministic builds / faster branch switching / reduced CI cost]."
 
-Note: `COMPILATION_CACHING` improvements cannot be captured by the standard clean-build benchmark because `xcodebuild clean` invalidates the cache between runs. When reporting on this setting, note that the benefit is real but requires a different measurement approach (e.g., branch-switch benchmarks or repeat builds without cleaning). Recommend keeping the setting enabled based on documented benefit rather than requiring a delta from the benchmark.
+Note: `COMPILATION_CACHING` improvements are captured by the **cached clean** benchmark phase, which the benchmark script runs automatically when it detects the setting. Cached clean builds measure clean build time with a warm compilation cache -- the realistic scenario for branch switching and pulling changes. Standard clean builds may show overhead from cache population; use the cached clean metric as the primary comparison for this setting.
 
 ## Escalation