Incorporate learnings from Daily macOS (ObjC + CocoaPods) test case#9
Incorporate learnings from Daily macOS (ObjC + CocoaPods) test case#9
Conversation
A community member ran the optimization skills on an older ObjC + CocoaPods project. The agent improvised CocoaPods-specific changes not covered by any reference material, overestimated how actionable some findings were, and produced a 14% regression. This commit addresses the root causes: - Add actionability classification (repo-local / package-manager / xcode-behavior / upstream) to the recommendation format so every finding states how fixable it is from the repo - Add CocoaPods deprecation notice: do not attempt CocoaPods-specific optimizations; recommend migrating to SPM instead - Add regression-aware revert guidance: best-practice settings are never revert candidates; speculative changes are only reverted when all build types regress; compilation caching trade-offs are evaluated holistically - Add structured execution report to the fixer and orchestrator template so each change is tracked with its measured result and kept/reverted status - Add benchmark confidence checks: flag high-variance runs and recommend additional repetitions before drawing conclusions - Clarify ExtractAppIntentsMetadata is Xcode-driven and not suppressible from repo-local settings
There was a problem hiding this comment.
Pull request overview
Updates the Xcode build optimization Agent Skills documentation to incorporate lessons from a CocoaPods-based macOS case study, aiming to improve recommendation actionability, avoid unreliable CocoaPods “optimizations,” and strengthen regression/noise handling and reporting structure.
Changes:
- Add an
actionabilityclassification to the shared recommendation format and clarify how to classify Xcode-driven costs (e.g.,ExtractAppIntentsMetadata). - Add CocoaPods detection guidance that recommends SPM migration and explicitly avoids CocoaPods-specific tuning.
- Introduce an “Execution Report” structure and add benchmark variance (“confidence”) guidance for interpreting noisy results and regressions.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| skills/xcode-project-analyzer/references/project-audit-checks.md | Clarifies ExtractAppIntentsMetadata as xcode-behavior and adds CocoaPods guidance (avoid CocoaPods tuning; prefer SPM migration). |
| skills/xcode-project-analyzer/SKILL.md | Adds CocoaPods usage to the “What To Review” checklist. |
| skills/xcode-build-orchestrator/references/orchestration-report-template.md | Replaces “Verification” with an “Execution Report” section and a per-change tracking table. |
| skills/xcode-build-orchestrator/SKILL.md | Adds a benchmark variance check and strengthens noise/confidence guidance in the final report section. |
| skills/xcode-build-fixer/SKILL.md | Adds regression evaluation rules (best-practice settings not revert candidates) and an execution report structure. |
| references/recommendation-format.md | Makes actionability required and defines allowed values and rendering order. |
| README.md | Updates the documented repository/skill tree structure block. |
| OPTIMIZATION-CHECKS.md | Adds CocoaPods guidance and clarifies ExtractAppIntentsMetadata actionability as xcode-behavior. |
README.md
Outdated
| ```text | ||
| skills/ | ||
| xcode-build-benchmark/ | ||
| SKILL.md | ||
| references/ | ||
| benchmarking-workflow.md | ||
| xcode-compilation-analyzer/ | ||
| SKILL.md | ||
| references/ | ||
| code-compilation-checks.md | ||
| xcode-project-analyzer/ | ||
| SKILL.md | ||
| references/ | ||
| project-audit-checks.md | ||
| spm-build-analysis/ | ||
| SKILL.md | ||
| references/ | ||
| spm-analysis-checks.md | ||
| xcode-build-orchestrator/ | ||
| SKILL.md | ||
| references/ | ||
| orchestration-report-template.md | ||
| xcode-build-fixer/ | ||
| SKILL.md | ||
| references/ | ||
| fix-patterns.md | ||
| xcode-build-optimization-agent-skill/ | ||
| .claude-plugin/ | ||
| marketplace.json | ||
| plugin.json | ||
| references/ |
There was a problem hiding this comment.
The skill tree example hard-codes the repo root as xcode-build-optimization-agent-skill/, but the checkout directory name is user-controlled (and the README itself is already at repo root). To keep the structure example universally correct, consider using . / <repo-root>/ instead of a specific folder name.
| Each recommendation should include: | ||
|
|
||
| - `title` | ||
| - `wait_time_impact` -- plain-language statement of expected wall-clock impact, e.g. "Expected to reduce your clean build by ~3s", "Reduces parallel compile work but unlikely to reduce build wait time", or "Impact on wait time is uncertain -- re-benchmark to confirm" | ||
| - `actionability` -- classifies how fixable the issue is from the project (see values below) | ||
| - `category` | ||
| - `observed_evidence` | ||
| - `estimated_impact` | ||
| - `confidence` | ||
| - `approval_required` | ||
| - `benchmark_verification_status` |
There was a problem hiding this comment.
With actionability now required (and wait_time_impact already required), the repo’s Markdown renderers/generators should be updated to emit and render these fields in the specified order. For example, scripts/render_recommendations.py currently omits both wait_time_impact and actionability, and scripts/generate_optimization_report.py doesn’t handle actionability either, so the produced reports won’t match this documented contract.
| ## Execution Report (post-approval) | ||
|
|
||
| ### Baseline | ||
| - Clean build median: X.Xs | ||
| - Cached clean build median: X.Xs (if applicable) | ||
| - Incremental build median: X.Xs | ||
|
|
There was a problem hiding this comment.
This template states generate_optimization_report.py produces this format automatically, but the script currently doesn’t emit any post-approval “Execution Report” placeholder/section. If this section is now expected in the generated plan, the generator should be updated to include it (even as an empty placeholder) so Phase 2 can append results without manual restructuring.
| 1. Collect the build target context: workspace or project, scheme, configuration, destination, and current pain point. When both `.xcworkspace` and `.xcodeproj` exist, prefer `.xcodeproj` unless the workspace contains sub-projects required for the build. Workspaces that reference external projects may fail if those projects are not checked out. | ||
| 2. Run `xcode-build-benchmark` to establish a baseline if no fresh benchmark exists. The benchmark script auto-detects `COMPILATION_CACHING = YES` and includes cached clean builds that measure the realistic developer experience (warm cache). If the build fails to compile, check `git log` for a recent buildable commit. When working in a worktree, cherry-picking a targeted build fix from a feature branch is acceptable to reach a buildable state. If SPM packages reference gitignored directories in their `exclude:` paths (e.g., `__Snapshots__`), create those directories before building -- worktrees do not contain gitignored content and `xcodebuild -resolvePackageDependencies` will crash otherwise. | ||
| 3. Verify the benchmark artifact has non-empty `timing_summary_categories`. If empty, the timing summary parser may have failed -- re-parse the raw logs or inspect them manually. If `COMPILATION_CACHING` is enabled, also verify the artifact includes `cached_clean` runs. | ||
| 3b. **Benchmark confidence check**: For each build type (clean, cached clean, incremental), compare the min and max values. If the spread (max - min) exceeds 20% of the median, flag the benchmark as having high variance and recommend running additional repetitions (5+ runs) before drawing conclusions. High variance makes it difficult to distinguish real improvements from noise. After applying changes, only claim an improvement if the post-change median falls outside the baseline's min-max range. |
There was a problem hiding this comment.
3b. is not valid Markdown ordered-list syntax, so it likely won’t render as part of the numbered workflow (and may break copy/paste of steps). Consider keeping step 3 as an ordered-list item and making this a nested sub-bullet (e.g., 3.a / 3.b as plain text within a nested list) or renumbering as step 4 with the subsequent steps shifted.
| 3b. **Benchmark confidence check**: For each build type (clean, cached clean, incremental), compare the min and max values. If the spread (max - min) exceeds 20% of the median, flag the benchmark as having high variance and recommend running additional repetitions (5+ runs) before drawing conclusions. High variance makes it difficult to distinguish real improvements from noise. After applying changes, only claim an improvement if the post-change median falls outside the baseline's min-max range. | |
| - **Benchmark confidence check**: For each build type (clean, cached clean, incremental), compare the min and max values. If the spread (max - min) exceeds 20% of the median, flag the benchmark as having high variance and recommend running additional repetitions (5+ runs) before drawing conclusions. High variance makes it difficult to distinguish real improvements from noise. After applying changes, only claim an improvement if the post-change median falls outside the baseline's min-max range. |
| @@ -26,6 +26,7 @@ Use this skill for project- and target-level build inefficiencies that are unlik | |||
| - asset catalog compilation time, especially in targets with large or numerous catalogs | |||
| - `ExtractAppIntentsMetadata` time in the Build Timing Summary -- if the project does not use App Intents but this phase consumes significant time, flag it as unnecessary overhead | |||
There was a problem hiding this comment.
The guidance for ExtractAppIntentsMetadata here says to “flag it as unnecessary overhead”, but the updated audit checklist now clarifies it’s typically not cleanly suppressible and should be classified as xcode-behavior (report cost, don’t promise a repo-local fix). This section should be aligned so the skill doesn’t imply there’s an actionable repo-local optimization.
| - `ExtractAppIntentsMetadata` time in the Build Timing Summary -- if the project does not use App Intents but this phase consumes significant time, flag it as unnecessary overhead | |
| - `ExtractAppIntentsMetadata` time in the Build Timing Summary -- if this phase consumes significant time, record it as fixed-cost `xcode-behavior` (report the cost and impact, but do not suggest a repo-local optimization unless there is explicit Apple guidance) |
| When rendering for human review, preserve the same field order: | ||
|
|
||
| 1. title | ||
| 2. wait-time impact | ||
| 3. observed evidence | ||
| 4. estimated impact | ||
| 5. confidence | ||
| 6. approval required | ||
| 7. benchmark verification status | ||
| 3. actionability | ||
| 4. observed evidence | ||
| 5. estimated impact | ||
| 6. confidence | ||
| 7. approval required | ||
| 8. benchmark verification status |
There was a problem hiding this comment.
In the “Markdown Rendering Guidance” ordered list, the field is named wait_time_impact earlier, but the list item says “wait-time impact”. For consistency and to avoid implementers mistaking this for a different field name, consider rendering it as wait_time_impact (and likewise use backticks for the other field names in this list).
- Fix 3b. to nested sub-bullet for valid Markdown list syntax - Align ExtractAppIntentsMetadata wording in xcode-project-analyzer to match the xcode-behavior classification from the audit checklist - Add actionability field handling to generate_optimization_report.py in both recommendation rendering and approval checklist
The sync-readme CI job expanded the tree to include the entire repo. Revert to the clean skills-only view matching main.
The script was hard-coding the entire repo structure (references, schemas, scripts, .claude-plugin) into the README skill tree. Simplified to only output the skills/ directory, matching the intended structure.
Summary
A community member ran the optimization skills on an older Objective-C + CocoaPods macOS project (Daily macOS). The agent improvised CocoaPods-specific changes not covered by any skill reference, overestimated how actionable some findings were, and the cumulative optimization pass regressed builds by 14%. This PR addresses the root causes identified from that test case.
actionabilityfield (repo-local,package-manager,xcode-behavior,upstream) so the agent must classify how fixable a finding is before presenting itTest plan
references/recommendation-format.mdincludes the newactionabilityfield in required fields, JSON example, and rendering orderOPTIMIZATION-CHECKS.mdhas CocoaPods deprecation section and updated ExtractAppIntentsMetadata descriptionskills/xcode-build-fixer/SKILL.mdhas Evaluating Regressions section (with best-practice settings list) and Execution Report sectionskills/xcode-build-orchestrator/SKILL.mdhas benchmark confidence check (step 3b) and expanded noise guidance in Final Reportskills/xcode-project-analyzer/SKILL.mdand itsreferences/project-audit-checks.mdhave CocoaPods detection and ExtractAppIntentsMetadata updatesorchestration-report-template.mdhas the new Execution Report section replacing the old Verification section