
Incorporate learnings from Daily macOS (ObjC + CocoaPods) test case #9

Merged
AvdLee merged 6 commits into main from learnings-daily-macos-cocoapods
Mar 25, 2026

Conversation


@AvdLee AvdLee commented Mar 24, 2026

Summary

A community member ran the optimization skills on an older Objective-C + CocoaPods macOS project (Daily macOS). The agent improvised CocoaPods-specific changes not covered by any skill reference, overestimated how actionable some findings were, and the cumulative optimization pass regressed builds by 14%. This PR addresses the root causes identified from that test case.

  • Actionability classification: Every recommendation now requires an actionability field (repo-local, package-manager, xcode-behavior, upstream) so the agent must classify how fixable a finding is before presenting it
  • CocoaPods deprecation: Skills now detect CocoaPods and recommend migrating to SPM instead of attempting CocoaPods-specific optimizations (which are unreliable and frequently regress)
  • Regression-aware revert guidance: Best-practice build settings (compilation caching, eager linking, etc.) are never revert candidates; speculative changes are only reverted when all build types regress; compilation caching trade-offs are evaluated holistically
  • Structured execution report: The fixer now produces a per-change report tracking what was attempted, measured, and kept/reverted/blocked
  • Benchmark confidence: High-variance benchmarks are flagged with a recommendation for additional repetitions before drawing conclusions
  • ExtractAppIntentsMetadata: Clarified as Xcode-driven behavior that cannot be suppressed from repo-local settings
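
For illustration, a recommendation carrying the new field might serialize like this (all values are invented; the authoritative field list and JSON example live in references/recommendation-format.md):

```json
{
  "title": "Enable eager linking",
  "wait_time_impact": "Expected to reduce your incremental build by ~2s",
  "actionability": "repo-local",
  "category": "build-settings",
  "observed_evidence": "Linking is serialized behind compilation in the timing summary",
  "estimated_impact": "~2s per incremental build",
  "confidence": "medium",
  "approval_required": true,
  "benchmark_verification_status": "pending"
}
```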

Test plan

  • Verify references/recommendation-format.md includes the new actionability field in required fields, JSON example, and rendering order
  • Verify OPTIMIZATION-CHECKS.md has CocoaPods deprecation section and updated ExtractAppIntentsMetadata description
  • Verify skills/xcode-build-fixer/SKILL.md has Evaluating Regressions section (with best-practice settings list) and Execution Report section
  • Verify skills/xcode-build-orchestrator/SKILL.md has benchmark confidence check (step 3b) and expanded noise guidance in Final Report
  • Verify skills/xcode-project-analyzer/SKILL.md and its references/project-audit-checks.md have CocoaPods detection and ExtractAppIntentsMetadata updates
  • Verify orchestration-report-template.md has the new Execution Report section replacing the old Verification section

AvdLee and others added 2 commits March 24, 2026 11:50
A community member ran the optimization skills on an older ObjC + CocoaPods
project. The agent improvised CocoaPods-specific changes not covered by any
reference material, overestimated how actionable some findings were, and
produced a 14% regression. This commit addresses the root causes:

- Add actionability classification (repo-local / package-manager /
  xcode-behavior / upstream) to the recommendation format so every finding
  states how fixable it is from the repo
- Add CocoaPods deprecation notice: do not attempt CocoaPods-specific
  optimizations; recommend migrating to SPM instead
- Add regression-aware revert guidance: best-practice settings are never
  revert candidates; speculative changes are only reverted when all build
  types regress; compilation caching trade-offs are evaluated holistically
- Add structured execution report to the fixer and orchestrator template
  so each change is tracked with its measured result and kept/reverted status
- Add benchmark confidence checks: flag high-variance runs and recommend
  additional repetitions before drawing conclusions
- Clarify ExtractAppIntentsMetadata is Xcode-driven and not suppressible
  from repo-local settings
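
The revert rules above can be sketched as follows (a minimal illustration; the record shape and field names are hypothetical, not taken from the skill files):

```python
# Hypothetical sketch of the regression-aware revert rules. The record
# shape ("best_practice") and delta convention are invented for
# illustration; positive delta = build got slower vs. baseline.

def should_revert(change: dict, deltas: dict[str, float]) -> bool:
    """Return True when an applied change should be reverted."""
    # Best-practice settings (compilation caching, eager linking, ...)
    # are never revert candidates.
    if change.get("best_practice"):
        return False
    # Speculative changes are reverted only when ALL build types regress.
    return all(delta > 0 for delta in deltas.values())

# A speculative change that slows clean builds but speeds up incremental
# builds is kept, because not every build type regressed:
revert = should_revert({"best_practice": False},
                       {"clean": 4.0, "incremental": -6.0})  # False
```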

Copilot AI left a comment


Pull request overview

Updates the Xcode build optimization Agent Skills documentation to incorporate lessons from a CocoaPods-based macOS case study, aiming to improve recommendation actionability, avoid unreliable CocoaPods “optimizations,” and strengthen regression/noise handling and reporting structure.

Changes:

  • Add an actionability classification to the shared recommendation format and clarify how to classify Xcode-driven costs (e.g., ExtractAppIntentsMetadata).
  • Add CocoaPods detection guidance that recommends SPM migration and explicitly avoids CocoaPods-specific tuning.
  • Introduce an “Execution Report” structure and add benchmark variance (“confidence”) guidance for interpreting noisy results and regressions.
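
The CocoaPods detection mentioned above could be as simple as checking for marker files (a sketch under the assumption that a Podfile or lockfile indicates CocoaPods; the skills do not prescribe an implementation):

```python
from pathlib import Path

# Marker files that indicate a project is built with CocoaPods.
# Per the updated guidance, a positive detection should lead to an
# SPM-migration recommendation, not CocoaPods-specific tuning.
COCOAPODS_MARKERS = ["Podfile", "Podfile.lock", "Pods/Manifest.lock"]

def uses_cocoapods(repo_root: str) -> bool:
    root = Path(repo_root)
    return any((root / marker).exists() for marker in COCOAPODS_MARKERS)
```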

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Summary per file:

  • skills/xcode-project-analyzer/references/project-audit-checks.md -- Clarifies ExtractAppIntentsMetadata as xcode-behavior and adds CocoaPods guidance (avoid CocoaPods tuning; prefer SPM migration).
  • skills/xcode-project-analyzer/SKILL.md -- Adds CocoaPods usage to the “What To Review” checklist.
  • skills/xcode-build-orchestrator/references/orchestration-report-template.md -- Replaces “Verification” with an “Execution Report” section and a per-change tracking table.
  • skills/xcode-build-orchestrator/SKILL.md -- Adds a benchmark variance check and strengthens noise/confidence guidance in the final report section.
  • skills/xcode-build-fixer/SKILL.md -- Adds regression evaluation rules (best-practice settings not revert candidates) and an execution report structure.
  • references/recommendation-format.md -- Makes actionability required and defines allowed values and rendering order.
  • README.md -- Updates the documented repository/skill tree structure block.
  • OPTIMIZATION-CHECKS.md -- Adds CocoaPods guidance and clarifies ExtractAppIntentsMetadata actionability as xcode-behavior.

README.md Outdated
Comment on lines +219 to +224
```text
skills/
  xcode-build-benchmark/
    SKILL.md
    references/
      benchmarking-workflow.md
  xcode-compilation-analyzer/
    SKILL.md
    references/
      code-compilation-checks.md
  xcode-project-analyzer/
    SKILL.md
    references/
      project-audit-checks.md
  spm-build-analysis/
    SKILL.md
    references/
      spm-analysis-checks.md
  xcode-build-orchestrator/
    SKILL.md
    references/
      orchestration-report-template.md
  xcode-build-fixer/
    SKILL.md
    references/
      fix-patterns.md
xcode-build-optimization-agent-skill/
  .claude-plugin/
    marketplace.json
    plugin.json
  references/
```

Copilot AI Mar 24, 2026


The skill tree example hard-codes the repo root as xcode-build-optimization-agent-skill/, but the checkout directory name is user-controlled (and the README itself is already at repo root). To keep the structure example universally correct, consider using `./` or a generic `<repo-root>/` placeholder instead of a specific folder name.

Comment on lines 7 to 17
Each recommendation should include:

- `title`
- `wait_time_impact` -- plain-language statement of expected wall-clock impact, e.g. "Expected to reduce your clean build by ~3s", "Reduces parallel compile work but unlikely to reduce build wait time", or "Impact on wait time is uncertain -- re-benchmark to confirm"
- `actionability` -- classifies how fixable the issue is from the project (see values below)
- `category`
- `observed_evidence`
- `estimated_impact`
- `confidence`
- `approval_required`
- `benchmark_verification_status`

Copilot AI Mar 24, 2026


With actionability now required (and wait_time_impact already required), the repo’s Markdown renderers/generators should be updated to emit and render these fields in the specified order. For example, scripts/render_recommendations.py currently omits both wait_time_impact and actionability, and scripts/generate_optimization_report.py doesn’t handle actionability either, so the produced reports won’t match this documented contract.
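
If the renderers are updated, the core of the fix might look like this (a hypothetical sketch, not the actual scripts/render_recommendations.py code):

```python
# Hypothetical sketch of rendering recommendations in the documented
# field order; the real scripts in this repo may differ substantially.

FIELD_ORDER = [
    "title",
    "wait_time_impact",
    "actionability",
    "observed_evidence",
    "estimated_impact",
    "confidence",
    "approval_required",
    "benchmark_verification_status",
]

def render_recommendation(rec: dict) -> str:
    """Render one recommendation as Markdown bullets, preserving the
    field order required by references/recommendation-format.md."""
    return "\n".join(
        f"- **{field}**: {rec[field]}" for field in FIELD_ORDER if field in rec
    )
```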

Comment on lines +93 to +99
## Execution Report (post-approval)

### Baseline
- Clean build median: X.Xs
- Cached clean build median: X.Xs (if applicable)
- Incremental build median: X.Xs


Copilot AI Mar 24, 2026


This template states generate_optimization_report.py produces this format automatically, but the script currently doesn’t emit any post-approval “Execution Report” placeholder/section. If this section is now expected in the generated plan, the generator should be updated to include it (even as an empty placeholder) so Phase 2 can append results without manual restructuring.

1. Collect the build target context: workspace or project, scheme, configuration, destination, and current pain point. When both `.xcworkspace` and `.xcodeproj` exist, prefer `.xcodeproj` unless the workspace contains sub-projects required for the build. Workspaces that reference external projects may fail if those projects are not checked out.
2. Run `xcode-build-benchmark` to establish a baseline if no fresh benchmark exists. The benchmark script auto-detects `COMPILATION_CACHING = YES` and includes cached clean builds that measure the realistic developer experience (warm cache). If the build fails to compile, check `git log` for a recent buildable commit. When working in a worktree, cherry-picking a targeted build fix from a feature branch is acceptable to reach a buildable state. If SPM packages reference gitignored directories in their `exclude:` paths (e.g., `__Snapshots__`), create those directories before building -- worktrees do not contain gitignored content and `xcodebuild -resolvePackageDependencies` will crash otherwise.
3. Verify the benchmark artifact has non-empty `timing_summary_categories`. If empty, the timing summary parser may have failed -- re-parse the raw logs or inspect them manually. If `COMPILATION_CACHING` is enabled, also verify the artifact includes `cached_clean` runs.
3b. **Benchmark confidence check**: For each build type (clean, cached clean, incremental), compare the min and max values. If the spread (max - min) exceeds 20% of the median, flag the benchmark as having high variance and recommend running additional repetitions (5+ runs) before drawing conclusions. High variance makes it difficult to distinguish real improvements from noise. After applying changes, only claim an improvement if the post-change median falls outside the baseline's min-max range.
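
The step-3b thresholds can be sketched as follows (assuming per-build-type timing samples in seconds; a minimal illustration, not part of the skill files):

```python
import statistics

def high_variance(samples: list[float], threshold: float = 0.20) -> bool:
    """Flag a benchmark as high-variance when the min-max spread
    exceeds 20% of the median (the step-3b rule)."""
    median = statistics.median(samples)
    return (max(samples) - min(samples)) > threshold * median

def improvement_confirmed(baseline: list[float], post_median: float) -> bool:
    """Only claim an improvement when the post-change median falls
    below the baseline's min-max range (outside it, on the fast side)."""
    return post_median < min(baseline)

# Clean-build samples of 25s/27s/31s: spread 6.0s > 0.2 * 27.0s, so the
# run is flagged and additional repetitions are recommended.
noisy = high_variance([25.0, 27.0, 31.0])  # True
```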

Copilot AI Mar 24, 2026


3b. is not valid Markdown ordered-list syntax, so it likely won’t render as part of the numbered workflow (and may break copy/paste of steps). Consider keeping step 3 as an ordered-list item and making this a nested sub-bullet (e.g., 3.a / 3.b as plain text within a nested list) or renumbering as step 4 with the subsequent steps shifted.

Suggested change
3b. **Benchmark confidence check**: For each build type (clean, cached clean, incremental), compare the min and max values. If the spread (max - min) exceeds 20% of the median, flag the benchmark as having high variance and recommend running additional repetitions (5+ runs) before drawing conclusions. High variance makes it difficult to distinguish real improvements from noise. After applying changes, only claim an improvement if the post-change median falls outside the baseline's min-max range.
- **Benchmark confidence check**: For each build type (clean, cached clean, incremental), compare the min and max values. If the spread (max - min) exceeds 20% of the median, flag the benchmark as having high variance and recommend running additional repetitions (5+ runs) before drawing conclusions. High variance makes it difficult to distinguish real improvements from noise. After applying changes, only claim an improvement if the post-change median falls outside the baseline's min-max range.

@@ -26,6 +26,7 @@ Use this skill for project- and target-level build inefficiencies that are unlik
- asset catalog compilation time, especially in targets with large or numerous catalogs
- `ExtractAppIntentsMetadata` time in the Build Timing Summary -- if the project does not use App Intents but this phase consumes significant time, flag it as unnecessary overhead

Copilot AI Mar 24, 2026


The guidance for ExtractAppIntentsMetadata here says to “flag it as unnecessary overhead”, but the updated audit checklist now clarifies it’s typically not cleanly suppressible and should be classified as xcode-behavior (report cost, don’t promise a repo-local fix). This section should be aligned so the skill doesn’t imply there’s an actionable repo-local optimization.

Suggested change
- `ExtractAppIntentsMetadata` time in the Build Timing Summary -- if the project does not use App Intents but this phase consumes significant time, flag it as unnecessary overhead
- `ExtractAppIntentsMetadata` time in the Build Timing Summary -- if this phase consumes significant time, record it as fixed-cost `xcode-behavior` (report the cost and impact, but do not suggest a repo-local optimization unless there is explicit Apple guidance)

Comment on lines 64 to +73
When rendering for human review, preserve the same field order:

 1. title
 2. wait-time impact
-3. observed evidence
-4. estimated impact
-5. confidence
-6. approval required
-7. benchmark verification status
+3. actionability
+4. observed evidence
+5. estimated impact
+6. confidence
+7. approval required
+8. benchmark verification status

Copilot AI Mar 24, 2026


In the “Markdown Rendering Guidance” ordered list, the field is named wait_time_impact earlier, but the list item says “wait-time impact”. For consistency and to avoid implementers mistaking this for a different field name, consider rendering it as wait_time_impact (and likewise use backticks for the other field names in this list).

AvdLee and others added 4 commits March 24, 2026 12:00
- Fix 3b. to nested sub-bullet for valid Markdown list syntax
- Align ExtractAppIntentsMetadata wording in xcode-project-analyzer to
  match the xcode-behavior classification from the audit checklist
- Add actionability field handling to generate_optimization_report.py
  in both recommendation rendering and approval checklist

The sync-readme CI job expanded the tree to include the entire repo.
Revert to the clean skills-only view matching main.

The script was hard-coding the entire repo structure (references, schemas,
scripts, .claude-plugin) into the README skill tree. Simplified to only
output the skills/ directory, matching the intended structure.
@AvdLee AvdLee merged commit ec90d3d into main Mar 25, 2026
1 check passed
@AvdLee AvdLee deleted the learnings-daily-macos-cocoapods branch March 25, 2026 10:32