### 2) Gains persist across size bins, with strongest lift in 1M-5M proxy bucket

Curated GT deltas (`MCP - baseline`):
-- `<1M`: F1@10 +0.1047, Total +0.1736
-- `1M-5M`: F1@10 +0.3417, Total +0.4148
-- `5M-20M`: F1@10 +0.0696, Total +0.0960
-- `>20M`: F1@10 +0.1653, Total +0.2104
+- `<1M`: F1@10 +0.1007, Total +0.1318
+- `1M-5M`: F1@10 +0.2680, Total +0.2392
+- `5M-20M`: F1@10 +0.0648, Total +0.0565
+- `>20M`: F1@10 +0.1247, Total +0.1075

Interpretation: retrieval lift is not uniform, but MCP shows clear upside where task context is more distributed and retrieval-heavy.
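
The size of the revision in each bin can be tabulated directly from the removed and added values in this hunk. The sketch below is illustrative only: the names `earlier`, `corrected`, `revision`, and `strongest_bin` are labels chosen here, not identifiers from the project, and it assumes the removed values are the earlier draft and the added values are the corrected ones.

```python
# Per-bin deltas (MCP - baseline) as (F1@10, Total), copied from this hunk.
# Assumption: removed lines = earlier draft, added lines = corrected values.
earlier = {
    "<1M": (0.1047, 0.1736),
    "1M-5M": (0.3417, 0.4148),
    "5M-20M": (0.0696, 0.0960),
    ">20M": (0.1653, 0.2104),
}
corrected = {
    "<1M": (0.1007, 0.1318),
    "1M-5M": (0.2680, 0.2392),
    "5M-20M": (0.0648, 0.0565),
    ">20M": (0.1247, 0.1075),
}


def revision(bin_name):
    """Return (F1@10 change, Total change) from the earlier to the corrected value."""
    old_f1, old_total = earlier[bin_name]
    new_f1, new_total = corrected[bin_name]
    return round(new_f1 - old_f1, 4), round(new_total - old_total, 4)


def strongest_bin(deltas):
    """Return the bin with the largest F1@10 delta."""
    return max(deltas, key=lambda name: deltas[name][0])


for name in corrected:
    f1_change, total_change = revision(name)
    print(f"{name:>7}: F1@10 {f1_change:+.4f}, Total {total_change:+.4f}")

print("strongest F1@10 lift:", strongest_bin(corrected))
```

Under that assumption, every bin's delta shrinks after the correction, and `1M-5M` remains the largest F1@10 lift in both versions.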

+Method note: I corrected an Org path-normalization bug in an earlier draft where some baseline paths were mismatched due to path shape differences (for example `repo/repo/path` vs `repo/path`).
+
## Cost and Speed

Current paired means:
@@ -170,4 +172,3 @@ Planned next steps:
3. Compare alternate MCP providers on the same task set.
4. Run tool-policy experiments (especially semantic/deep-search nudges).
5. Continue tightening verifier and QA infrastructure before final white paper publication.
-Correction note: an earlier draft of this subsection undercounted Org baseline matches due to path-shape normalization differences (for example `repo/repo/path` vs `repo/path`). Numbers below use corrected canonical exact matching.
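
The path-shape mismatch named in the note (`repo/repo/path` vs `repo/path`) can be illustrated with a small canonicalization step applied before exact matching. This is a minimal sketch, not the project's actual fix: the function names `canonical_path` and `exact_matches` are hypothetical, and it assumes the only shape difference is a duplicated leading repository segment.

```python
def canonical_path(path: str) -> str:
    """Collapse a duplicated leading segment: 'repo/repo/x' -> 'repo/x'.

    Assumption: the only shape difference is a repeated first segment.
    """
    parts = [segment for segment in path.split("/") if segment]
    if len(parts) >= 2 and parts[0] == parts[1]:
        parts = parts[1:]
    return "/".join(parts)


def exact_matches(predicted, ground_truth):
    """Count predicted paths that exactly match ground truth after canonicalization."""
    truth = {canonical_path(p) for p in ground_truth}
    return sum(1 for p in predicted if canonical_path(p) in truth)


# Hypothetical example paths, not taken from the benchmark.
predicted = ["repo/repo/src/a.py", "repo/src/b.py"]
ground_truth = ["repo/src/a.py", "repo/src/b.py"]

raw_matches = sum(1 for p in predicted if p in set(ground_truth))
print("raw exact matches:", raw_matches)
print("canonical exact matches:", exact_matches(predicted, ground_truth))
```

Comparing raw strings misses the first path, which is the kind of undercount the note describes. One caveat of this sketch: a repository whose top-level directory legitimately shares the repository's name would be collapsed as well, so a real fix would need to know the repository root.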