Mock diff: ParamSpec forwarding fix (PR #2801) for primer classifier testing #2842
migeed-z wants to merge 3 commits into facebook:main
Summary: While looking at the classifier output for facebook#2764, I noticed that the "verdict" formatting needed fixing. Two fixes:
1. `formatter.py`: Add `_format_reason()` to render JSON reason dicts as labeled, readable sections (e.g. "**Spec check:** ...", "**Reasoning:** ...").
2. `llm_client.py`: Ensure `reason` is always a string by serializing dict values, so downstream code handles it consistently.
Reviewed By: grievejia
Differential Revision: D97422229
…nd cross-project consistency
Summary: The primer classifier has been producing inconsistent results across runs: the same primer diff can be classified as 'improvement' in one run and 'regression' in another. This was observed on real PRs such as facebook#2839 (altair TypeVar iterability) and facebook#2764 (overload resolution, 60+ projects).
Three changes to improve reliability:
1. **Self-critique pass (Pass 1.5)**: After Pass 1 produces its reasoning, a new pass checks that reasoning for factual errors, e.g. claiming dicts are not iterable, incorrect inheritance claims, or wrong TypeVar constraint analysis. This catches hallucinations before they reach the verdict pass. Tested on PR facebook#2839, where it correctly identified that both constraints of `_C` (list and TypedDict) are iterable.
2. **Majority voting on verdict (Pass 2)**: Instead of a single verdict call, makes 5 independent calls and takes the majority. This reduces non-determinism where the same reasoning could be classified either way. The vote distribution is logged for transparency.
3. **Cross-project consistency enforcement**: After classifying all projects independently, groups them by error kind and enforces the majority verdict within each group. This prevents the classifier from saying 'overload resolution improved' for one project and 'overload resolution regressed' for another with the same pattern.
Also upgrades the default Anthropic model from claude-opus-4-20250514 to claude-opus-4-6 for better Pass 1 reasoning quality.
Differential Revision: D97571454
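Once the LLM calls are abstracted away, majority voting and cross-project consistency enforcement are plain aggregation. A minimal sketch, assuming each verdict call is a zero-argument function returning a label string and that each project is tagged with a single error kind. The function names, the logger name, and the tie-breaking rule (first-seen verdict wins, via `Counter.most_common`) are hypothetical, and the self-critique pass is not modeled.

```python
import logging
from collections import Counter
from typing import Callable

logger = logging.getLogger("primer_classifier")  # hypothetical logger name


def majority_verdict(get_verdict: Callable[[], str], num_votes: int = 5) -> str:
    """Call a (stubbed) verdict function num_votes times and return the most common label."""
    votes = [get_verdict() for _ in range(num_votes)]
    counts = Counter(votes)
    logger.info("vote distribution: %s", dict(counts))
    # most_common orders equal counts by first occurrence, so ties go to the earliest vote.
    verdict, _ = counts.most_common(1)[0]
    return verdict


def enforce_consistency(verdicts: dict[str, str], error_kinds: dict[str, str]) -> dict[str, str]:
    """Group projects by error kind and give every project its group's majority verdict.

    verdicts: project -> verdict; error_kinds: project -> error kind (an assumed
    grouping key, e.g. "bad-argument-type").
    """
    groups: dict[str, list[str]] = {}
    for project in verdicts:
        groups.setdefault(error_kinds[project], []).append(project)

    result = dict(verdicts)
    for kind, projects in groups.items():
        majority, _ = Counter(verdicts[p] for p in projects).most_common(1)[0]
        for p in projects:
            if result[p] != majority:
                logger.info("%s: %s -> %s (majority for %s)", p, result[p], majority, kind)
            result[p] = majority
    return result
```

With 5 votes over two labels a tie cannot occur; the tie-breaking rule only matters if a third label (or an even vote count) is allowed.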
…ssifier testing
Summary: Mock commit to test the improved primer classifier workflow on GitHub. This copies the code changes from PR facebook#2801 to trigger the primer and verify classifier output formatting in the real GitHub Actions environment.
Differential Revision: D97576884
Diff from mypy_primer, showing the effect of this PR on open source code:
async-utils (https://github.com/mikeshardmind/async-utils)
- ERROR src/async_utils/bg_tasks.py:192:48-80: Expected `P` to be a ParamSpec value in function `_sem_fut` [bad-argument-type]
- ERROR src/async_utils/gen_transform.py:206:36-56: Expected `P` to be a ParamSpec value in function `_sync_to_async_gen` [bad-argument-type]
- ERROR src/async_utils/gen_transform.py:243:35-55: Expected `P` to be a ParamSpec value in function `_sync_to_async_gen` [bad-argument-type]
pwndbg (https://github.com/pwndbg/pwndbg)
- ERROR pwndbg/commands/__init__.py:948:41-61: Expected `P` to be a ParamSpec value in function `_try2run_heap_command` [bad-argument-type]
- ERROR pwndbg/commands/__init__.py:963:45-65: Expected `P` to be a ParamSpec value in function `_try2run_heap_command` [bad-argument-type]
starlette (https://github.com/encode/starlette)
- ERROR starlette/applications.py:101:50-85: Expected `P` to be a ParamSpec value in function `starlette.middleware.Middleware.__init__` [bad-argument-type]
- ERROR starlette/background.py:31:30-53: Expected `P` to be a ParamSpec value in function `BackgroundTask.__init__` [bad-argument-type]
zulip (https://github.com/zulip/zulip)
- ERROR zerver/lib/profile.py:34:30-53: Expected `ParamT` to be a ParamSpec value in function `cProfile.Profile.runcall` [bad-argument-type]
pandas (https://github.com/pandas-dev/pandas)
- ERROR pandas/core/generic.py:6134:27-73: No matching overload found for function `pandas.core.common.pipe` called with arguments: (Self@NDFrame, ((Self@NDFrame, ParamSpec(P)) -> T) | tuple[(...) -> T, str], *tuple[Any, ...], **dict[str, Any]) [no-matching-overload]
- ERROR pandas/core/groupby/groupby.py:540:24-53: No matching overload found for function `pandas.core.common.pipe` called with arguments: (Self@BaseGroupBy, ((Self@BaseGroupBy, ParamSpec(P)) -> T) | tuple[(...) -> T, str], *tuple[Any, ...], **dict[str, Any]) [no-matching-overload]
- ERROR pandas/core/resample.py:342:28-51: No matching overload found for function `pandas.core.groupby.groupby.BaseGroupBy.pipe` called with arguments: (((Self@Resampler, ParamSpec(P)) -> T) | tuple[(...) -> T, str], *tuple[Any, ...], **dict[str, Any]) [no-matching-overload]
- ERROR pandas/core/window/expanding.py:435:28-51: No matching overload found for function `pandas.core.window.rolling.RollingAndExpandingMixin.pipe` called with arguments: (((Self@Expanding, ParamSpec(P)) -> T) | tuple[(...) -> T, str], *tuple[Any, ...], **dict[str, Any]) [no-matching-overload]
- ERROR pandas/core/window/rolling.py:1634:24-53: No matching overload found for function `pandas.core.common.pipe` called with arguments: (Self@RollingAndExpandingMixin, ((Self@RollingAndExpandingMixin, ParamSpec(P)) -> T) | tuple[(...) -> T, str], *tuple[Any, ...], **dict[str, Any]) [no-matching-overload]
- ERROR pandas/core/window/rolling.py:2376:28-51: No matching overload found for function `RollingAndExpandingMixin.pipe` called with arguments: (((Self@Rolling, ParamSpec(P)) -> T) | tuple[(...) -> T, str], *tuple[Any, ...], **dict[str, Any]) [no-matching-overload]
- ERROR pandas/io/formats/style.py:4200:24-53: No matching overload found for function `pandas.core.common.pipe` called with arguments: (Self@Styler, ((Self@Styler, ParamSpec(P)) -> T) | tuple[(...) -> T, str], *tuple[Any, ...], **dict[str, Any]) [no-matching-overload]
pytest-robotframework (https://github.com/detachhead/pytest-robotframework)
- ERROR pytest_robotframework/__init__.py:303:30-68: Expected `P` to be a ParamSpec value in function `_KeywordDecorator.inner` [bad-argument-type]
trio (https://github.com/python-trio/trio)
- ERROR src/trio/_socket.py:440:46-76: Expected `P` to be a ParamSpec value in function `_SocketType._nonblocking_helper` [bad-argument-type]
paasta (https://github.com/yelp/paasta)
- ERROR paasta_tools/async_utils.py:184:24-51: Expected `P` to be a ParamSpec value in function `run_sync` [bad-argument-type]
scrapy (https://github.com/scrapy/scrapy)
- ERROR scrapy/utils/asyncio.py:222:34-57: Expected `_P` to be a ParamSpec value in function `AsyncioLoopingCall.__init__` [bad-argument-type]
- ERROR scrapy/utils/defer.py:288:60-290:6: Expected `_P` to be a ParamSpec value in function `_AsyncCooperatorAdapter.__init__` [bad-argument-type]
- ERROR scrapy/utils/defer.py:430:31-54: Expected `_P` to be a ParamSpec value in function `_maybeDeferred_coro` [bad-argument-type]
prefect (https://github.com/PrefectHQ/prefect)
- ERROR src/prefect/_internal/concurrency/api.py:32:23-46: Expected `P` to be a ParamSpec value in function `prefect._internal.concurrency.calls.Call.new` [bad-argument-type]
- ERROR src/prefect/utilities/asyncutils.py:400:44-73: Expected `P` to be a ParamSpec value in function `run_async_from_worker_thread` [bad-argument-type]
- ERROR src/prefect/utilities/asyncutils.py:404:37-66: Expected `P` to be a ParamSpec value in function `run_async_in_new_loop` [bad-argument-type]
Primer Diff Classification
✅ 10 improvement(s) | 10 project(s) total | -24 errors
10 improvement(s) across async-utils, pwndbg, starlette, zulip, pandas, pytest-robotframework, trio, paasta, scrapy, prefect.
Detailed analysis
✅ Improvement (10)
async-utils (-3)
pwndbg (-2)
starlette (-2)
zulip (-1)
pandas (-7)
pytest-robotframework (-1)
trio (-1)
paasta (-1)
scrapy (-3)
prefect (-3)
Was this helpful? React with 👍 or 👎.
Classification by primer-classifier (10 LLM)
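The summary header is a plain aggregation of the per-project deltas listed under the detailed analysis: 3 + 2 + 2 + 1 + 7 + 1 + 1 + 1 + 3 + 3 = 24 removed errors across 10 projects. A minimal sketch of producing that line, assuming each project carries a verdict and an error delta; `summarize` and its input shape are hypothetical and not the primer-classifier formatter's actual code.

```python
from collections import Counter


def summarize(results: dict[str, tuple[str, int]]) -> str:
    """results maps project -> (verdict, error delta); returns a one-line summary."""
    by_verdict = Counter(verdict for verdict, _ in results.values())
    total_delta = sum(delta for _, delta in results.values())
    # One "N verdict(s)" entry per verdict label, in first-seen order.
    parts = [f"{count} {verdict}(s)" for verdict, count in by_verdict.items()]
    parts.append(f"{len(results)} project(s) total")
    parts.append(f"{total_delta:+d} errors")  # explicit sign, e.g. "-24"
    return " | ".join(parts)
```

Fed the ten projects above, each as `("improvement", delta)`, this yields `10 improvement(s) | 10 project(s) total | -24 errors`.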