Mock diff: ParamSpec forwarding fix (PR #2801) for primer classifier testing #2842

Closed
migeed-z wants to merge 3 commits into facebook:main from migeed-z:export-D97576884


@migeed-z
Contributor

Summary:
Mock commit to test the improved primer classifier workflow on GitHub.
This copies the code changes from PR #2801 to trigger the primer and verify
classifier output formatting in the real GitHub Actions environment.

Differential Revision: D97576884

Summary:
I noticed when looking at the classifier output for facebook#2764 that the "verdict" formatting needed to be fixed.

Two fixes:
1. formatter.py: Add _format_reason() to render JSON reason dicts as
   labeled readable sections (e.g. "**Spec check:** ...", "**Reasoning:** ...")
2. llm_client.py: Ensure reason is always a string by serializing dict
   values, so downstream code handles it consistently.
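
A minimal sketch of the two fixes above, assuming the reason arrives either as a dict, a JSON-encoded string, or plain text. The function names `format_reason` and `ensure_reason_str` and the key names in the usage note are illustrative assumptions, not the actual contents of formatter.py or llm_client.py.

```python
import json


def format_reason(reason):
    """Render a reason as labeled sections; plain strings pass through unchanged."""
    if isinstance(reason, str):
        try:
            parsed = json.loads(reason)
        except json.JSONDecodeError:
            return reason  # not JSON: already readable text
        if not isinstance(parsed, dict):
            return reason  # JSON scalar or list: leave as written
        reason = parsed
    if not isinstance(reason, dict):
        return str(reason)
    sections = []
    for key, value in reason.items():
        # Hypothetical labeling rule: "spec_check" becomes "Spec check".
        label = key.replace("_", " ").capitalize()
        sections.append(f"**{label}:** {value}")
    return "\n\n".join(sections)


def ensure_reason_str(reason):
    """Serialize non-string reasons so downstream code always sees a string."""
    if isinstance(reason, str):
        return reason
    return json.dumps(reason)
```

With these assumptions, `format_reason({"spec_check": "ok", "reasoning": "fine"})` yields a "**Spec check:**" section followed by a "**Reasoning:**" section, and a reason serialized by `ensure_reason_str` still renders the same way when passed back through `format_reason`.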

Reviewed By: grievejia

Differential Revision: D97422229
…nd cross-project consistency

Summary:
The primer classifier has been producing inconsistent results across runs — the same primer diff can be classified as 'improvement' in one run and 'regression' in another. This was observed on real PRs like facebook#2839 (altair TypeVar iterability) and facebook#2764 (overload resolution, 60+ projects).

Three changes to improve reliability:

1. **Self-critique pass (Pass 1.5)**: After Pass 1 produces reasoning, a new pass checks it for factual errors — e.g., claiming dicts are not iterable, incorrect inheritance claims, wrong TypeVar constraint analysis. This catches hallucinations before they reach the verdict pass. Tested on PR facebook#2839 where it correctly identified that both constraints of `_C` (list and TypedDict) are iterable.

2. **Majority voting on verdict (Pass 2)**: Instead of making a single verdict call, Pass 2 now makes 5 independent calls and takes the majority. This reduces non-determinism where the same reasoning could be classified either way. The vote distribution is logged for transparency.

3. **Cross-project consistency enforcement**: After classifying all projects independently, the classifier groups them by error kind and enforces the majority verdict within each group. This prevents the classifier from saying 'overload resolution improved' for one project and 'overload resolution regressed' for another with the same pattern.
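
The voting and consistency steps can be sketched roughly as below. This is an illustration under stated assumptions rather than the classifier's code: the function names, the result dict keys (`project`, `error_kind`, `verdict`), and the tie-breaking behavior (first verdict seen wins) are all hypothetical.

```python
from collections import Counter, defaultdict


def majority_verdict(votes):
    """Return the most common verdict plus the full vote distribution."""
    counts = Counter(votes)
    winner, _ = counts.most_common(1)[0]
    return winner, dict(counts)


def enforce_consistency(results):
    """Overwrite each project's verdict with the majority for its error kind."""
    groups = defaultdict(list)
    for result in results:
        groups[result["error_kind"]].append(result["verdict"])
    consistent = []
    for result in results:
        winner, _ = majority_verdict(groups[result["error_kind"]])
        # Copy the record so the independently produced results stay untouched.
        consistent.append({**result, "verdict": winner})
    return consistent
```

For example, five votes of three 'improvement' and two 'regression' resolve to 'improvement', and a lone 'regression' verdict in a group of three projects sharing one error kind is overridden by the other two.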

Also upgrades the default Anthropic model from claude-opus-4-20250514 to claude-opus-4-6 for better Pass 1 reasoning quality.

Differential Revision: D97571454
…ssifier testing

Summary:
Mock commit to test the improved primer classifier workflow on GitHub.
This copies the code changes from PR facebook#2801 to trigger the primer and verify
classifier output formatting in the real GitHub Actions environment.

Differential Revision: D97576884
@meta-cla meta-cla bot added the cla signed label Mar 21, 2026
@meta-codesync

meta-codesync bot commented Mar 21, 2026

@migeed-z has exported this pull request. If you are a Meta employee, you can view the originating Diff in D97576884.

@github-actions

Diff from mypy_primer, showing the effect of this PR on open source code:

async-utils (https://github.com/mikeshardmind/async-utils)
- ERROR src/async_utils/bg_tasks.py:192:48-80: Expected `P` to be a ParamSpec value in function `_sem_fut` [bad-argument-type]
- ERROR src/async_utils/gen_transform.py:206:36-56: Expected `P` to be a ParamSpec value in function `_sync_to_async_gen` [bad-argument-type]
- ERROR src/async_utils/gen_transform.py:243:35-55: Expected `P` to be a ParamSpec value in function `_sync_to_async_gen` [bad-argument-type]

pwndbg (https://github.com/pwndbg/pwndbg)
- ERROR pwndbg/commands/__init__.py:948:41-61: Expected `P` to be a ParamSpec value in function `_try2run_heap_command` [bad-argument-type]
- ERROR pwndbg/commands/__init__.py:963:45-65: Expected `P` to be a ParamSpec value in function `_try2run_heap_command` [bad-argument-type]

starlette (https://github.com/encode/starlette)
- ERROR starlette/applications.py:101:50-85: Expected `P` to be a ParamSpec value in function `starlette.middleware.Middleware.__init__` [bad-argument-type]
- ERROR starlette/background.py:31:30-53: Expected `P` to be a ParamSpec value in function `BackgroundTask.__init__` [bad-argument-type]

zulip (https://github.com/zulip/zulip)
- ERROR zerver/lib/profile.py:34:30-53: Expected `ParamT` to be a ParamSpec value in function `cProfile.Profile.runcall` [bad-argument-type]

pandas (https://github.com/pandas-dev/pandas)
- ERROR pandas/core/generic.py:6134:27-73: No matching overload found for function `pandas.core.common.pipe` called with arguments: (Self@NDFrame, ((Self@NDFrame, ParamSpec(P)) -> T) | tuple[(...) -> T, str], *tuple[Any, ...], **dict[str, Any]) [no-matching-overload]
- ERROR pandas/core/groupby/groupby.py:540:24-53: No matching overload found for function `pandas.core.common.pipe` called with arguments: (Self@BaseGroupBy, ((Self@BaseGroupBy, ParamSpec(P)) -> T) | tuple[(...) -> T, str], *tuple[Any, ...], **dict[str, Any]) [no-matching-overload]
- ERROR pandas/core/resample.py:342:28-51: No matching overload found for function `pandas.core.groupby.groupby.BaseGroupBy.pipe` called with arguments: (((Self@Resampler, ParamSpec(P)) -> T) | tuple[(...) -> T, str], *tuple[Any, ...], **dict[str, Any]) [no-matching-overload]
- ERROR pandas/core/window/expanding.py:435:28-51: No matching overload found for function `pandas.core.window.rolling.RollingAndExpandingMixin.pipe` called with arguments: (((Self@Expanding, ParamSpec(P)) -> T) | tuple[(...) -> T, str], *tuple[Any, ...], **dict[str, Any]) [no-matching-overload]
- ERROR pandas/core/window/rolling.py:1634:24-53: No matching overload found for function `pandas.core.common.pipe` called with arguments: (Self@RollingAndExpandingMixin, ((Self@RollingAndExpandingMixin, ParamSpec(P)) -> T) | tuple[(...) -> T, str], *tuple[Any, ...], **dict[str, Any]) [no-matching-overload]
- ERROR pandas/core/window/rolling.py:2376:28-51: No matching overload found for function `RollingAndExpandingMixin.pipe` called with arguments: (((Self@Rolling, ParamSpec(P)) -> T) | tuple[(...) -> T, str], *tuple[Any, ...], **dict[str, Any]) [no-matching-overload]
- ERROR pandas/io/formats/style.py:4200:24-53: No matching overload found for function `pandas.core.common.pipe` called with arguments: (Self@Styler, ((Self@Styler, ParamSpec(P)) -> T) | tuple[(...) -> T, str], *tuple[Any, ...], **dict[str, Any]) [no-matching-overload]

pytest-robotframework (https://github.com/detachhead/pytest-robotframework)
- ERROR pytest_robotframework/__init__.py:303:30-68: Expected `P` to be a ParamSpec value in function `_KeywordDecorator.inner` [bad-argument-type]

trio (https://github.com/python-trio/trio)
- ERROR src/trio/_socket.py:440:46-76: Expected `P` to be a ParamSpec value in function `_SocketType._nonblocking_helper` [bad-argument-type]

paasta (https://github.com/yelp/paasta)
- ERROR paasta_tools/async_utils.py:184:24-51: Expected `P` to be a ParamSpec value in function `run_sync` [bad-argument-type]

scrapy (https://github.com/scrapy/scrapy)
- ERROR scrapy/utils/asyncio.py:222:34-57: Expected `_P` to be a ParamSpec value in function `AsyncioLoopingCall.__init__` [bad-argument-type]
- ERROR scrapy/utils/defer.py:288:60-290:6: Expected `_P` to be a ParamSpec value in function `_AsyncCooperatorAdapter.__init__` [bad-argument-type]
- ERROR scrapy/utils/defer.py:430:31-54: Expected `_P` to be a ParamSpec value in function `_maybeDeferred_coro` [bad-argument-type]

prefect (https://github.com/PrefectHQ/prefect)
- ERROR src/prefect/_internal/concurrency/api.py:32:23-46: Expected `P` to be a ParamSpec value in function `prefect._internal.concurrency.calls.Call.new` [bad-argument-type]
- ERROR src/prefect/utilities/asyncutils.py:400:44-73: Expected `P` to be a ParamSpec value in function `run_async_from_worker_thread` [bad-argument-type]
- ERROR src/prefect/utilities/asyncutils.py:404:37-66: Expected `P` to be a ParamSpec value in function `run_async_in_new_loop` [bad-argument-type]

@github-actions

Primer Diff Classification

✅ 10 improvement(s) | 10 project(s) total | -24 errors

10 improvement(s) across async-utils, pwndbg, starlette, zulip, pandas, pytest-robotframework, trio, paasta, scrapy, prefect.
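
The per-project counts behind this summary can be reproduced from the primer diff with a small tally. The sketch below assumes the layout shown in the diff comment (a `name (url)` header line followed by `- ERROR` lines) and additionally assumes that newly introduced errors would appear as `+ ERROR` lines; the function name `tally_changes` is hypothetical.

```python
import re
from collections import Counter

# Matches project header lines such as "trio (https://github.com/python-trio/trio)".
HEADER = re.compile(r"^(?P<name>\S+) \(https?://\S+\)$")


def tally_changes(diff_text):
    """Count net error changes per project: -1 per removed, +1 per added error."""
    counts = Counter()
    project = None
    for raw in diff_text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            project = match.group("name")
        elif project and line.startswith("- ERROR"):
            counts[project] -= 1
        elif project and line.startswith("+ ERROR"):
            counts[project] += 1
    return dict(counts)
```

Summing the values returned for the full diff gives the overall error delta reported in the summary line.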

| Project | Verdict | Changes | Error Kinds | Root Cause |
|---|---|---|---|---|
| async-utils | ✅ Improvement | -3 | bad-argument-type | overload_resolution() |
| pwndbg | ✅ Improvement | -2 | bad-argument-type | The change to callable.rs in the AnswersSolver implem... |
| starlette | ✅ Improvement | -2 | bad-argument-type | is_param_spec() |
| zulip | ✅ Improvement | -1 | bad-argument-type | pyrefly/lib/alt/callable.rs |
| pandas | ✅ Improvement | -7 | no-matching-overload | The change to handle Type::Quantified ParamSpecs in `ca... |
| pytest-robotframework | ✅ Improvement | -1 | bad-argument-type | overload_resolution() |
| trio | ✅ Improvement | -1 | bad-argument-type | pyrefly/lib/alt/callable.rs |
| paasta | ✅ Improvement | -1 | bad-argument-type | pyrefly/lib/alt/callable.rs |
| scrapy | ✅ Improvement | -3 | bad-argument-type | is_param_spec() |
| prefect | ✅ Improvement | -3 | bad-argument-type | is_param_spec() |

Detailed analysis

✅ Improvement (10)

async-utils (-3)

These errors were false positives. The code correctly uses ParamSpec forwarding - generic helper functions that accept *args: P.args, **kwargs: P.kwargs and forward them to other generic functions. This is a valid pattern that pyrefly was incorrectly rejecting. The PR fixed this by recognizing when a ParamSpec resolves to another quantified ParamSpec and treating it permissively to allow the forwarding. The test cases added confirm this is intended behavior.
Attribution: The change to overload_resolution() in pyrefly/lib/alt/callable.rs added a new case to handle Type::Quantified(q) if q.[is_param_spec()](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/alt/callable.rs), treating it as ParamList::everything() to allow ParamSpec forwarding.

pwndbg (-2)

These were false positives. The code clearly defines P = ParamSpec('P') on line 40, making it a valid ParamSpec. The errors occurred because pyrefly couldn't properly handle the pattern where _try2run_heap_command (lines 887, 948, 963) uses *a: P.args, **kw: P.kwargs to forward arguments to another function that also uses the same ParamSpec. The PR fixed this by adding support for ParamSpec forwarding between generic helpers, allowing pyrefly to recognize that P remains a valid ParamSpec throughout the forwarding chain. Since mypy and pyright don't flag this pattern, and the code is correct according to the typing spec, removing these errors is an improvement.
Attribution: The change to callable.rs in the AnswersSolver implementation added handling for the case where 'The ParamSpec Var resolved to another quantified ParamSpec'. This fixed pyrefly's ability to handle ParamSpec forwarding patterns where one generic function calls another generic function with the same ParamSpec.

starlette (-2)

Pyrefly was incorrectly flagging valid ParamSpec forwarding patterns. When a generic function with *args: P.args, **kwargs: P.kwargs forwards those arguments to another generic function expecting the same ParamSpec, this is a legitimate pattern used for decorators, middleware, and task wrappers. The PR fixes this by treating ParamSpec-to-ParamSpec forwarding permissively, removing these false positive errors.
Attribution: The change to handle Type::Quantified(q) if q.[is_param_spec()](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/alt/callable.rs) in pyrefly/lib/alt/callable.rs now treats ParamSpec forwarding permissively, allowing these valid patterns.

zulip (-1)

This was a false positive. The code shows a standard ParamSpec forwarding pattern where a decorator captures arguments via *args: P.args, **kwargs: P.kwargs and forwards them to prof.runcall(func, *args, **kwargs). This is the canonical way to use ParamSpec for argument forwarding. The PR fixed pyrefly's handling of ParamSpec-to-ParamSpec forwarding by adding a case in callable.rs that treats forwarded ParamSpec arguments permissively, allowing them to pass through. The removal of this error is an improvement - pyrefly now correctly accepts valid ParamSpec forwarding patterns.
Attribution: The change to AnswersSolver in pyrefly/lib/alt/callable.rs added handling for when a ParamSpec resolves to another quantified ParamSpec. The new code treats this case permissively like ..., allowing the forwarded args to pass through. This directly fixes the false positive where pyrefly couldn't handle ParamSpec forwarding between generic helpers.

pandas (-7)

These errors were false positives. The pandas pipe method correctly uses ParamSpec to forward arguments to common.pipe, which is a standard pattern supported by PEP 612. Pyrefly was incorrectly rejecting this valid forwarding pattern. The PR fix properly handles the case where a ParamSpec resolves to another quantified ParamSpec, allowing the arguments to pass through as intended. This is an improvement - pyrefly now correctly accepts valid ParamSpec forwarding that it previously rejected.
Attribution: The change to handle Type::Quantified ParamSpecs in callable.rs directly fixes this by allowing ParamSpec arguments to be forwarded between generic functions.

pytest-robotframework (-1)

The removed error was a false positive. The code shows a standard ParamSpec forwarding pattern where _KeywordDecorator.inner accepts *args: P.args, **kwargs: P.kwargs and forwards them to the wrapped function. This is exactly how PEP 612 specifies ParamSpec should work for decorators. The PR fixed pyrefly's handling of ParamSpec resolution when one generic function forwards to another, making it correctly recognize this valid pattern.
Attribution: The change to overload_resolution() in pyrefly/lib/alt/callable.rs added handling for ParamSpec-to-ParamSpec resolution, treating it permissively like ... to allow argument forwarding.

trio (-1)

This is an improvement. The removed error was a false positive where pyrefly incorrectly rejected valid ParamSpec forwarding between generic helper functions. The code at line 440 shows _nonblocking_helper accepting *args: P.args, **kwargs: P.kwargs and forwarding them to another function - a pattern explicitly supported by PEP 612 and the typing spec. The PR fixed pyrefly's ParamSpec resolution to handle cases where one generic function forwards arguments to another, bringing it in line with mypy and pyright behavior.
Attribution: The fix in pyrefly/lib/alt/callable.rs added handling for the case where a ParamSpec resolves to another quantified ParamSpec during generic function composition. The new code treats this case permissively (like ...) to allow argument forwarding, which matches the behavior of mypy and pyright.

paasta (-1)

This is an improvement. Pyrefly was incorrectly flagging valid ParamSpec forwarding between generic functions. The code pattern where to_blocking (generic over ParamSpec P) forwards *args: P.args, **kwargs: P.kwargs to run_sync (also generic over the same P) is correct according to the typing spec and works fine at runtime. The PR fixed pyrefly's overly strict handling of ParamSpec-to-ParamSpec forwarding, removing a false positive. Neither mypy nor pyright flag this pattern as an error, confirming it's valid.
Attribution: The fix in pyrefly/lib/alt/callable.rs added a new case to handle when a ParamSpec resolves to another quantified ParamSpec (line 582). The comment explains: 'one generic helper forwarding *args: P.args, **kwargs: P.kwargs to another'. This directly addresses the pattern in the paasta_tools error where to_blocking forwards its ParamSpec arguments to run_sync.

scrapy (-3)

Pyrefly was incorrectly rejecting valid ParamSpec forwarding patterns where one generic function forwards *args: P.args, **kwargs: P.kwargs to another. The typing spec explicitly supports this pattern for parameter forwarding. The PR fixed this by recognizing when a ParamSpec resolves to another ParamSpec and treating it permissively, removing the false positive errors.
Attribution: The fix in pyrefly/lib/alt/callable.rs added handling for when a ParamSpec resolves to another quantified ParamSpec during generic helper forwarding. The new case Type::Quantified(q) if q.[is_param_spec()](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/alt/callable.rs) => ParamList::everything() treats forwarded ParamSpec parameters permissively, allowing the valid forwarding pattern.

prefect (-3)

These were false positives. The code shows standard ParamSpec forwarding patterns where generic helper functions pass *args: P.args, **kwargs: P.kwargs to other functions with the same ParamSpec P. This is a fundamental use case for ParamSpec - capturing and forwarding arbitrary function signatures. The PR fix correctly recognizes that when a ParamSpec variable resolves to another quantified ParamSpec (during generic instantiation), the type checker should allow the arguments to pass through. The test cases added in the PR (test_paramspec_forwarding_between_generic_helpers and test_paramspec_forwarding_extra_concrete_arg) confirm this is the intended behavior. Removing these errors is an improvement.
Attribution: The fix in pyrefly/lib/alt/callable.rs added a new case to handle Type::Quantified(q) if q.[is_param_spec()](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/alt/callable.rs), treating it like ... (permissively) when a ParamSpec resolves to another quantified ParamSpec. This allows the forwarded args to pass through correctly.


Was this helpful? React with 👍 or 👎

Classification by primer-classifier (10 LLM)
