Avoid pinning solver variables too early when RHS is a union (#2839)

Open
migeed-z wants to merge 1 commit into facebook:main from migeed-z:export-D97522732

migeed-z (Contributor) commented Mar 20, 2026

Summary:

During constraint resolution, when solving a constraint of the form `Quantified(AnyStr) <: x | None`, we expand the union and end up pinning `x` to `str`. This causes a false positive because the type variable is pinned too early.
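For context, here is a hypothetical snippet of the shape described above. It is not the repro from #2644, the helper names `maybe` and `echo` are made up for illustration, and we have not confirmed pyrefly's exact output on it. Calling a generic function whose parameter is `T | None` with an argument of type `AnyStr` asks the solver to satisfy `Quantified(AnyStr) <: x | None`, where `x` stands for `T` at that call site. The code is fine at runtime; a checker that pins `x` to `str` would wrongly reject the `bytes` case.

```python
from __future__ import annotations

from typing import AnyStr, TypeVar

T = TypeVar("T")


def maybe(value: T | None) -> T | None:
    # Hypothetical generic helper: solving T at a call site produces a
    # constraint of the form `arg_type <: x | None`.
    return value


def echo(s: AnyStr) -> AnyStr:
    # Inside this body `s` has the quantified type AnyStr, so this call
    # needs `Quantified(AnyStr) <: x | None`. The desired solution is
    # x = AnyStr; pinning x to str early would reject the bytes case.
    result = maybe(s)
    assert result is not None
    return result
```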

This diff works around the issue by skipping the subset check that pins the type variable. Instead, we directly check `Quantified(AnyStr) <: x`.
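The difference between the two checks can be seen in a toy model. This is a deliberately simplified Python sketch, not pyrefly's solver, and every class and function in it is hypothetical. It assumes two behaviours: checking a constrained type variable against a target checks each constraint (`str`, then `bytes`) in turn, and an unsolved solver variable is pinned to the first type it is compared with. Under those assumptions, expanding `x | None` pins `x` to `str` while the `str` constraint is checked, so the `bytes` constraint then fails. Checking `Quantified(AnyStr) <: x` directly solves `x` to `AnyStr` instead.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClassType:
    """A concrete class such as `str`, `bytes`, or `None` (toy model)."""
    name: str


@dataclass(frozen=True)
class Quantified:
    """A rigid type variable with value constraints, e.g. AnyStr = (str, bytes)."""
    name: str
    constraints: tuple[ClassType, ...]


@dataclass(eq=False)
class SolverVar:
    """A solver variable; `solution` stays None until the variable is pinned."""
    solution: object = None


@dataclass(frozen=True)
class UnionType:
    members: tuple


STR, BYTES, NONE = ClassType("str"), ClassType("bytes"), ClassType("None")
ANYSTR = Quantified("AnyStr", (STR, BYTES))


def is_subset(lhs, rhs, check_var_directly: bool) -> bool:
    if isinstance(rhs, SolverVar):
        if rhs.solution is None:
            rhs.solution = lhs  # pin the variable to the first type it meets
            return True
        return is_subset(lhs, rhs.solution, check_var_directly)
    if lhs == rhs:
        return True
    if isinstance(lhs, Quantified):
        if check_var_directly and isinstance(rhs, UnionType):
            # Workaround: for `Q <: x | None`, check `Q <: x` directly.
            for member in rhs.members:
                if isinstance(member, SolverVar):
                    return is_subset(lhs, member, check_var_directly)
        # Default: a constrained type variable is a subtype of `rhs`
        # only if every one of its constraints is.
        return all(is_subset(c, rhs, check_var_directly) for c in lhs.constraints)
    if isinstance(rhs, UnionType):
        # Union on the right: succeed if any member accepts `lhs`.
        return any(is_subset(lhs, m, check_var_directly) for m in rhs.members)
    return False


def solve(check_var_directly: bool):
    # Check `AnyStr <: x | None` with a fresh solver variable `x`.
    x = SolverVar()
    ok = is_subset(ANYSTR, UnionType((x, NONE)), check_var_directly)
    return ok, x.solution
```

In this model, `solve(False)` fails with `x` pinned to `STR` (the false positive), while `solve(True)` succeeds with `x` solved to `ANYSTR`. The eager pinning itself is untouched, which is why the summary treats the direct check as a workaround rather than a fix for typevar pinning.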

RFC: I suspect this is related to typevar pinning and that fixing that is the right solution, so I am not sure whether we should add a workaround.

For issue #2644

Differential Revision: D97522732

meta-cla bot added the cla signed label Mar 20, 2026

meta-codesync bot commented Mar 20, 2026

@migeed-z has exported this pull request. If you are a Meta employee, you can view the originating Diff in D97522732.

meta-codesync bot changed the title from "Avoid pinning solver variables too early when RHS is a union" to "Avoid pinning solver variables too early when RHS is a union (#2839)" Mar 20, 2026
migeed-z added a commit to migeed-z/pyrefly that referenced this pull request Mar 20, 2026

github-actions commented

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

migeed-z added a commit to migeed-z/pyrefly that referenced this pull request Mar 21, 2026
meta-codesync bot pushed a commit that referenced this pull request Mar 22, 2026
…nd cross-project consistency (#2841)

Summary:
Pull Request resolved: #2841

The primer classifier has been producing inconsistent results across runs — the same primer diff can be classified as 'improvement' in one run and 'regression' in another. This was observed on real PRs like #2839 (altair TypeVar iterability) and #2764 (overload resolution, 60+ projects).

Three changes to improve reliability:

1. **Self-critique pass (Pass 1.5)**: After Pass 1 produces reasoning, a new pass checks it for factual errors — e.g., claiming dicts are not iterable, incorrect inheritance claims, wrong TypeVar constraint analysis. This catches hallucinations before they reach the verdict pass. Tested on PR #2839 where it correctly identified that both constraints of `_C` (list and TypedDict) are iterable.

2. **Majority voting on verdict (Pass 2)**: Instead of a single verdict call, makes 5 independent calls and takes the majority. This reduces non-determinism where the same reasoning could be classified either way. Vote distribution is logged for transparency.

3. **Cross-project consistency enforcement**: After classifying all projects independently, groups them by error kind and enforces majority verdict within each group. This prevents the classifier from saying 'overload resolution improved' for one project and 'overload resolution regressed' for another with the same pattern.
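Items 2 and 3 both come down to a majority vote: one across repeated verdict calls for a single project, and one across projects that share an error kind. The following is a hypothetical Python illustration of that idea, not the classifier's actual code. `classify_once`, the verdict strings, and the tie-breaking rule are assumptions; ties go to the verdict seen first, because `collections.Counter.most_common` orders equal counts by first occurrence.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Callable


def vote_on_verdict(classify_once: Callable[[], str], n: int = 5) -> tuple[str, Counter]:
    # Make n independent verdict calls and keep the most common answer.
    votes = Counter(classify_once() for _ in range(n))
    verdict, _count = votes.most_common(1)[0]
    return verdict, votes  # the Counter is the vote distribution to log


def enforce_cross_project_consistency(
    results: list[tuple[str, str, str]],
) -> list[tuple[str, str, str]]:
    # results: (project, error_kind, verdict) triples, classified independently.
    by_kind: dict[str, Counter] = defaultdict(Counter)
    for _project, kind, verdict in results:
        by_kind[kind][verdict] += 1
    majority = {kind: counts.most_common(1)[0][0] for kind, counts in by_kind.items()}
    # Every project in a group gets its group's majority verdict.
    return [(project, kind, majority[kind]) for project, kind, _ in results]
```

An odd number of votes (5 in the description) avoids ties whenever there are only two possible verdicts.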

Also upgrades the default Anthropic model from claude-opus-4-20250514 to claude-opus-4-6 for better Pass 1 reasoning quality. According to Gemini, this is a big upgrade :), so I am hoping to see an improvement in quality.

Reviewed By: yangdanny97

Differential Revision: D97571454

fbshipit-source-id: 356f4b150e0c4886c2743abc17699e004da997f1