close #923 support pkgutil-style namespace packages#2778
close #923 support pkgutil-style namespace packages#2778mdbernard wants to merge 1 commit intofacebook:mainfrom
Conversation
|
Hi @mdbernard! Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks! |
ea7c502 to
f2d98bc
Compare
|
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
f2d98bc to
e8326ed
Compare
e8326ed to
f3b37ca
Compare
f3b37ca to
16a628e
Compare
connernilsen
left a comment
There was a problem hiding this comment.
Hey @mdbernard, thanks for doing this. I think we may still be discussing if/how we want to implement this, but I have a few questions I'm curious about. I haven't looked into this in a while, so my memory on how parts of this works are a little hazy. Let me know if any of my questions are worded poorly or don't represent what you're trying to achieve.
| /// root of the first field. To support legacy pkgutil.extend_path-style | ||
| /// namespace packages, the second field also contains the remaining roots | ||
| /// to search for matches if the first does not contain a match. | ||
| RegularPackage(PathBuf, Vec1<PathBuf>), |
There was a problem hiding this comment.
If we're adding support for namespace package extend path functionality, why is that happening in RegularPackage? It seems like that would be more appropriately added to NamespacePackage.
The reason I ask is that by design, the first result for a RegularPackage needs to take priority over all other results. Anything we can find afterward breaks the design of how a RegularPackage works in Python, and can cause Pyrefly to fail to error in some cases.
We can make exceptions to this rule, but I think I need to better understand this approach before we can decide on that. It might be that another approach we might want to take would be to make a new LegacyNamespacePackage(PathBuf, Vec1<PathBuf>), which has some of the information of a RegularPackage, but is treated as a NamespacePackage instead.
There was a problem hiding this comment.
Thanks for the review!
My understanding here is that NamespacePackage is a shorthand for "implicit namespace package" defined in PEP 420. In the case of pkgutil-style packages, the namespaces aren't implicit, since they do use __init__.py files. Additionally, once pkgutil-style packages are installed, they look identical to regular packages in the site-packages folder.
For example, this pkgutil structure:
foo/
foobar/
__init__.py # contains pkgutil.extend_path
foo.py
bar/
foobar/
__init__.py # contains pkgutil.extend_path
bar.py
Ends up looking like this in site-packages when it's installed:
foobar/
__init__.py
foo.py
bar.py
For both of these reasons, I thought it made more sense to treat these as RegularPackages instead of NamespacePackages, but I'm open to your suggestion about LegacyNamespacePackage being a distinct enum variant. Two questions though:
[...] the first result for a
RegularPackageneeds to take priority over all other results. Anything we can find afterward breaks the design of how aRegularPackageworks in Python, and can cause Pyrefly to fail to error in some cases.
Do you happen to have an example of how this implementation fails? It's a bit unclear to me what scenario you're concerned about. I'd be happy to add more unit tests. In this implementation, we search for matches in all the roots, and then return the match with the highest priority (e.g. .pyi > .py > .pyc) (see the new test_namespace_package_submodule_prefers_pyi_over_py unit test for an example). This can be pretty easily changed to just return the match from the first root, regardless of that priority order though.
[...] we might want to take would be to make a new
LegacyNamespacePackage(PathBuf, Vec1<PathBuf>), which has some of the information of aRegularPackage, but is treated as aNamespacePackageinstead.
Currently, pkgutil-style packages are treated as RegularPackages by Pyrefly, since they look the same based on what the finder is looking for (dir name and __init__.py file presence). Do you have an idea of how you'd prefer this be done differently?
| Some(FindResult::RegularPackage(_, existing_roots)) => { | ||
| // These __init__ files are lower-priority, so just accumulate the | ||
| // new roots after the existing roots in case we need to look there. | ||
| existing_roots.push(init_roots.into_vec().remove(0)); |
There was a problem hiding this comment.
Why do we need to remove the first entry in init_roots?
There was a problem hiding this comment.
The first one is the folder containing the init_path, so we don't want to search that one again.
|
Thanks for the review @connernilsen, I've replied to your feedback. I'd also be happy to jump on Discord to chat through it if that's easier! |
Summary
This change adds support for legacy pkgutil-style namespace packages, including those using pip's
compateditable install mode. This is already supported by popular language servers like Jedi and Pyright, and lack of support in Pyrefly makes developers of those repos reluctant to switch despite Pyrefly's much faster speed and lower resource utilization.Notably, Mypy does not support pkgutil-style packages (see python/mypy#9393), requiring developers of these types of repos to configure massive
MYPYPATHenvironment variables to include all the dependency folders, so this change makes Pyrefly not only a great language server for these typs of repos, but also makes it work out-of-the-box as a type-checker.Two similar, but distinct, cases are resolved in this change. Each is demonstrated below.
Case 1: RegularPackage defined across multiple roots
azure-core, and also editably installazure-core:In your virtual environment's site-packages folder, you should see the following:
sdk/keyvault/azure-keyvault-secrets/azure/keyvault/secrets/_client.py.In this file, imports from
azure.coreareUnknown, despiteazure.corebeing installed:The problem here is that when searching for the
azurepackage across the search roots, thefind_one_partfunction short-circuits after finding aRegularPackagenamedazure. It only returns the first root found--in this case the one containingkeyvault, discardingazure.coresince it is defined in a different root :pyrefly/pyrefly/lib/module/finder.rs
Line 215 in 40dc64b
The solution here is to update the
FindResult::RegularPackagevariant to be able to contain multiple root directory paths. We updatefind_one_partto no longer short-circuit, instead accumulating roots forRegularPackages, similar to what we do forNamespacePackages. Finally we update the callers offind_one_partto search over all roots until a best match is found.Once compiled, we can verify in our editor that our LSP can find the file:
Case 2: RegularPackage and NamespacePackage share a namespace
The setup for this case is similar to Case 1, with the exception that we do not editably install
azure-core.azure-core:In your virtual environment, you should see a structure like the following:
__editable__.azure_keyvault_secrets-4.10.1.pth azure/ core/ ... # installed files from this package keyvault/ __init__.py # nothing else, editably installedsdk/keyvault/azure-keyvault-secrets/azure/keyvault/secrets/_client.py.Now we have the same result as Case 1: imports from
azure.coreareUnknown, but this time for a different reason. In this situation,azure.coreis considered to be aNamespacePackagebyfind_one_part, not aRegularPackageas it was in Case 1. As a result, even with our change in Case 1,azure.coreisn't accumulated, so it is never searched. The solution here is to append anyNamespacePackageaccumulations to the end of theRegularPackageaccumulations before returning fromfind_one_part.Once compiled, we see that the imports can now be found:
Test Plan