Add option to cache unknown bucket type #844
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #844      +/-   ##
==========================================
+ Coverage   88.52%   88.56%   +0.03%
==========================================
  Files          15       15
  Lines        2989     2990       +1
==========================================
+ Hits         2646     2648       +2
+ Misses        343      342       -1
- retry_multiplier: Multiplier for delay between retries.
These map to `google.api_core.retry.AsyncRetry` arguments (without 'retry_' prefix).
"""
self._cache_unknown_buckets = kwargs.pop("cache_unknown_buckets", False)
If a user sets `cache_unknown_buckets` to True and does have the required permissions, is an UNKNOWN result still cached when a transient error occurs? Could we check whether the failure is a permission error, or cache with a TTL so that access granted later is picked up? WDYT?
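As a rough sketch of what this suggestion could look like (not code from this PR): known bucket types are cached indefinitely, while UNKNOWN is cached only for permission failures and only until a TTL expires. The names `BucketTypeCache`, `BucketType`, `record_failure`, and `unknown_ttl_seconds` are hypothetical, and the built-in `PermissionError` stands in for whatever exception the real API client raises on a permission denial.

```python
import time
from enum import Enum


class BucketType(Enum):
    # Illustrative subset of bucket types; not the library's actual enum.
    ZONAL = "zonal"
    UNKNOWN = "unknown"


class BucketTypeCache:
    """Hypothetical cache: known types never expire, UNKNOWN expires after a TTL."""

    def __init__(self, unknown_ttl_seconds=300.0, clock=time.monotonic):
        self._unknown_ttl = unknown_ttl_seconds
        self._clock = clock
        self._entries = {}  # bucket name -> (bucket type, expiry time or None)

    def get(self, bucket):
        entry = self._entries.get(bucket)
        if entry is None:
            return None
        bucket_type, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # The UNKNOWN entry has expired, so the next call retries the lookup.
            del self._entries[bucket]
            return None
        return bucket_type

    def record_success(self, bucket, bucket_type):
        # A successfully detected type is cached without expiry.
        self._entries[bucket] = (bucket_type, None)

    def record_failure(self, bucket, error):
        # Assumption: PermissionError stands in for the API's permission-denied
        # error. Any other (possibly transient) failure is not cached at all.
        if isinstance(error, PermissionError):
            expires_at = self._clock() + self._unknown_ttl
            self._entries[bucket] = (BucketType.UNKNOWN, expires_at)
```

With this shape, a transient failure leaves the cache empty so the next operation retries, and a permission failure is retried once the TTL has elapsed, which would pick up access granted later.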
6adac07 to
65c0c9b
Not caching the UNKNOWN type is intentional, since the storage_lookup API should work on every bucket. If the bucket type is UNKNOWN, it is surely due to a transient error and should not be cached. I think we should revert caching UNKNOWN in this PR and update the documentation instead, so it's clearer that this is an intentional choice.
By default, files in zonal buckets are left unfinalized to allow appends.
**kwargs : dict
- cache_unknown_buckets : bool, default False
Whether to cache UNKNOWN bucket types. Useful when users lack permissions
This use case is very rare. The storage layout API needs only minimal object permissions:
https://docs.cloud.google.com/storage/docs/getting-storage-layout#expandable-1
Adding this flag might not be useful, since in most cases UNKNOWN results from a transient API call failure, which should not be cached.
This PR adds a `cache_unknown_buckets` configuration option to allow caching of the `UNKNOWN` bucket type.

Currently, `ExtendedGcsFileSystem` attempts to detect bucket types (zonal, HNS, etc.) using the Storage Control API. If a user lacks permissions for this API (or is using an emulator that doesn't support it), the lookup fails and the bucket type falls back to `UNKNOWN`.

Since `UNKNOWN` types are not cached by default, every subsequent operation on the bucket triggers another failing API call, causing significant performance degradation due to repeated slow lookups.
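The behaviour described above can be illustrated with a small self-contained model. This is a sketch under assumptions, not the `ExtendedGcsFileSystem` implementation: `BucketTypeResolver`, `failing_lookup`, and `lookup_calls` are hypothetical names, and only the `cache_unknown_buckets` flag and its default of `False` come from the PR.

```python
class BucketTypeResolver:
    """Toy model of bucket type detection with an optional UNKNOWN cache."""

    def __init__(self, lookup, cache_unknown_buckets=False):
        self._lookup = lookup  # callable that returns a bucket type or raises
        self._cache_unknown_buckets = cache_unknown_buckets
        self._cache = {}
        self.lookup_calls = 0  # counts how often the (slow) lookup runs

    def bucket_type(self, bucket):
        if bucket in self._cache:
            return self._cache[bucket]
        self.lookup_calls += 1
        try:
            result = self._lookup(bucket)
        except Exception:
            # Any lookup failure falls back to UNKNOWN.
            result = "UNKNOWN"
        # UNKNOWN is cached only when the flag is enabled.
        if result != "UNKNOWN" or self._cache_unknown_buckets:
            self._cache[bucket] = result
        return result


def failing_lookup(bucket):
    # Stand-in for a lookup that fails, e.g. missing permissions or an emulator.
    raise PermissionError(f"no access to layout of {bucket}")


default_resolver = BucketTypeResolver(failing_lookup)
caching_resolver = BucketTypeResolver(failing_lookup, cache_unknown_buckets=True)
for _ in range(3):
    default_resolver.bucket_type("my-bucket")
    caching_resolver.bucket_type("my-bucket")

print(default_resolver.lookup_calls)  # 3: every operation repeats the failing lookup
print(caching_resolver.lookup_calls)  # 1: the UNKNOWN result is reused
```

The trade-off raised in review is visible in the same model: with the flag enabled, an UNKNOWN caused by a transient failure would also be reused for the lifetime of the cache.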