Description
Describe the bug
The use_object_storage_list_objects_cache setting is enabled by default in recent builds. This helps when scanning static data, but it produces very confusing results when querying object storage whose contents change, because newly added files do not appear until the cached listing expires. It should be off by default, especially because the setting does not exist in upstream ClickHouse.
The effects on queries like the one shown below should also be documented.
To Reproduce
- Use a query like the following to enumerate files in object storage:
  SELECT _path FROM s3('s3://some-bucket/default/**/*.parquet', One)
- Note the results.
- Add a file to the above path.
- Run the query again to enumerate paths. You won't see the new file until the cache expires.
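A possible per-query workaround, sketched under an assumption: that use_object_storage_list_objects_cache behaves like any other ClickHouse query-level setting and can be overridden in a SETTINGS clause. This is inferred from the setting's name and has not been confirmed here. The bucket path is the same placeholder used in the steps above.

```sql
-- Same enumeration query as in the reproduction steps, but asking the
-- server to bypass the list-objects cache for this query only.
-- Assumption: the setting is accepted in a SETTINGS clause.
SELECT _path
FROM s3('s3://some-bucket/default/**/*.parquet', One)
SETTINGS use_object_storage_list_objects_cache = 0;
```

If the assumption holds, the newly added file should appear on the first run after it is uploaded, which would also confirm that the cache is the cause of the stale results.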
Expected behavior
I would expect the file to appear immediately. The cache setting is valuable in some cases (extremely so when enumerating large lists of files repeatedly), but the user should enable it deliberately.
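With an off-by-default setting, opting in might look like the sketch below. It assumes, again unconfirmed, that a session-level SET applies to use_object_storage_list_objects_cache the same way it does to other ClickHouse settings. The bucket path is the placeholder from the reproduction steps.

```sql
-- Deliberately enable the list-objects cache for the current session,
-- e.g. before repeatedly enumerating a large, static set of files.
-- Assumption: SET works for this setting like other session settings.
SET use_object_storage_list_objects_cache = 1;

SELECT _path
FROM s3('s3://some-bucket/default/**/*.parquet', One);
```

Scoping the opt-in to a session keeps the stale-listing behavior confined to users who asked for it.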
Key information
Provide relevant runtime details.
- Project Antalya Build Version: 25.8.16.20001.altinityantalya
- Cloud provider: AWS / Altinity.Cloud BYOC
- Kubernetes provider: EKS
- Object storage: AWS S3
- Iceberg catalog: Ice 0.12.0 (I think)
Additional context
I thought I was hitting a query bug when I first encountered this behavior. I ran the sample query with different wildcard patterns, which returned different results because each pattern had different cache contents. This deepened my confusion further. ;)