What happens?
When increasing the thread count -- my breaking example is 64 threads on a 10-core machine -- I see the following error when reading Parquet files from S3:
IOException: IO Error: Could not resolve hostname error for HTTP GET
This is for duckdb version 1.5.1.
A thread count of ~8 always works; a thread count of 32 eventually fails, it just takes longer to do so.
NOTE: this worked just fine with versions <=1.4. We had tested with as many as 128-256 threads, and 64 was a nice performance sweet spot for this particular operation.
To Reproduce
To reproduce, I'm setting AWS env vars in my environment (but I also saw this with the SSO credential chain).
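For context, the AWS environment variables referred to above are typically set like this (placeholder values; a sketch of a typical setup, not the reporter's actual configuration -- the region matches the one in the error URL below):

```shell
# Placeholder credentials; substitute real values, or use an SSO profile instead
export AWS_ACCESS_KEY_ID="<access-key-id>"
export AWS_SECRET_ACCESS_KEY="<secret-access-key>"
export AWS_DEFAULT_REGION="us-east-1"
```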
This works ✅ :
import duckdb
conn = duckdb.connect()
conn.execute("install httpfs; load httpfs;")
print(conn.query("""
select count(*) from read_parquet(
's3://<bucket>/<path>/**/*.parquet',
hive_partitioning=true,
filename=true
)
limit 3;
"""))
This throws an error 🚫 :
import duckdb
conn = duckdb.connect()
conn.execute("install httpfs; load httpfs;")
conn.execute("SET threads = 64;") #<--------------------
print(conn.query("""
select count(*) from read_parquet(
's3://<bucket>/<path>/**/*.parquet',
hive_partitioning=true,
filename=true
)
limit 3;
"""))
Error:
---------------------------------------------------------------------------
IOException Traceback (most recent call last)
Cell In[3], line 8
4 conn.execute("install httpfs; load httpfs;")
6 conn.execute("SET threads = 64;")
----> 8 print(conn.query("""
9 select count(*) from read_parquet(
10 's3://<bucket>/<path>/**/*.parquet',
11 hive_partitioning=true,
12 filename=true
13 )
14 limit 3;
15 """))
IOException: IO Error: Could not resolve hostname error for HTTP GET to 'https://<bucket>.s3.us-east-1.amazonaws.com/<path>/year%3D2025/month%3D02/day%3D03/3fbf80ad-afad-40a4-b3bb-0cbe7fba6076-0.parquet'
OS:
OSX
DuckDB Package Version:
1.5.1
Python Version:
3.12.6
Full Name:
Graham Hukill
Affiliation:
MIT Libraries
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - I cannot easily share my data sets due to their large size
Did you include all code required to reproduce the issue?
Did you include all relevant configuration to reproduce the issue?