Description
Describe the bug
When reading a large Iceberg table from S3 using S3FileIO with S3 Access Grants enabled, Spark jobs intermittently fail with a NullPointerException inside the AWS SDK v2 AttributeMap$Builder.resolveValue, called from S3AccessGrantsIdentityProvider.resolveIdentity.
This only appears under high concurrency or on large datasets (e.g., spark.read.table(...).count() over many files). Smaller tables or lower parallelism may run successfully, but increasing parallelism makes the failure reproducible.
The error message from the AWS SDK is:
Encountered a null value when resolving configuration attributes. This is commonly caused by concurrent modifications to non-thread-safe types. Ensure you're synchronizing access to all non-thread-safe types.
From the Iceberg side we are using S3FileIO with S3 Access Grants configured according to the docs, and the S3 client is built via S3Client.builder() with S3FileIOProperties.applyS3AccessGrantsConfigurations(...) (or equivalent).
We have already tried the following combinations, and the NPE persists in all of them:
Iceberg versions
1.7.2 and upgraded to 1.10.0 → NPE persists in both.
AWS SDK v2 versions
Tried 2.24.6, 2.30.31, 2.32.1 → NPE persists across all.
S3 Access Grants plugin versions
Tried 2.0.2 and 2.3.0 → NPE persists across both.
Spark / JDK combinations
Spark 3.5.6 with JDK17 and Spark 4.0.1 (JDK21 inside image) → same NPE in both.
Parallelism tuning
Reduced spark.sql.shuffle.partitions / spark.default.parallelism → changes how often the failure occurs but does not reliably eliminate the NPE on large tables.
Could you please help us understand the following:
- Known issue?
Are you aware of any known concurrency problems between Iceberg’s S3FileIO S3 Access Grants integration and AWS SDK v2 / aws-s3-accessgrants-java-plugin that could cause AttributeMap$Builder.resolveValue to throw an NPE under high Spark parallelism?
- Recommended version matrix?
Is there a recommended or validated combination of:
Iceberg version
AWS SDK v2 version
aws-s3-accessgrants-java-plugin version
for running S3 Access Grants with S3FileIO in a high-concurrency Spark environment?
- Client factory / configuration guidance?
From Iceberg’s side, is there any specific guidance on how the S3 client factory should be implemented (or additional S3FileIO / S3AG configuration) to avoid shared, non‑thread‑safe state that might trigger this NPE?
I already raised this with the Iceberg project, but was redirected here. The Iceberg ticket is below for additional information.
Expected Behavior
Spark should be able to read the large dataset (around a million records) without hitting this NPE.
Current Behavior
Using a PySpark script, we are trying to read a materialised dataset from our platform, and it fails with the NPE described above. The issue occurs only on JDK 17 and JDK 21, not on JDK 11.
Reproduction Steps
We reproduced the issue in our setup, but it should also be reproducible outside Spark/Iceberg with a minimal Java program that:
- Builds an S3Client with aws-s3-accessgrants-java-plugin enabled.
- Starts a fixed thread pool (e.g. 64 threads).
- Each thread repeatedly calls getObject on many S3 objects under a common prefix.
- Under this concurrent load, we intermittently hit: java.lang.NullPointerException in software.amazon.awssdk.utils.AttributeMap$Builder.resolveValue, invoked from software.amazon.awssdk.s3accessgrants.plugin.S3AccessGrantsIdentityProvider.resolveIdentity.
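The reproduction steps above can be sketched as a small harness. This is a minimal sketch, not the exact program we ran: the class and method names (`ConcurrentGetReproHarness`, `run`, `countNpes`) are hypothetical, and the S3 call is passed in as a `Consumer<String>` so that the concurrency scaffolding compiles with the JDK alone. In the actual reproduction, the consumer would call `getObject` on a single `S3Client` shared by all threads and built with the Access Grants plugin, as outlined in the comment; the bucket name and the `enableFallback(true)` setting there are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical harness: hammers `fetch` from a fixed thread pool and collects failures.
//
// Real wiring (assumption; needs AWS SDK v2 + aws-s3-accessgrants-java-plugin on the classpath):
//   S3Client s3 = S3Client.builder()
//       .addPlugin(S3AccessGrantsPlugin.builder().enableFallback(true).build())
//       .build();                                   // one client shared by every thread
//   Consumer<String> fetch = key -> s3.getObjectAsBytes(b -> b.bucket("my-bucket").key(key));
final class ConcurrentGetReproHarness {

    /** Calls fetch for every key, `rounds` times, on each of `threads` workers; returns every RuntimeException thrown. */
    static List<Throwable> run(int threads, int rounds, List<String> keys, Consumer<String> fetch)
            throws InterruptedException {
        List<Throwable> failures = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int t = 0; t < threads; t++) {
                pool.submit(() -> {
                    for (int r = 0; r < rounds; r++) {
                        for (String key : keys) {
                            try {
                                fetch.accept(key);
                            } catch (RuntimeException e) {
                                failures.add(e); // record and keep going, like a retrying reader
                            }
                        }
                    }
                });
            }
        } finally {
            pool.shutdown(); // no new tasks; already-submitted workers run to completion
        }
        if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
            pool.shutdownNow();
            throw new IllegalStateException("workers did not finish within 10 minutes");
        }
        return new ArrayList<>(failures);
    }

    /** Counts failures with a NullPointerException anywhere in their cause chain. */
    static long countNpes(List<Throwable> failures) {
        return failures.stream().filter(ConcurrentGetReproHarness::hasNpeCause).count();
    }

    private static boolean hasNpeCause(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof NullPointerException) {
                return true;
            }
        }
        return false;
    }
}
```

With the real client, a call such as `run(64, 10, keys, fetch)` mirrors the 64-thread setup described above, and a non-zero `countNpes(...)` indicates the failure was reproduced. Walking the cause chain covers the case where the SDK wraps the NPE in another exception.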
Possible Solution
No response
Additional Information/Context
No response
AWS Java SDK version used
2.24.6
JDK version used
jdk17
Operating System and version
Spark 3.5.6