
NPE in AWS SDK v2 + S3 Access Grants when reading large Iceberg tables with S3FileIO (Spark, JDK17/JDK21) #6664

@manjum-a11y

Describe the bug

When reading a large Iceberg table from S3 using S3FileIO with S3 Access Grants enabled, Spark jobs intermittently fail with a NullPointerException inside the AWS SDK v2 AttributeMap$Builder.resolveValue, called from S3AccessGrantsIdentityProvider.resolveIdentity.

This only appears under high concurrency / large datasets (e.g., spark.read.table(...).count() over many files). Smaller tables or lower parallelism may run successfully, but increasing parallelism makes the failure reproducible.

The error message from the AWS SDK is:

Encountered a null value when resolving configuration attributes. This is commonly caused by concurrent modifications to non-thread-safe types. Ensure you're synchronizing access to all non-thread-safe types.

From the Iceberg side we are using S3FileIO with S3 Access Grants configured according to the docs, and the S3 client is built via S3Client.builder() with S3FileIOProperties.applyS3AccessGrantsConfigurations(...) (or equivalent).

We have already tried the following combinations, and the NPE persists in all of them:

  • Iceberg versions: 1.7.2, then upgraded to 1.10.0. The NPE persists in both.
  • AWS SDK v2 versions: 2.24.6, 2.30.31, and 2.32.1. The NPE persists in all three.
  • S3 Access Grants plugin versions: 2.0.2 and 2.3.0. The NPE persists in both.
  • Spark / JDK combinations: Spark 3.5.6 with JDK17, and Spark 4.0.1 (JDK21 inside the image). Same NPE in both.
  • Parallelism tuning: reducing spark.sql.shuffle.partitions / spark.default.parallelism changes how often the NPE occurs but does not reliably remove it on large tables.

Could you please help me understand the following:

  1. Known issue?
    Are you aware of any known concurrency problems between Iceberg’s S3FileIO S3 Access Grants integration and AWS SDK v2 / aws-s3-accessgrants-java-plugin that could cause AttributeMap$Builder.resolveValue to throw an NPE under high Spark parallelism?
  2. Recommended version matrix?
    Is there a recommended or validated combination of:
    Iceberg version
    AWS SDK v2 version
    aws-s3-accessgrants-java-plugin version
    for running S3 Access Grants with S3FileIO in a high‑concurrency Spark environment?
  3. Client factory / configuration guidance?
    From Iceberg’s side, is there any specific guidance on how the S3 client factory should be implemented (or additional S3FileIO / S3AG configuration) to avoid shared, non‑thread‑safe state that might trigger this NPE?

I already posted this in the Iceberg repository, but they redirected me here. The original ticket is linked below for additional context.

apache/iceberg#14942

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

We should be able to read the large dataset (roughly a million records) without hitting this NPE.

Current Behavior

We use a PySpark script to read a materialised dataset from our platform, and the read fails with the NPE described above. The issue occurs only on JDK17 and JDK21, not on JDK11.

Reproduction Steps

We reproduced the issue in our setup, but it could also be reproduced outside Spark/Iceberg with a minimal Java program that does the following:

  • Builds an S3Client with aws-s3-accessgrants-java-plugin enabled.
  • Starts a fixed thread pool (e.g. 64 threads).
  • Each thread repeatedly calls getObject on many S3 objects under a common prefix.
  • Under this concurrent load, we intermittently hit: java.lang.NullPointerException in software.amazon.awssdk.utils.AttributeMap$Builder.resolveValue, invoked from software.amazon.awssdk.s3accessgrants.plugin.S3AccessGrantsIdentityProvider.resolveIdentity.
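
The steps above could be sketched roughly as the harness below. This is only an illustration, not the exact program we ran: the class ConcurrentLoadHarness and its run method are hypothetical names, it uses only the JDK, and the S3 call itself is passed in as a Runnable so the sketch stays independent of any SDK version. It also assumes the NPE may surface either directly or wrapped as the cause of another exception, so it walks the cause chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical harness: runs one task from many threads and counts NPE failures. */
final class ConcurrentLoadHarness {

    /** Outcome counters for a single run. */
    static final class Result {
        final int successes;
        final int nullPointerFailures;
        final int otherFailures;

        Result(int successes, int nullPointerFailures, int otherFailures) {
            this.successes = successes;
            this.nullPointerFailures = nullPointerFailures;
            this.otherFailures = otherFailures;
        }
    }

    /** True if t or any exception in its cause chain is a NullPointerException. */
    static boolean hasNpeCause(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof NullPointerException) {
                return true;
            }
        }
        return false;
    }

    /** Runs task iterationsPerThread times on each of the given number of threads. */
    static Result run(int threads, int iterationsPerThread, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1); // release all workers together
        AtomicInteger ok = new AtomicInteger();
        AtomicInteger npe = new AtomicInteger();
        AtomicInteger other = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    try {
                        start.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int i = 0; i < iterationsPerThread; i++) {
                        try {
                            task.run(); // e.g. one getObject call against S3
                            ok.incrementAndGet();
                        } catch (RuntimeException e) {
                            if (hasNpeCause(e)) {
                                npe.incrementAndGet();
                            } else {
                                other.incrementAndGet();
                            }
                        }
                    }
                }));
            }
            start.countDown();
            for (Future<?> f : futures) {
                f.get(); // wait for every worker to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed unexpectedly", e.getCause());
        } finally {
            pool.shutdownNow();
        }
        return new Result(ok.get(), npe.get(), other.get());
    }
}
```

To mirror the description above, the task would be something like a getObject call on an S3Client built with the Access Grants plugin, run with 64 threads, with each call picking a key under the common prefix. The client, bucket, and keys are placeholders here and are not part of the sketch. A non-zero nullPointerFailures count after a run would correspond to the failure we see.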

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

2.24.6

JDK version used

jdk17

Operating System and version

Spark 3.5.6

Labels

  • bug: This issue is a bug.
  • closing-soon: This issue will close in 4 days unless further comments are made.
  • service-api: This issue is due to a problem in a service API, not the SDK implementation.
