[FLINK-39118] Add documentation for Native s3 FileSystem #27841

Open
Samrat002 wants to merge 3 commits into apache:master from Samrat002:FLINK-39118-native-s3-documentation

Conversation

@Samrat002
Contributor

What is the purpose of the change

Add documentation for Native s3 FileSystem

Please note that this patch does not update the Chinese document yet. This will be done once the English content reaches consensus.

Brief change log

Add documentation showing how to use the new S3 FileSystem.

Verifying this change

Built the docs locally using Hugo:

docker run -v $(pwd):/src -p 1313:1313 jakejarvis/hugo-extended:latest server --buildDrafts --buildFuture --bind 0.0.0.0
Watching for changes in /src/{assets,content,content.zh,data,layouts,static,themes}
Watching for config changes in /src/config.toml, /src/themes/connectors/config.yaml
Start building sites … 
hugo v0.124.1-db083b05f16c945fec04f745f0ca8640560cf1ec+extended linux/arm64 BuildDate=2024-03-20T11:40:10Z VendorInfo=docker


                   | EN  | ZH   
-------------------+-----+------
  Pages            | 502 | 500  
  Paginator pages  |   0 |   0  
  Non-page files   |   0 |   0  
  Static files     | 266 | 266  
  Processed images |   0 |   0  
  Aliases          | 419 | 416  
  Cleaned          |   0 |   0  

Built in 7938 ms
Environment: "development"
Serving pages from disk
Running in Fast Render Mode. For full rebuilds on change: hugo server --disableFastRender
Web Server is available at http://localhost:1313/flink/flink-docs-master/ (bind address 0.0.0.0)
Press Ctrl+C to stop
[Screenshots of the rendered documentation pages, captured March 27, 2026]

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no) no

  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented) docs

@flinkbot
Collaborator

flinkbot commented Mar 27, 2026

CI report:

Bot commands: the @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

Contributor

@alpinegizmo left a comment


This is in pretty good shape. Just a couple of points to address.

@Samrat002 force-pushed the FLINK-39118-native-s3-documentation branch from 7b82717 to 7be000d on March 31, 2026 at 17:51
@Samrat002 requested a review from alpinegizmo on March 31, 2026 at 17:55
- Use *s3p://* scheme for checkpointing (Presto)

{{< hint info >}}
The Native S3 implementation does not introduce a new URI scheme. It reuses the existing *s3://* and *s3a://* schemes. To use it alongside the Hadoop implementation, ensure only the Native S3 plugin JAR is in the `plugins` directory (i.e., do not have both `flink-s3-fs-native` and `flink-s3-fs-hadoop` plugins loaded simultaneously for the same scheme).
{{< /hint >}}
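The plugin layout the hint describes can be sketched as below. This is a hypothetical illustration, not taken from the PR: the JAR name `flink-s3-fs-native-2.3.0.jar` is assumed, and a temp directory stands in for a real Flink installation.

```shell
# Sketch: enabling exactly one S3 plugin for the s3:// scheme.
# FLINK_HOME here is a temp directory standing in for a real Flink
# distribution; the native S3 JAR name/version is illustrative.
FLINK_HOME=$(mktemp -d)
mkdir -p "$FLINK_HOME/opt" "$FLINK_HOME/plugins"
touch "$FLINK_HOME/opt/flink-s3-fs-native-2.3.0.jar"

# Flink loads each plugin from its own folder under plugins/:
mkdir -p "$FLINK_HOME/plugins/s3-fs-native"
cp "$FLINK_HOME"/opt/flink-s3-fs-native-*.jar "$FLINK_HOME/plugins/s3-fs-native/"

# Per the hint above, make sure the Hadoop S3 plugin is NOT also loaded:
rm -rf "$FLINK_HOME/plugins/flink-s3-fs-hadoop"
ls "$FLINK_HOME/plugins"
```

In a real deployment `FLINK_HOME` is your Flink distribution; the point is simply that only one S3 plugin directory should exist under `plugins/` for the `s3://` scheme.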
Contributor


"To use it alongside the Hadoop implementation" -- what does this mean? I had assumed that it's not possible to use both the Native S3 and Hadoop implementations together, since they use the same scheme.

Contributor Author


Yeah, "To use it alongside the Hadoop implementation" sounds misleading at first.

I've updated the wording to be more explicit and direct. PTAL

Contributor

@alpinegizmo left a comment


One more suggestion, and a question.

@Samrat002 requested a review from alpinegizmo on April 1, 2026 at 05:05
Contributor

@Izeren left a comment


Thank you for the PR, @Samrat002. I have left a few comments, PTAL.

My general request for changes is to replicate this for Chinese docs (usually we update both): https://github.com/apache/flink/blob/master/docs/content.zh/docs/deployment/filesystems/s3.md

It can be done in English for now and translated later.


Flink provides two file systems to talk to Amazon S3, `flink-s3-fs-presto` and `flink-s3-fs-hadoop`.
Both implementations are self-contained with no dependency footprint, so there is no need to add Hadoop to the classpath to use them.
- **Native S3 FileSystem** (`flink-s3-fs-native`): Built directly on AWS SDK v2 with async I/O and parallel transfers, this implementation supports both checkpointing and the FileSystem sink. [Benchmarks](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620396) show ~2x higher checkpoint throughput (~200 MB/s vs ~90 MB/s) compared to the Presto implementation at state sizes up to 15 GB. **Experimental** in Flink 2.3; the API and behavior may change in future releases.
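To make the checkpointing usage concrete, here is a minimal configuration sketch. Hedged assumptions: the bucket name is a placeholder, and the option name `execution.checkpointing.dir` is the Flink 2.x spelling; verify it against your release, as older versions use `state.checkpoints.dir`.

```yaml
# Sketch: direct checkpoints at S3 through the s3:// scheme.
# Bucket name is a placeholder; option name assumed for Flink 2.x.
execution.checkpointing.dir: s3://my-bucket/flink-checkpoints
```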
Contributor


Should we say it is (experimental) in the header of the referenced section?

You need to configure both `s3.access-key` and `s3.secret-key` in Flink's [configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):

```yaml
s3.access-key: your-access-key
s3.secret-key: your-secret-key
```
Contributor


Should we mention bucket level configuration overrides for all these?

### Native S3 FileSystem

{{< hint warning >}}
**Experimental**: The Native S3 FileSystem implementation is experimental in Flink 2.3. While functionally complete, it should not yet be used in production environments. Please use Presto or Hadoop implementations for production deployments.
{{< /hint >}}
Contributor


"it should not yet be used in production environments"

Maybe too strong a statement; it should also explain *why* you should be cautious using it in prod.


- **No external dependencies**: Built on AWS SDK v2 with minimal footprint
- **Drop-in replacement**: Compatible with the same S3 URI schemes (`s3://`)
- **Encryption support**: Server-side encryption (SSE) and KMS encryption
Contributor


This "and" reads confusingly: if we are talking about SSE-KMS, it is server-side.

s3.path-style-access: true
```
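Path-style access is typically paired with a custom endpoint when pointing Flink at an S3-compatible store. A brief, hedged sketch (the endpoint URL is illustrative; `s3.endpoint` is Flink's existing option for overriding the S3 endpoint):

```yaml
# Sketch: S3-compatible object store (endpoint URL is illustrative, e.g. a local MinIO)
s3.endpoint: http://localhost:9000
s3.path-style-access: true
```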

## S3 FileSystem Implementations
Contributor


This heading appears twice at the same level; could that break ToC links?


#### Features

- **FileSystem sink support**: The only S3 implementation with support for the [FileSystem sink]({{< ref "docs/connectors/datastream/filesystem" >}})
Contributor


I thought the FS sink is supported for Native since we have a RecoverableWriter implementation. Is this a miss, or do we need more changes for FS sink support?

```yaml
s3.retry.max-num-retries: 3

# Credentials provider
fs.s3.aws.credentials.provider: software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider
```
Contributor


The default is noDefaultValue for this config, which may be a bit confusing this way. Should we have it set explicitly in the config if this is our intention?


```yaml
s3.path.style.access: true
s3.path-style-access: true
```
Contributor


Will we have any backward-compatibility problem with the config property name being changed?




**Caution** : Do not load `flink-s3-fs-native` and `flink-s3-fs-hadoop` plugins simultaneously.
Contributor


!nit space before the colon

@github-actions bot added the community-reviewed (PR has been reviewed by the community) label on Apr 1, 2026