Conversation

@Shawnsuun

What is the purpose of the change

Follow up PR of #25838

Currently, when a Flink job finishes, it writes its archive as a single file that maps REST paths to their JSON responses. The Flink History Server (FHS) pulls these job archives to the machine it runs on and expands each one into a local directory tree of small JSON files, a structure that scales poorly as the number of jobs increases.

Key Problems

  • High inode usage in the file system due to nested directories for job archives.
  • Slower data retrieval and bottlenecks in job archive navigation at scale.
  • Limited scalability of the underlying file system as the number of archived jobs grows.

Proposed Solution

Integrate RocksDB, a high-performance embedded key-value database, as an alternative storage backend for job archives. RocksDB provides:

  • Faster job data retrieval.
  • Reduced inode consumption.
  • Enhanced scalability, especially in containerized environments.

The integration of RocksDB is implemented as a pluggable backend. The current file system storage remains intact, while RocksDB serves as an optional alternative for efficient storage and retrieval of job archives.


Brief Change Log

1. KVStore Interface

  • Introduced KVStore as an abstraction for key-value storage systems to enable flexible storage backends.
  • Added basic CRUD operations and advanced capabilities for managing job archives.
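
A minimal sketch of what such an interface could look like is shown below. The method names and generics are illustrative assumptions, not the exact signatures introduced in this PR:

    import java.io.IOException;
    import java.util.Collection;

    // Hypothetical sketch of the KVStore abstraction described above.
    public interface KVStore<K, V> extends AutoCloseable {

        /** Stores or overwrites the value associated with the given key. */
        void put(K key, V value) throws IOException;

        /** Returns the value for the given key, or null if it is not present. */
        V get(K key) throws IOException;

        /** Removes the entry for the given key, if present. */
        void delete(K key) throws IOException;

        /** Returns all keys with the given prefix, e.g. every entry of one job archive. */
        Collection<K> getKeysByPrefix(K prefix) throws IOException;

        /** Releases underlying resources, e.g. the RocksDB instance. */
        @Override
        void close();
    }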

2. RocksDB Integration

  • Implemented HistoryServerRocksDBKVStore as the RocksDB-based implementation of the KVStore interface.
  • Mapped the hierarchical file-based job archive structure into key-value pairs for efficient storage and retrieval.
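
For illustration, the mapping can be thought of as storing one archived REST path per key, with the corresponding JSON payload as the value. The sketch below uses the plain RocksDB Java API (org.rocksdb); the key layout, database location, and options are assumptions, and the actual HistoryServerRocksDBKVStore may organize things differently:

    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    import java.nio.charset.StandardCharsets;

    // Illustrative only: stores and reads back one archived REST path of one job.
    public final class RocksDbArchiveSketch {

        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();

            try (Options options = new Options().setCreateIfMissing(true);
                    RocksDB db = RocksDB.open(options, "/tmp/fhs-kvstore")) {

                // Key: "<jobId>:<REST path>"; value: the JSON the History Server would serve.
                byte[] key = "4d255cf6:/jobs/4d255cf6/exceptions".getBytes(StandardCharsets.UTF_8);
                byte[] value = "{\"all-exceptions\":[]}".getBytes(StandardCharsets.UTF_8);

                db.put(key, value);

                byte[] stored = db.get(key);
                System.out.println(new String(stored, StandardCharsets.UTF_8));
            }
        }
    }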

3. ArchiveFetcher Abstraction and Improvements

  • Introduced ArchiveFetcher as an abstract class to support multiple backends for job archive fetching.
  • Updated HistoryServerArchiveFetcher for file-based systems.
  • Created HistoryServerKVStoreArchiveFetcher to fetch job archives using RocksDB.
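
The split can be pictured roughly as follows: the base class owns the polling loop, while each subclass decides how a downloaded archive is persisted (local files vs. the KV store). The members and signatures here are illustrative, not the ones added by the PR:

    import java.io.IOException;
    import java.nio.file.Path;

    // Illustrative sketch of the fetcher abstraction.
    public abstract class ArchiveFetcherSketch {

        /** Polls the configured archive directories for new or deleted job archives. */
        public abstract void fetchArchives();

        /** Writes the contents of one downloaded archive to the configured backend. */
        protected abstract void processArchive(String jobId, Path archive) throws IOException;
    }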

4. ServerHandler Abstraction and Improvements

  • Designed HistoryServerServerHandler as an abstract base class for handling HTTP requests, supporting pluggable backends.
  • Updated HistoryServerStaticFileServerHandler for file-based job archive serving.
  • Implemented HistoryServerKVStoreServerHandler to serve job data from RocksDB via REST APIs.
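
A simplified sketch of how a KV-store-backed handler can answer a request is shown below. It uses plain Netty types for readability; Flink's runtime actually ships a shaded Netty, and the real handler also deals with routing, caching, and error handling. The KVStore type is the hypothetical interface sketched earlier:

    import io.netty.buffer.Unpooled;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.SimpleChannelInboundHandler;
    import io.netty.handler.codec.http.DefaultFullHttpResponse;
    import io.netty.handler.codec.http.FullHttpRequest;
    import io.netty.handler.codec.http.FullHttpResponse;
    import io.netty.handler.codec.http.HttpHeaderNames;
    import io.netty.handler.codec.http.HttpResponseStatus;
    import io.netty.handler.codec.http.HttpVersion;

    import java.nio.charset.StandardCharsets;

    // Illustrative only: looks up the requested REST path in the key-value store
    // and returns the stored JSON, or 404 if the path is unknown.
    public class KVStoreServerHandlerSketch extends SimpleChannelInboundHandler<FullHttpRequest> {

        private final KVStore<String, String> store;

        public KVStoreServerHandlerSketch(KVStore<String, String> store) {
            this.store = store;
        }

        @Override
        protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest request) throws Exception {
            String json = store.get(request.uri());
            HttpResponseStatus status = json != null ? HttpResponseStatus.OK : HttpResponseStatus.NOT_FOUND;
            byte[] body = (json != null ? json : "{}").getBytes(StandardCharsets.UTF_8);

            FullHttpResponse response =
                    new DefaultFullHttpResponse(HttpVersion.HTTP_1_1, status, Unpooled.wrappedBuffer(body));
            response.headers().set(HttpHeaderNames.CONTENT_TYPE, "application/json");
            response.headers().set(HttpHeaderNames.CONTENT_LENGTH, body.length);
            ctx.writeAndFlush(response);
        }
    }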

5. HistoryServer Updates

  • Modified HistoryServer to integrate the KVStore interface and support RocksDB as a pluggable backend.
  • Added configuration options in HistoryServerOptions to toggle between file-based and RocksDB storage. To enable RocksDB as the storage backend for the History Server, add the following to your flink-conf.yaml:
    historyserver.storage.backend: kvstore
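
For context, a minimal flink-conf.yaml for the History Server with this backend enabled might look as follows. historyserver.storage.backend is the option added by this PR; the other options are existing FHS settings, and all values are placeholders:

    # Where completed jobs upload their archives (existing option).
    historyserver.archive.fs.dir: hdfs:///completed-jobs/
    # How often the History Server polls for new archives, in milliseconds (existing option).
    historyserver.archive.fs.refresh-interval: 10000
    # New in this PR: serve archives from the KV store (RocksDB) backend instead of the local file system.
    historyserver.storage.backend: kvstore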

Verifying this change

This change added tests and can be verified as follows:

1. Testing

  • Unit Tests:

    • Added FhsRocksDBKVStoreTest to validate CRUD operations and resource cleanup for RocksDB.
    • Added HistoryServerKVStoreArchiveFetcherTest to ensure correct fetching and processing of job archives from RocksDB.
  • Integration Tests:

    • Built a Flink binary and configured flink-conf.yaml to test both file-based and RocksDB backends.
    • Verified archive retrieval via the History Server web UI and ensured backward compatibility with the file-based backend.
  • End-to-End Tests:

    • Conducted tests in a Kubernetes cluster with both RocksDB and file-based storage backends.
    • Verified correct behavior of the History Server in processing and displaying job archives for both storage backends in a real-world setup.

2. Performance Enhancements

  • Faster Archive Retrieval: Achieved a 4.25x improvement in fetching and processing archives with RocksDB compared to the traditional file system (tested in a production environment).
    • File system: 17 minutes for 100 archives.
    • RocksDB: 4 minutes for 100 archives.
  • Reduced Inode Usage: Reduced inode consumption by over 99.99%.
    • File system: Over 20 million inodes.
    • RocksDB: Only 79 inodes.
  • Lower Storage Usage: Achieved a 95.6% reduction in storage usage.
    • File system: 48 GB for 100 archives.
    • RocksDB: 2.1 GB for 100 archives.

These enhancements significantly improve scalability, reduce resource overhead, and make the History Server more responsive for large-scale deployments.


Does this pull request potentially affect one of the following parts:

  • Dependencies: No (using existing RocksDB dependency).
  • Public API: No.
  • Serializers: No.
  • Performance-sensitive code paths: Yes (job archive storage and retrieval).
  • Deployment or recovery: Yes (affects FHS deployment with the RocksDB backend option).
  • File system connectors: No.

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (not documented)

@flinkbot

flinkbot commented Feb 11, 2026

CI report:

Bot commands: The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@Shawnsuun force-pushed the rocksdb-fhs-integration branch 2 times, most recently from bc281cc to 250adb9 on February 11, 2026 at 10:51
@Shawnsuun force-pushed the rocksdb-fhs-integration branch from 250adb9 to c7ebd7e on February 12, 2026 at 06:43