[FLINK-36429] [runtime-web] Enhancing Flink History Server File Storage and Retrieval with RocksDB #27581
+1,820
−42
What is the purpose of the change
Follow-up to #25838.
Currently, when a Flink job finishes, it writes an archive as a single file that maps paths to JSON documents. The Flink History Server (FHS) pulls these job archives to the machine where it runs and unpacks them into a local directory structure, which scales inefficiently as the number of jobs increases.
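To illustrate why the local directory structure grows with the number of jobs, here is a minimal sketch of unpacking one archive into files. The class name, the `expand` helper, and the `<root>/<path>.json` layout are assumptions made for illustration; they are not taken from the PR or from the existing fetcher code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical illustration only: one local file is created per archive entry. */
final class ArchiveExpansionSketch {

    /** Writes one file per (path, json) entry under root and returns the number of files written. */
    static int expand(Map<String, String> entries, Path root) throws IOException {
        int written = 0;
        for (Map.Entry<String, String> entry : entries.entrySet()) {
            // Assumed layout: "/jobs/<id>/config" becomes "<root>/jobs/<id>/config.json".
            String relative = entry.getKey().replaceFirst("^/+", "") + ".json";
            Path target = root.resolve(relative);
            Files.createDirectories(target.getParent());
            Files.write(target, entry.getValue().getBytes(StandardCharsets.UTF_8));
            written++;
        }
        return written;
    }
}
```

Under this layout, the number of local files and directories is roughly the number of jobs multiplied by the number of entries per archive.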
Key Problems
Proposed Solution
This PR integrates RocksDB, a high-performance embedded database, as an alternative storage backend for job archives. RocksDB provides:
The integration of RocksDB is implemented as a pluggable backend. The current file system storage remains intact, while RocksDB serves as an optional alternative for efficient storage and retrieval of job archives.
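As a rough sketch of what a pluggable key-value backend could look like, the following uses a hypothetical `SketchKVStore` interface with an in-memory implementation standing in for RocksDB. The interface name, method signatures, and key scheme are assumptions for illustration and may differ from the `KVStore` interface in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical key-value abstraction; not the PR's actual KVStore API. */
interface SketchKVStore extends AutoCloseable {
    void put(String key, String json);

    Optional<String> get(String key);

    List<String> keysWithPrefix(String prefix);

    void deleteByPrefix(String prefix);

    @Override
    void close();
}

/** In-memory stand-in for a RocksDB-backed store; sorted keys mimic ordered iteration. */
final class InMemorySketchStore implements SketchKVStore {
    private final TreeMap<String, String> data = new TreeMap<>();

    @Override
    public synchronized void put(String key, String json) {
        data.put(key, json);
    }

    @Override
    public synchronized Optional<String> get(String key) {
        return Optional.ofNullable(data.get(key));
    }

    @Override
    public synchronized List<String> keysWithPrefix(String prefix) {
        List<String> keys = new ArrayList<>();
        // Keys are sorted, so the scan can stop at the first key that no longer matches.
        for (String key : data.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break;
            }
            keys.add(key);
        }
        return keys;
    }

    @Override
    public synchronized void deleteByPrefix(String prefix) {
        data.keySet().removeIf(key -> key.startsWith(prefix));
    }

    @Override
    public synchronized void close() {
        data.clear();
    }
}
```

With an abstraction of this shape, a request handler could look up the request path (for example `/jobs/<jobid>/config`, an assumed key scheme) directly as a key, and expiring a job becomes a single prefix delete instead of a recursive directory removal.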
Brief Change Log
1. KVStore Interface
- `KVStore` as an abstraction for key-value storage systems to enable flexible storage backends.
2. RocksDB Integration
- `HistoryServerRocksDBKVStore` as the RocksDB-based implementation of the `KVStore` interface.
3. ArchiveFetcher Abstraction and Improvements
- `ArchiveFetcher` as an abstract class to support multiple backends for job archive fetching.
- `HistoryServerArchiveFetcher` for file-based systems.
- `HistoryServerKVStoreArchiveFetcher` to fetch job archives using RocksDB.
4. ServerHandler Abstraction and Improvements
- `HistoryServerServerHandler` as an abstract base class for handling HTTP requests, supporting pluggable backends.
- `HistoryServerStaticFileServerHandler` for file-based job archive serving.
- `HistoryServerKVStoreServerHandler` to serve job data from RocksDB via REST APIs.
5. HistoryServer Updates
- `HistoryServer` to integrate the `KVStore` interface and support RocksDB as a pluggable backend.
- `HistoryServerOptions` to toggle between file-based and RocksDB storage.
Verifying this change
This change added tests and can be verified as follows:
1. Testing
Unit Tests:
- `FhsRocksDBKVStoreTest` to validate CRUD operations and resource cleanup for RocksDB.
- `HistoryServerKVStoreArchiveFetcherTest` to ensure correct fetching and processing of job archives from RocksDB.
Integration Tests:
- `flink-conf.yaml` to test both file-based and RocksDB backends.
End-to-End Tests:
2. Performance Enhancements
These enhancements significantly improve scalability, reduce resource overhead, and make the History Server more responsive for large-scale deployments.
Does this pull request potentially affect one of the following parts:
Documentation