
[SPARK-56734][CORE] Optimize RocksDBPersistenceEngine with Column Families and zero-allocation prefix matching#55696

Open
darion-yaphet wants to merge 1 commit into apache:master from darion-yaphet:SPARK-56734


What changes were proposed in this pull request?

This PR refactors RocksDBPersistenceEngine to improve performance and operational flexibility by:

  1. Introducing dedicated Column Families (app_, worker_, driver_) for different metadata types.
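  2. Optimizing the read operation from O(N_total) to O(N_type), where N_total is the number of persisted entries of all types and N_type the number of entries of the requested type, by iterating only the requested type's Column Family.
  3. Replacing string-based prefix matching (new String(iter.key()).startsWith(...)), which allocates a String for every visited key, with a zero-allocation byte-level comparison helper.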
  4. Implementing an automatic data migration path to move existing records from the default Column Family to their respective new Column Families upon startup.
  5. Ensuring proper resource management by overriding close() to release RocksDB handles and the database instance.
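A byte-level prefix check along the lines of item 3 might look like the following minimal Java sketch. The class name `KeyPrefix` and its constants are hypothetical and not taken from the patch. For ASCII prefixes such as `app_`, it accepts the same keys as `new String(key).startsWith(prefix)` under an ASCII-compatible default charset, but it allocates nothing.

```java
import java.nio.charset.StandardCharsets;

final class KeyPrefix {
  // Prefix bytes encoded once, outside any scan loop (hypothetical constants).
  static final byte[] APP = "app_".getBytes(StandardCharsets.UTF_8);
  static final byte[] WORKER = "worker_".getBytes(StandardCharsets.UTF_8);
  static final byte[] DRIVER = "driver_".getBytes(StandardCharsets.UTF_8);

  private KeyPrefix() {}

  /** Returns true if key begins with all bytes of prefix; allocates nothing. */
  static boolean startsWith(byte[] key, byte[] prefix) {
    if (key.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if (key[i] != prefix[i]) {
        return false;
      }
    }
    return true;
  }
}
```

In a scan loop, the check would replace the String construction, e.g. `while (iter.isValid() && KeyPrefix.startsWith(iter.key(), prefix))`. In the RocksDB Java API, `iter.key()` still returns a `byte[]` on each step, so the saving is the extra `String` per key rather than every allocation in the loop.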

Why are the changes needed?

Previously, all metadata was stored in the default Column Family. This caused several issues:

  • Scan efficiency: even when reading a single type of data (e.g., applications), the iterator had to be filtered with prefix checks across the entire keyspace.
  • Performance overhead: every iteration step created a new String from the key's byte array just to verify the prefix, adding GC pressure in metadata-heavy clusters.
  • Operational limits: a single Column Family offers no per-type configuration (e.g., memtable size, compression strategy).
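The per-type layout can be modeled without a RocksDB dependency. In this hypothetical sketch, each `TreeMap` stands in for one Column Family and is ordered with `Arrays.compareUnsigned` to mimic RocksDB's default bytewise comparator. In the real engine, writes and reads would go through `db.put(columnFamilyHandle, key, value)` and `db.newIterator(columnFamilyHandle)`. `TypedStore` and its methods are illustrative names, not the PR's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TypedStore {
  // One family per type, named after the key prefixes listed in the PR description.
  static final List<String> TYPES = List.of("app_", "worker_", "driver_");
  // Unsigned lexicographic order, matching RocksDB's default bytewise comparator.
  static final Comparator<byte[]> BYTEWISE = Arrays::compareUnsigned;

  // Each TreeMap stands in for one Column Family.
  private final Map<String, TreeMap<byte[], byte[]>> families = new HashMap<>();

  TypedStore() {
    for (String type : TYPES) {
      families.put(type, new TreeMap<>(BYTEWISE));
    }
  }

  /** Routes a key to the family whose name is a prefix of the key. */
  static String familyFor(String key) {
    for (String type : TYPES) {
      if (key.startsWith(type)) {
        return type;
      }
    }
    throw new IllegalArgumentException("Unknown key type: " + key);
  }

  /** Stands in for db.put(handle, key, value) on the routed family. */
  void put(String key, byte[] value) {
    families.get(familyFor(key)).put(key.getBytes(StandardCharsets.UTF_8), value);
  }

  /** Stands in for a db.newIterator(handle) scan: only this type's entries are visited. */
  List<byte[]> readType(String type) {
    TreeMap<byte[], byte[]> family = families.get(type);
    if (family == null) {
      throw new IllegalArgumentException("Unknown type: " + type);
    }
    return new ArrayList<>(family.values());
  }
}
```

Each type lives in its own family, so `readType` visits only that type's entries and needs no prefix check. This is where the O(N_type) read cost comes from.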

Does this PR introduce any user-facing change?

No. The migration logic ensures that existing persisted state is transparently moved to the new structure without data loss.
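A one-time migration of legacy records could follow the pattern below. This is a hypothetical in-memory sketch: `defaultCf` stands in for the default Column Family, and `typed` stands in for the new per-type families. Against a real RocksDB instance, the copy and the delete for each record would more safely be staged in one `WriteBatch` (`batch.put(typedHandle, key, value)`, `batch.delete(defaultHandle, key)`) and committed with `db.write(writeOptions, batch)`. Because a batch is applied atomically, a crash partway through the migration cannot drop a record.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

final class LegacyMigration {
  // Hypothetical type prefixes; each one names its target Column Family.
  static final String[] TYPES = {"app_", "worker_", "driver_"};
  static final byte[][] PREFIXES = new byte[TYPES.length][];
  // Unsigned lexicographic order, like RocksDB's default bytewise comparator.
  static final Comparator<byte[]> BYTEWISE = Arrays::compareUnsigned;

  static {
    // Encode each prefix once instead of once per key.
    for (int i = 0; i < TYPES.length; i++) {
      PREFIXES[i] = TYPES[i].getBytes(StandardCharsets.UTF_8);
    }
  }

  /** Returns the type whose prefix starts the key, or null if none matches. */
  static String typeOf(byte[] key) {
    for (int i = 0; i < PREFIXES.length; i++) {
      byte[] p = PREFIXES[i];
      if (key.length >= p.length && Arrays.equals(key, 0, p.length, p, 0, p.length)) {
        return TYPES[i];
      }
    }
    return null;
  }

  /**
   * Moves every record with a known type prefix from the legacy default family into its
   * typed family (created on demand). Unknown keys stay put. Returns the number moved.
   */
  static int migrate(TreeMap<byte[], byte[]> defaultCf, Map<String, TreeMap<byte[], byte[]>> typed) {
    int moved = 0;
    Iterator<Map.Entry<byte[], byte[]>> it = defaultCf.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<byte[], byte[]> entry = it.next();
      String type = typeOf(entry.getKey());
      if (type == null) {
        continue; // not a recognized record type; leave it in the default family
      }
      // Copy into the typed family first, then remove from the default family.
      typed.computeIfAbsent(type, t -> new TreeMap<>(BYTEWISE)).put(entry.getKey(), entry.getValue());
      it.remove();
      moved++;
    }
    return moved;
  }
}
```

Records are removed from the default family once copied, so running the migration again on the next startup finds nothing to move and is a no-op.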

How was this patch tested?

  • Verified with existing Standalone Master recovery tests.
  • Manual verification of data migration from legacy single-CF RocksDB instances.
