@florian-jobs

Summary

This PR introduces a new column group ColGroupDDCLZW that stores the mapping vector in LZW-compressed form.

Key design points

  • MapToData is not stored explicitly; only the compressed LZW representation is kept.
  • Operations that allow sequential access operate directly on _dataLZW without full decompression.
  • For complex or random-access patterns, the implementation falls back to DDC (uncompressed).
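To make the first design point concrete, here is a minimal sketch of LZW applied to an integer mapping vector. It is not the code in this PR: the class name `LzwMappingSketch`, the method signatures, and the choice to pass `nUnique` (number of distinct dictionary ids) and `nRows` as explicit metadata are all assumptions made for illustration. The initial LZW dictionary is assumed to hold one entry per distinct id, so codes below `nUnique` are literals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the ColGroupDDCLZW implementation.
final class LzwMappingSketch {

    // Packs (prefix code, next symbol) into one long so it can be a map key.
    private static long key(int prefix, int symbol) {
        return ((long) prefix << 32) | (symbol & 0xFFFFFFFFL);
    }

    // Compresses a mapping vector whose values lie in [0, nUnique).
    static int[] compress(int[] data, int nUnique) {
        if (data.length == 0) return new int[0];
        Map<Long, Integer> dict = new HashMap<>();
        List<Integer> out = new ArrayList<>();
        int nextCode = nUnique; // codes 0..nUnique-1 are the literal ids
        int w = data[0];        // code of the longest phrase matched so far
        for (int i = 1; i < data.length; i++) {
            int c = data[i];
            Integer known = dict.get(key(w, c));
            if (known != null) {
                w = known;      // extend the current phrase
            } else {
                out.add(w);     // emit the phrase, then learn phrase + c
                dict.put(key(w, c), nextCode++);
                w = c;
            }
        }
        out.add(w);
        int[] res = new int[out.size()];
        for (int i = 0; i < res.length; i++) res[i] = out.get(i);
        return res;
    }

    // Rebuilds the full mapping vector; nRows is assumed to be stored as metadata.
    static int[] decompress(int[] codes, int nUnique, int nRows) {
        int[] out = new int[nRows];
        if (codes.length == 0) return out;
        // Learned code k (k >= nUnique) is phrase(prefix[k-nUnique]) + last[k-nUnique].
        int[] prefix = new int[codes.length];
        int[] last = new int[codes.length];
        int[] len = new int[codes.length];
        int nextCode = nUnique;
        int prev = codes[0];
        int pos = write(prev, nUnique, prefix, last, len, out, 0);
        for (int i = 1; i < codes.length; i++) {
            int code = codes[i];
            int entry = nextCode - nUnique;
            int prevLen = prev < nUnique ? 1 : len[prev - nUnique];
            prefix[entry] = prev;
            len[entry] = prevLen + 1;
            if (code < nextCode) {
                int start = pos;
                pos = write(code, nUnique, prefix, last, len, out, pos);
                last[entry] = out[start]; // first symbol of the phrase just written
            } else {
                // Code not yet in the table: phrase(prev) + first symbol of phrase(prev).
                last[entry] = out[pos - prevLen];
                pos = write(code, nUnique, prefix, last, len, out, pos);
            }
            nextCode++;
            prev = code;
        }
        return out;
    }

    // Writes the phrase for one code into out starting at pos; returns the new pos.
    private static int write(int code, int nUnique, int[] prefix, int[] last,
                             int[] len, int[] out, int pos) {
        int n = code < nUnique ? 1 : len[code - nUnique];
        int i = pos + n - 1;
        while (code >= nUnique) { // phrases unwind back to front
            out[i--] = last[code - nUnique];
            code = prefix[code - nUnique];
        }
        out[i] = code;
        return pos + n;
    }
}
```

In this sketch a constant vector of six zeros with a single dictionary id compresses to the three codes `0, 1, 2`, which illustrates why long runs and repeated patterns in the mapping are the favorable case. A real column group would likely also bit-pack the codes, which is left out here.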

Current status

  • Core data structure and compression/decompression are in place.
  • Work in progress on operations that can be implemented via sequential decoding without full materialization.
  • Work in progress on performance.
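As an illustration of what "sequential decoding without full materialization" could look like, the sketch below streams the decoded mapping values to a callback one at a time and never allocates a vector of length `nRows`. The class `LzwSequentialSketch`, the method `forEachSymbol`, and the `counts` helper are hypothetical names, not part of this PR. The code stream is assumed to come from a standard LZW encoder whose initial dictionary contains the literal ids `0..nUnique-1`.

```java
import java.util.function.IntConsumer;

// Hypothetical sketch, not the ColGroupDDCLZW implementation.
final class LzwSequentialSketch {

    // Decodes the LZW code stream and passes each mapping value to sink in row order.
    static void forEachSymbol(int[] codes, int nUnique, IntConsumer sink) {
        if (codes.length == 0) return;
        // Learned code k (k >= nUnique) is phrase(prefix[k-nUnique]) + last[k-nUnique].
        int[] prefix = new int[codes.length];
        int[] last = new int[codes.length];
        // Scratch buffer for one phrase; a phrase is never longer than codes.length.
        int[] stack = new int[codes.length + 1];
        int nextCode = nUnique;
        int prev = codes[0];
        int prevFirst = prev; // first symbol of the previous phrase
        sink.accept(prev);
        for (int i = 1; i < codes.length; i++) {
            int code = codes[i];
            int entry = nextCode - nUnique;
            prefix[entry] = prev;
            if (code == nextCode) {
                // Code not yet in the table: phrase(prev) + first symbol of phrase(prev).
                last[entry] = prevFirst;
            }
            // Unwind the phrase back to front into the scratch stack.
            int top = 0;
            int cur = code;
            while (cur >= nUnique) {
                stack[top++] = last[cur - nUnique];
                cur = prefix[cur - nUnique];
            }
            int first = cur;
            if (code != nextCode) {
                last[entry] = first;
            }
            // Emit the phrase front to back.
            sink.accept(first);
            while (top > 0) sink.accept(stack[--top]);
            nextCode++;
            prev = code;
            prevFirst = first;
        }
    }

    // Example of a sequential operation: occurrences per dictionary id.
    static int[] counts(int[] codes, int nUnique) {
        int[] c = new int[nUnique];
        forEachSymbol(codes, nUnique, v -> c[v]++);
        return c;
    }
}
```

The decoder in this sketch still builds the phrase table, so its working memory grows with the number of codes rather than the number of rows. Operations that need row `i` without visiting rows `0..i-1` do not fit this pattern, which matches the stated fallback to DDC for random access.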

Feedback on design and integration is very welcome.

florian-jobs and others added 14 commits January 7, 2026 13:39
…extending on APreAgg like ColGroupDDC for easier implementation. Idea: store only the compressed version of the _data vector plus important metadata. If decompression is needed, we reconstruct the _data vector from the metadata and the compressed _data vector. Decompression takes place at most once. This is just an idea, and there are other ways of implementing it.
 * - DDCLZW stores the mapping vector exclusively in compressed form.
 * - No persistent MapToData cache is maintained.
 * - Sequential operations decode on-the-fly, while operations requiring random access explicitly materialize and fall back to DDC.
 */
…and decompress and its used data structures compatible.
…DC test for ColGroupDDCTest. Improved compress/decompress methods in LZW class.
…mapping

This commit adds an initial implementation of ColGroupDDCLZW, a new column
group that stores the mapping vector in LZW-compressed form instead of
materializing MapToData explicitly.

The design focuses on enabling sequential access directly on the compressed
representation, while complex access patterns are intended to fall back to
DDC. No cache or lazy decompression mechanism is introduced at this stage.
@github-project-automation github-project-automation bot moved this to In Progress in SystemDS PR Queue Jan 13, 2026
@florian-jobs florian-jobs changed the title Add ColGroupDDCLZW with LZW-compressed MapToData [SYSTEMDS-3779] Add ColGroupDDCLZW with LZW-compressed MapToData Jan 13, 2026