Is your feature request related to a problem or challenge?
Currently the DictionaryGroupValues path is faster than GroupValuesRows, but there is still room for improvement. seen_elements stores the raw bytes of each element as a Vec within a Vec. The frequent allocations this causes are minor but do show up as CPU spend in intern(). The current collision handling also forces a copy: bytes are stored in both seen_elements and unique_dict_value_mapping.
Describe the solution you'd like
This can be resolved by storing intermediate bytes in a single contiguous buffer, then tracking offsets and lengths instead of raw bytes. We'd introduce a new field on the struct that holds the buffer, and seen_elements / unique_dict_value_mapping would only need to store an offset and length per entry. This would replace a potentially large byte copy with two i32s.
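A rough sketch of the idea, not the actual DataFusion code: `ByteArena`, `Span`, `hash_bytes`, and the field names `buffer`, `seen`, and `index` are hypothetical names chosen for illustration, and the hash index is a simplified stand-in for whatever the real struct uses. The sketch uses `u32` for the offset and length rather than `i32`; either fits the "two integers per entry" shape described above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

/// Hypothetical (offset, length) pair pointing into the shared buffer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span {
    offset: u32,
    len: u32,
}

/// Hypothetical arena: all value bytes live in one contiguous buffer.
#[derive(Default)]
struct ByteArena {
    /// Every distinct value's bytes, appended back to back.
    buffer: Vec<u8>,
    /// One span per group id, replacing a Vec of per-element Vecs.
    seen: Vec<Span>,
    /// Hash -> candidate group ids; stores ids only, never a second byte copy.
    index: HashMap<u64, Vec<usize>>,
}

/// Illustrative hash helper; the real code would use its own hasher.
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

impl ByteArena {
    /// Borrow the bytes for a group id straight out of the shared buffer.
    fn get(&self, id: usize) -> &[u8] {
        let span = self.seen[id];
        let start = span.offset as usize;
        &self.buffer[start..start + span.len as usize]
    }

    /// Return the group id for `value`, appending its bytes only if unseen.
    fn intern(&mut self, value: &[u8]) -> usize {
        let hash = hash_bytes(value);
        // Collision handling compares against the buffer, so no copy is kept.
        if let Some(ids) = self.index.get(&hash) {
            for &id in ids {
                if self.get(id) == value {
                    return id;
                }
            }
        }
        let id = self.seen.len();
        let offset = self.buffer.len() as u32;
        self.buffer.extend_from_slice(value);
        self.seen.push(Span { offset, len: value.len() as u32 });
        self.index.entry(hash).or_default().push(id);
        id
    }
}
```

With this layout, interning a new value costs one append to `buffer` (amortized, no per-element allocation) plus one `Span` push, and a repeated value costs only a hash lookup and a slice comparison.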
Describe alternatives you've considered
The alternative is to change nothing. Benchmarks show that even the current approach is faster than the default GroupValuesRows approach.
Additional context
See #21765.
