ListView Handling normalization #6470
Replies: 4 comments
I believe there is a third option that we have brought up in the past where we do not store To be honest, I am still in support of storing this flag in metadata. If this is somehow a terrible idea and there is actually a better way of converting a CC @robert3005 @joseph-isaacs since I remember we talked about this several months ago.
How did this come up? I am not sure that we will always choose List, only if the ListView is list-like. @asubiotto uses compressed ListView where elements are referenced more than once in a ListView. Also, for DataFusion scanning, will there also be a target data_type? Compact should likely try to compress ListViews to Lists too? Regarding ListView metadata, shall we roll that in with the ListLayout work we will conduct soon?
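To make the "referenced more than once" case concrete, here is a minimal sketch of a rebuild from a ListView-style (offsets, sizes) pair into a List-style layout. It is not Vortex's naive_rebuild; `rebuild_as_list` and its plain-slice signature are hypothetical and only illustrate why shared elements force a copy.

```rust
/// Hypothetical illustration, not a Vortex API.
/// Copies every view into a fresh, contiguous elements buffer and returns
/// (elements, list_offsets), where list i is elements[list_offsets[i]..list_offsets[i + 1]].
/// Returns None if the inputs are inconsistent or a view is out of bounds.
pub fn rebuild_as_list(
    elements: &[i32],
    offsets: &[usize],
    sizes: &[usize],
) -> Option<(Vec<i32>, Vec<usize>)> {
    if offsets.len() != sizes.len() {
        return None;
    }
    let mut out_elements = Vec::new();
    let mut out_offsets = Vec::with_capacity(offsets.len() + 1);
    out_offsets.push(0);
    for (&offset, &size) in offsets.iter().zip(sizes) {
        let end = offset.checked_add(size)?;
        // Views may overlap or repeat, so the same source element can be copied many times.
        out_elements.extend_from_slice(elements.get(offset..end)?);
        out_offsets.push(out_elements.len());
    }
    Some((out_elements, out_offsets))
}
```

With elements [1, 2, 3], offsets [0, 0, 1] and sizes [2, 2, 2], three source elements expand to six in the rebuilt List, which is the cost a list-like ListView would avoid.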
I think the fix here is to properly chunk our arrays and store ListViews; then on read we can skip reading sizes if we can somehow plumb through the information that we are reading into a List.
AFAIU #6322 removed the compact compressor so we always go through If this is related to a perf issue on 0.58.0, I think it might be worth checking whether it still exists on latest
Currently, Vortex uses ListView as its canonical List encoding data type. That means internally, lists are always treated as ListView.
We currently have a weird mismatch between how ListViews are written and how execute_arrow() works for them.
By default at write, we have two compressor pathways: one writes List to the file, and the other writes ListView to the file.
Note that the file metadata for ListViewArray does not contain any information about whether the ListView is zero-copy compatible with a List memory layout.
If no Arrow DataType is provided, the ListViewArray::execute_arrow function will emit a ListArray. This means that users who write data with the Compact compressor and try to read it out as Arrow (which is a very common codepath) actually get the worst of all worlds for performance, because the ListView will go through the naive_rebuild pathway.
There are two options that come to mind for how we fix performance of the write(ListView) + read(List) use case:
An is_zero_copy_to_list metadata field in the flatbuffer for ListView so that we know to skip the path. This could potentially lead to malicious behavior if we e.g. build a ListArray::new_unchecked because we assume some correctness property about this flag that is not true.
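One way to reason about the trust concern is to treat the flag as a hint and verify it before taking the fast path. The sketch below is an assumption-laden illustration, not Vortex code: `checked_list_offsets`, its plain-slice arguments, and the exact definition of "zero-copy compatible" (each list starts where the previous one ends, and the last one ends within the elements buffer) are all hypothetical.

```rust
/// Hypothetical illustration, not a Vortex API.
/// If the stored flag claims the ListView is zero-copy compatible with a List,
/// verify the claim and return the List offsets (length n + 1).
/// Returns None when the flag is unset or the claim does not hold,
/// in which case the caller would fall back to a rebuild.
pub fn checked_list_offsets(
    offsets: &[u64],
    sizes: &[u64],
    elements_len: u64,
    claimed_zero_copy: bool,
) -> Option<Vec<u64>> {
    if !claimed_zero_copy || offsets.len() != sizes.len() {
        return None;
    }
    let mut list_offsets = Vec::with_capacity(offsets.len() + 1);
    // The first list may start anywhere; every later list must start where the previous one ended.
    let mut next_start = offsets.first().copied().unwrap_or(0);
    for (&offset, &size) in offsets.iter().zip(sizes) {
        if offset != next_start {
            return None; // gap, overlap, or out-of-order view
        }
        list_offsets.push(offset);
        next_start = offset.checked_add(size)?; // reject arithmetic overflow
    }
    if next_start > elements_len {
        return None; // last list would read past the elements buffer
    }
    list_offsets.push(next_start);
    Some(list_offsets)
}
```

This check is linear in the number of lists and never touches the elements, so it is cheaper than a rebuild, but it still has to read sizes. Skipping sizes entirely on read would mean trusting the flag without this verification.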