Use arrow_scan instead of arrow_scan_dumb to enable predicate / filter pushdown #393
jonathanswenson wants to merge 1 commit into duckdb:main
Curious whether opinions have changed around writing and running tests that use Arrow here. I'm not sure how best to test any of this without the Arrow dependency. However, when I build this manually and test against it, a simple test program seems to do what I expect when planning.
Before this change (using duckdb 1.3.2.0) I get the following output:
After this change (with duckdb built from this branch) I get the following:
However, it looks like the filter is not actually being applied, so I'm likely missing something here.
Closing for now while I figure out what I'm missing.
Turns out this is a lot more complicated than I thought 🤦🏻
It might be possible, but I was definitely missing the nuance of how the Python implementation works. The implementation here would likely have to mimic what happens there to create an arrow_scanner that takes the filters / projections into account: https://github.com/duckdb/duckdb-python/blob/main/src/duckdb_py/arrow/arrow_array_stream.cpp#L37-L66
That may be possible with Gandiva or the native Arrow code, but it's a challenge that I didn't anticipate (and should have).
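For anyone picking this up later, here is a rough sketch of the idea as I understand it: the engine hands the projection and filters to whatever produces the Arrow stream, and that producer has to apply them itself. This is plain Python with no Arrow or DuckDB dependency. `make_scanner`, `Batch`, and the `(column, op, value)` filter triple are hypothetical names made up for illustration; they are not DuckDB or Arrow APIs, and the real filter representation is more involved.

```python
import operator
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

# Stand-in for an Arrow record batch: column name -> list of values.
Batch = Dict[str, List[object]]

# Hypothetical pushed-down filter: (column, comparison, constant).
Filter = Tuple[str, str, object]

_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def make_scanner(
    batches: Iterable[Batch],
    projection: Sequence[str],
    filters: Sequence[Filter],
) -> Iterator[Batch]:
    """Yield batches with the filters and projection already applied.

    The consumer never sees rows or columns it did not ask for, which is
    the whole point of pushdown. All filters are ANDed together.
    """
    for batch in batches:
        num_rows = len(next(iter(batch.values()), []))
        # Row indices that satisfy every pushed-down filter.
        keep = [
            i
            for i in range(num_rows)
            if all(_OPS[op](batch[col][i], value) for col, op, value in filters)
        ]
        if not keep:
            continue  # skip batches where no row survives
        # Emit only the projected columns, restricted to surviving rows.
        yield {col: [batch[col][i] for i in keep] for col in projection}


batches = [
    {"id": [1, 2, 3], "name": ["a", "b", "c"]},
    {"id": [4, 5], "name": ["d", "e"]},
]
# Roughly: SELECT name FROM batches WHERE id > 2
result = list(make_scanner(batches, ["name"], [("id", ">", 2)]))
```

If the producer ignores the filters it is handed, the plan can still show them as pushed down while the rows come back unfiltered, which would be consistent with what I saw above.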
Fixes #392