refactor: ExecuteResult is reusable, sampleable#2159
Conversation
tswast
left a comment
There was a problem hiding this comment.
At a high level, I think this makes a lot of sense. My main worry would be the case of ExecuteResult with local data. If we are keeping these around more, is there a risk of memory leaks?
| dfs = iter(map(self._copy_index_to_pandas, dfs)) | ||
|
|
||
| total_rows = execute_result.total_rows | ||
| total_rows = result_batches.approx_total_rows |
There was a problem hiding this comment.
As we discussed offline, I think it'd be helpful to have cases where we know (or at least have very high confidence) of this being exact, such as when reading the destination table of a query.
There was a problem hiding this comment.
We do insert row count directly in the tree when known exactly (or at least some of the time we know exactly, might be missing some opportunities). At a certain point we just intermingle known and approx though, before surfacing. Could keep separate if really needed, but a bit more complexity.
Yeah, this refactor is intended to help with caching, but doesn't actually add any caching behavior for now. So, these data handles will be garbage collected just fine. |
4032364 to
1cdf793
Compare
| for scan_item in node.scan_list.items: | ||
| if ( | ||
| scan_item.dtype == dtypes.JSON_DTYPE | ||
| node.source.schema.get_type(scan_item.source_id) == dtypes.JSON_DTYPE |
There was a problem hiding this comment.
I'm curious what the purpose of this extra layer of indirection is compared to the previous? I guess it's partially undoing some of the changes Shenyang and Chelsea made to plumb type info into the tree?
There was a problem hiding this comment.
its mostly from moving the schema to the source definition rather than the scan definition, which brings it more in line with the local source, where a source object is not just data, but interpretation (mostly matters for virtual types)
|
e2e failures are "Blob" related. Probably flakes. |
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕