the most efficient LOB fetching #271
tl;dr;
Update WARNING: this is a breaking change for users of `SELECT ... FOR UPDATE`, like the oracle enhanced adapter. I'm looking to submit an update there to directly bind `OCI8::BLOB.new()` and `OCI8::CLOB.new()` to the queries, but it presently would need a monkey patch. See my comment in #230; I'll post that there when ready.

For the brave reader:
An earlier commit series introduced static memory array allocation for fetching results whenever LOBs were present. That was supposed to reduce LOB-fetching database roundtrips, but it caused a massive memory usage increase (#230), and I saw no evidence that it actually improved LOB fetching performance in any way.
Warning: feel welcome to verify my claims and correct me if needed. I'm not a C programmer, so most of the root-cause explanation and benchmarks you see below came from AI; they may not be fully accurate and I have not verified everything. I have, however, validated that the new approach is faster from the Ruby side, which is the end goal. I'm sharing the rest for future reference and for the curious.
The memory issue, as far as I understand it, comes from the fact that ruby-oci8 2.2.7 allocates enough memory to hold all the configured prefetch rows (whether the query actually returns that many rows or not), and it does so not only for the LOB fields but for all fields of the query. For example, with a prefetch setting of N rows, N copies of every column buffer are allocated up front even if the query returns a single row. The LOB fields still do not fit into that pre-allocated buffer and still need a separate `OCILobRead2` call, but more on that later. The memory issue comes from the fact that the cursor holds on to all this allocated memory even after the Ruby side has already created the necessary Ruby objects. If a higher level like the oracle-enhanced adapter then caches these cursors as prepared queries, we end up with a lot of allocated memory sitting around for no good reason.

Additionally, I saw no evidence that any LOBs are actually prefetched unless `OCI_ATTR_DEFAULT_LOBPREFETCH_SIZE` is also set on the connection, so I'm not sure the roundtrips would be reduced by this approach at all.

The solution I found in oracle/odpi#163 is to ask Oracle to send LOBs as LONG and use dynamic piecewise allocation with a callback to fetch the LOB data, while fetching all other fields normally with memory allocated for only one row, letting the lower-level OCI driver handle the prefetch buffer and taking rows one by one. This turns out to be at least 2-5 times faster than any other approach.
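For illustration only, here is a minimal C sketch of what a dynamic piecewise define of a LOB column as LONG looks like at the OCI level. This is not the PR's actual code: the handle setup, column position, piece size and the `long_fetch_ctx_t` context type are assumptions, and a real implementation would accumulate the pieces into a growing buffer.

```c
#include <oci.h>

#define PIECE_SIZE (64 * 1024)

/* Hypothetical per-column context; a real one would append each piece
 * into a growing result buffer. */
typedef struct {
    char piece_buf[PIECE_SIZE]; /* buffer handed to OCI for each piece */
    ub4  piece_len;             /* in: capacity, out: bytes OCI wrote  */
    sb2  ind;                   /* NULL indicator                      */
    ub2  rcode;                 /* column return code                  */
} long_fetch_ctx_t;

/* OCI calls this before every piece to ask where to put the next chunk. */
static sb4 define_callback(void *octxp, OCIDefine *defnp, ub4 iter,
                           void **bufpp, ub4 **alenpp, ub1 *piecep,
                           void **indpp, ub2 **rcodepp)
{
    long_fetch_ctx_t *ctx = (long_fetch_ctx_t *)octxp;
    (void)defnp; (void)iter;

    ctx->piece_len = PIECE_SIZE;  /* how many bytes we can take this round */
    *bufpp   = ctx->piece_buf;
    *alenpp  = &ctx->piece_len;   /* OCI overwrites with the actual length */
    *indpp   = &ctx->ind;
    *rcodepp = &ctx->rcode;
    /* *piecep is managed by OCI (first/next piece); nothing to set here. */
    return OCI_CONTINUE;
}

/* Define column 1 (assumed to be the LOB) as LONG in dynamic-fetch mode.
 * value_sz is a signed 32-bit limit, which is why this path tops out
 * around 2GB. */
static sword define_lob_as_long(OCIStmt *stmtp, OCIError *errhp,
                                long_fetch_ctx_t *ctx)
{
    OCIDefine *defnp = NULL;
    sword rc;

    rc = OCIDefineByPos(stmtp, &defnp, errhp, 1, NULL, SB4MAXVAL, SQLT_LNG,
                        NULL, NULL, NULL, OCI_DYNAMIC_FETCH);
    if (rc != OCI_SUCCESS)
        return rc;

    /* Register the callback that supplies a buffer for each piece. */
    return OCIDefineDynamic(defnp, errhp, ctx, define_callback);
}
```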
That method, though, is limited to ~2GB LOBs. To accommodate users who need more, the previous way of allocating memory for one row and fetching LOBs individually through their locators remains available and can be optionally enabled. In that case prefetching is still performed by the Oracle OCI layer. Optionally one can specify `OCI_ATTR_DEFAULT_LOBPREFETCH_SIZE`, but my testing showed only performance degradation. I have only tested against a local Oracle server, though, so connecting over a slower network may yield different results. On the other hand, I don't see anyone using huge LOBs over a slow network...

Anyway, some performance results. This one is probably the most important; see it and instructions for how to run it here: https://gist.github.com/akostadinov/4e69b493e8413a0779628a8f0abfbe85
This is the most shocking finding for me: setting `OCI_ATTR_DEFAULT_LOBPREFETCH_SIZE` actually hurts performance, although, as I said, I assume it may help on some networks. So it is 0 by default but can be set in case it helps your use case.

Some more tests that use Oracle OCI directly and are AI generated, sanitized minimally by me. The chance of AI slop is high, but I wanted to put them here for reference in case somebody wants to review and replicate the results.
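For reference, at the OCI level that attribute is set on the session handle. A minimal sketch (not code from this PR; the handle names and the helper are assumptions):

```c
#include <oci.h>

/* Enable LOB prefetch for the whole session; 0 disables it, which is
 * the default this PR settles on. */
static sword set_default_lob_prefetch(OCISession *usrhp, OCIError *errhp,
                                      ub4 prefetch_bytes)
{
    return OCIAttrSet(usrhp, OCI_HTYPE_SESSION, &prefetch_bytes, 0,
                      OCI_ATTR_DEFAULT_LOBPREFETCH_SIZE, errhp);
}
```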
lobtest.zip
BUILD
MEASURE fetching as LONG
MEASURE simple fetching as LOB without LOB prefetch buffer
MEASURE simple fetching as LOB with LOB prefetch buffer
MEASURE fetching with static memory allocation as LOB without LOB prefetch buffer
MEASURE fetching with static memory allocation as LOB with LOB prefetch buffer
Testing how large a LOB can be read by the LONG piecewise interface vs the normal locator read approach (warning: long output). You can also pass multiple sizes in kilobytes to try several sizes at once.
As you can see, for me the piecewise read was way faster than the locator LOB read even for large 100MB LOBs.
512MB LOB:
1.5GB LOB:
But with the caveat that above 2GB you can only use the locator LOB read approach.
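For completeness, a minimal C sketch of the locator-based read path that remains available for the >2GB case (again not the PR's code; the handle setup and the buffer handling are assumptions):

```c
#include <oci.h>

/* Read an entire LOB through its locator in a piecewise polling loop.
 * OCILobRead2 keeps returning OCI_NEED_DATA until the last piece;
 * a real implementation would append each piece somewhere. */
static sword read_lob_via_locator(OCISvcCtx *svchp, OCIError *errhp,
                                  OCILobLocator *locp,
                                  char *buf, oraub8 buf_len)
{
    oraub8 byte_amt = 0;           /* 0 on the first call: read to end of LOB */
    oraub8 char_amt = 0;
    ub1 piece = OCI_FIRST_PIECE;
    sword rc;

    do {
        rc = OCILobRead2(svchp, errhp, locp, &byte_amt, &char_amt,
                         1 /* 1-based offset */, buf, buf_len,
                         piece, NULL, NULL, 0, SQLCS_IMPLICIT);
        /* byte_amt now holds the bytes written into buf for this piece */
        piece = OCI_NEXT_PIECE;
    } while (rc == OCI_NEED_DATA);

    return rc;
}
```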
I think that's all. Again, especially in the C tests there might be AI slop, so please double-check my claims. I'm personally satisfied enough by the Ruby performance difference I see, as well as by the simplification of the query code on the ruby-oci8 side.
-- please review commits individually