akostadinov commented Dec 4, 2025

tl;dr:

Update WARNING: this is a breaking change for users of SELECT ... FOR UPDATE, such as the oracle enhanced adapter. I plan to submit an update there to bind OCI8::BLOB.new() and OCI8::CLOB.new() directly to the queries, but at present that would require a monkey patch. See my comment in #230; I'll post the patch there when it's ready.

For the brave reader:
An earlier commit series introduced static array memory allocation for fetching results whenever LOBs were present. That was supposed to reduce database roundtrips for LOB fetching, but it caused a massive increase in memory usage (#230), and I saw no evidence that it actually improved LOB fetching performance in any way.

Warning: feel welcome to verify my claims and correct me if needed. I'm not a C programmer, so most of the root cause explanation and benchmarks below came from AI; they may not be fully accurate and I have not verified everything. I have, however, validated that the new approach is faster from the Ruby side, which is the end goal. I still want to share the rest here for future reference and for the curious.

The memory issue, as far as I understand it, comes from the fact that ruby-oci8 2.2.7 allocates enough memory to hold all the configured prefetch rows whenever a LOB column is present, and it does so not only for the LOB columns but for every column of the query. The LOB values still do not fit into that pre-allocated buffer and still need a separate OCILobRead2 call (more on that later). The cursor then keeps all of this memory allocated even after the Ruby side has created the necessary Ruby objects. If a higher level such as oracle-enhanced_adapter caches these cursors as prepared queries, we end up with a lot of allocated memory sitting around for no good reason.
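
To get a feel for the scale, here is a tiny, hedged illustration of that difference. The numbers (prefetch rows, column count, column size) are made up for the example and are not taken from ruby-oci8 internals:

```c
/* Simplified illustration only; not the actual ruby-oci8 allocation code.
 * With array fetching, define buffers are sized for every column times the
 * prefetch row count, and they stay allocated as long as the cursor lives. */
#include <stdio.h>

int main(void)
{
    size_t prefetch_rows = 100;   /* hypothetical OCI_ATTR_PREFETCH_ROWS value */
    size_t columns       = 10;    /* hypothetical column count of the query    */
    size_t max_col_bytes = 4000;  /* e.g. a VARCHAR2(4000) column              */

    size_t per_cached_cursor = prefetch_rows * columns * max_col_bytes; /* ~4 MB  */
    size_t per_single_row    = columns * max_col_bytes;                 /* ~40 KB */

    printf("held per cached cursor (array fetch): %zu bytes\n", per_cached_cursor);
    printf("held per cached cursor (1-row fetch): %zu bytes\n", per_single_row);
    return 0;
}
```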

Additionally, I saw no evidence that any LOB data is actually prefetched without also setting OCI_ATTR_DEFAULT_LOBPREFETCH_SIZE on the connection, so I'm not sure the roundtrips were reduced by this approach at all.

The solution I found in oracle/odpi#163 is to ask Oracle to send LOBs as LONG and fetch them with dynamic piecewise allocation through a callback, while all other fields are fetched as usual with memory allocated for only one row; the lower-level OCI driver handles the prefetch buffer and we take the rows one by one. This turns out to be at least 2-5 times faster than any other approach (see the sketch below).
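
For reference, a minimal sketch of the OCI calls behind that idea: define the LOB column as LONG/LONG RAW with OCI_DYNAMIC_FETCH and register a callback that hands OCI a buffer piece by piece. This is the general technique (as ODPI-C uses it), not the exact code of this PR; the names `long_fetch_ctx_t`, `long_piece_cb` and `define_lob_as_long` are hypothetical, and error handling is omitted:

```c
#include <oci.h>

/* Hypothetical context: a scratch buffer handed to OCI for each piece.
 * A real implementation (as ODPI-C does) hands out a fresh chunk per
 * callback invocation and concatenates the chunks once the fetch returns. */
typedef struct {
    ub4  piece_len;          /* in: capacity of piece[]; out: bytes received */
    char piece[64 * 1024];
} long_fetch_ctx_t;

/* OCICallbackDefine: OCI calls this to obtain a buffer for each piece of
 * the column that was defined with OCI_DYNAMIC_FETCH. */
static sb4 long_piece_cb(void *octxp, OCIDefine *defnp, ub4 iter,
                         void **bufpp, ub4 **alenpp, ub1 *piecep,
                         void **indpp, ub2 **rcodep)
{
    long_fetch_ctx_t *ctx = (long_fetch_ctx_t *)octxp;
    (void)defnp; (void)iter;

    ctx->piece_len = sizeof(ctx->piece);  /* capacity offered to OCI */
    *bufpp  = ctx->piece;
    *alenpp = &ctx->piece_len;            /* OCI writes the actual length here */
    *indpp  = NULL;
    *rcodep = NULL;
    *piecep = OCI_NEXT_PIECE;
    return OCI_CONTINUE;
}

/* Define column `pos` of a prepared SELECT so the server converts the LOB
 * to LONG / LONG RAW and delivers it piecewise through the callback above. */
static sword define_lob_as_long(OCIStmt *stmtp, OCIError *errhp,
                                ub4 pos, ub2 dty, /* SQLT_LNG or SQLT_LBI */
                                long_fetch_ctx_t *ctx)
{
    OCIDefine *defnp = NULL;
    sword rc = OCIDefineByPos(stmtp, &defnp, errhp, pos,
                              NULL,        /* no static buffer              */
                              SB4MAXVAL,   /* maximum length we accept      */
                              dty,         /* LONG (CLOB) / LONG RAW (BLOB) */
                              NULL, NULL, NULL,
                              OCI_DYNAMIC_FETCH);
    if (rc != OCI_SUCCESS)
        return rc;
    return OCIDefineDynamic(defnp, errhp, ctx, long_piece_cb);
}
```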

That method, though, is limited to LOBs of ~2GB. To accommodate users who need more, the previous way of allocating memory for one row and fetching LOBs individually through locators remains available and can optionally be enabled. In that case, row prefetching is still performed by the Oracle OCI layer. One can additionally specify OCI_ATTR_DEFAULT_LOBPREFETCH_SIZE, but in my testing it only degraded performance. I have only tested against a local Oracle server, though, so connecting over a slower network may yield different results. On the other hand, I don't see anyone using huge LOBs over a slow network...

Anyway, some performance results. This one is probably the most important; see it and instructions on how to run it here: https://gist.github.com/akostadinov/4e69b493e8413a0779628a8f0abfbe85

```
Fetching:
Fetched 10000 rows
Total CLOB bytes: 100000000
Total BLOB bytes: 100000000
Mode                               Time (sec)
Data insertion                     35.887
LONG interface                      1.012
LOB locator (no prefetch)           2.553
LOB locator (with prefetch)         2.925
-------------------------------------------------------------------
Locator vs LONG (no prefetch)         2.5x
Locator vs LONG (with prefetch)        2.9x
```

This is the most shocking finding for me: setting OCI_ATTR_DEFAULT_LOBPREFETCH_SIZE actually hurts performance, although, as I said, I assume it may help on some networks. So it is 0 by default but can be set in case it helps your use case.
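
For completeness, this is roughly how that knob is set at the OCI level (a minimal sketch; the session and error handles are assumed to come from the usual connection setup, and how ruby-oci8 exposes the setting is defined by the PR, not by this snippet):

```c
#include <oci.h>

/* Ask the server to send the first `prefetch_bytes` of every LOB together
 * with its locator. 0 (the default here) disables LOB prefetching. */
static sword set_default_lob_prefetch(OCISession *usrhp, OCIError *errhp,
                                      ub4 prefetch_bytes)
{
    return OCIAttrSet(usrhp, OCI_HTYPE_SESSION,
                      &prefetch_bytes, 0,
                      OCI_ATTR_DEFAULT_LOBPREFETCH_SIZE, errhp);
}
```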

Below are some more tests that use Oracle OCI directly. They are AI generated and only minimally sanitized by me, so the chance of AI slop is high, but I wanted to put them here for reference, and maybe somebody wants to review them and replicate the results.

lobtest.zip

BUILD

```
make -f Makefile.bench   # compiles the C executable
```

MEASURE fetching as LONG

```
🐚 ./measure_lob_syscalls.sh ./bench_lob_fetch long
Running: ./bench_lob_fetch long
Tracing syscalls...

Oracle socket: fd 4

Syscalls on Oracle socket:
  read(4, ...):   336
  write(4, ...):  113
  readv(4, ...):  0
  writev(4, ...): 0
  ================================
  TOTAL:                    449
```

MEASURE simple fetching as LOB without LOB prefetch buffer

```
🐚 ./measure_lob_syscalls.sh ./bench_lob_fetch locator
Running: ./bench_lob_fetch locator
Tracing syscalls...

Oracle socket: fd 4

Syscalls on Oracle socket:
  read(4, ...):   536
  write(4, ...):  313
  readv(4, ...):  677
  writev(4, ...): 0
  ================================
  TOTAL:                    1526
```

MEASURE simple fetching as LOB with LOB prefetch buffer

```
🐚 ./measure_lob_syscalls.sh ./bench_lob_fetch prefetch
Running: ./bench_lob_fetch prefetch
Tracing syscalls...

Oracle socket: fd 4

Syscalls on Oracle socket:
  read(4, ...):   436
  write(4, ...):  113
  readv(4, ...):  0
  writev(4, ...): 0
  ================================
  TOTAL:                    549
```

MEASURE fetching with static memory allocation as LOB without LOB prefetch buffer

```
🐚 ./measure_lob_syscalls.sh ./bench_lob_fetch array
Running: ./bench_lob_fetch array
Tracing syscalls...

Oracle socket: fd 4

Syscalls on Oracle socket:
  read(4, ...):   440
  write(4, ...):  214
  readv(4, ...):  602
  writev(4, ...): 0
  ================================
  TOTAL:                    1256
```

MEASURE fetching with static memory allocation as LOB with LOB prefetch buffer

```
🐚 ./measure_lob_syscalls.sh ./bench_lob_fetch array_prefetch
Running: ./bench_lob_fetch array_prefetch
Tracing syscalls...

Oracle socket: fd 4

Syscalls on Oracle socket:
  read(4, ...):   410
  write(4, ...):  14
  readv(4, ...):  0
  writev(4, ...): 0
  ================================
  TOTAL:                    424
```

Testing how large a LOB can be read by the LONG piecewise interface vs the normal locator read approach (warning: long output). You can also pass multiple sizes in kilobytes to try several sizes at once.

```
🐚 ./bench_lob_fetch test_sizes 102400

=======================================================================
LOB SIZE LIMIT TEST
=======================================================================

Testing LOB sizes: 102400KB

[SETUP] Dropping test table if exists...
[SETUP] Creating test table...
<...>
  [LONG] Success - read 104857600 bytes via dynamic callback
  [LONG] Elapsed time: 0.156 seconds
  [LOCATOR] Testing LOB locator with streaming...
  [LOCATOR] LOB length: 104857600 bytes
SUCCESS (streamed)
  [LOCATOR] Success - read 104857600 bytes
  [LOCATOR] Elapsed time: 0.555 seconds
  [CLEANUP] Deleting test row...
  [CLEANUP] Row deleted

[CLEANUP] Dropping test table...
[CLEANUP] Table dropped successfully
```

As you can see, for me the piecewise read was way faster than the locator read even for large 100MB LOBs.

512MB LOB:

  • LONG interface: 1.166 seconds (536,870,912 bytes) ✅
  • LOB locator: 3.593 seconds (536,870,912 bytes) ✅
  • Performance: LONG is 3.1x faster

1.5GB LOB:

  • LONG interface: 3.459 seconds (1,610,612,736 bytes) ✅
  • LOB locator: 12.048 seconds (1,610,612,736 bytes) ✅
  • Performance: LONG is 3.5x faster

The caveat is that above 2GB only the locator LOB read approach can be used; a sketch of such a locator read loop follows.
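
For reference, the locator-based read for such huge values looks roughly like this in OCI. This is a hedged sketch of a simple chunked read loop, not necessarily what the benchmark or this PR does (the benchmark streams in pieces); `consume()` is a hypothetical placeholder for whatever appends bytes to the result, and error reporting is omitted:

```c
#include <oci.h>

/* Read a BLOB through its locator in fixed-size chunks.
 * consume() is a hypothetical callback that appends each chunk to the result. */
static sword read_blob_by_locator(OCISvcCtx *svchp, OCIError *errhp,
                                  OCILobLocator *lob,
                                  void (*consume)(const void *buf, oraub8 len))
{
    char buf[64 * 1024];
    oraub8 lob_len = 0;
    sword rc = OCILobGetLength2(svchp, errhp, lob, &lob_len); /* length in bytes */
    if (rc != OCI_SUCCESS)
        return rc;

    for (oraub8 offset = 1; offset <= lob_len; ) {            /* offsets are 1-based */
        oraub8 byte_amt = sizeof(buf);
        oraub8 char_amt = 0;                                   /* unused for BLOBs */
        if (byte_amt > lob_len - offset + 1)
            byte_amt = lob_len - offset + 1;
        rc = OCILobRead2(svchp, errhp, lob, &byte_amt, &char_amt, offset,
                         buf, sizeof(buf), OCI_ONE_PIECE,
                         NULL, NULL,                           /* no read callback */
                         0, SQLCS_IMPLICIT);
        if (rc != OCI_SUCCESS)
            return rc;
        consume(buf, byte_amt);                                /* bytes actually read */
        offset += byte_amt;
    }
    return OCI_SUCCESS;
}
```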

I think that's all. Again, there might be AI slop, especially in the C tests, so please double check my claims. Personally, I'm satisfied enough by the Ruby performance difference I see, as well as by the simplification of the query code on the ruby-oci8 side.

-- please review commits individually

Reverted commits:

- "…e array fetching for object types." (reverts commit 1fe7ea3)
- "… of network round trips from two to one." (reverts commit 5969e34)
- "…ct types are in the query." (reverts commit a35c64d)