Skip to content

Client side memory leak on connect/disconnect cycles in fbclient.dll #9014

@jojastahl

Description

@jojastahl

Platform: Windows (other platforms not tested, probably platform agnostic)
Component: fbclient.dll (client library only — reproduced without any server-side change)
Affected versions: 3.0.8 and later (confirmed 3.0.10, 3.0.14); identical code present in v4.0.7 and v5.0.4 — assumed affected
Clean versions: 3.0.4, 3.0.7 (bisection)

Root cause

The bug is a one-liner in src/remote/protocol.cpp, function xdr_response() (introduced between 3.0.7 and 3.0.8 in the CORE-6525 backport):

static inline bool_t xdr_response(XDR* xdrs, CSTRING* cstring)
{
    if (xdr_is_client(xdrs) && xdrs->x_op == XDR_DECODE && cstring->cstr_allocated)
    {
        ULONG limit = cstring->cstr_allocated;
        cstring->cstr_allocated = 0;   // <------------------ BUG
        return xdr_cstring_with_limit(xdrs, cstring, limit);
    }
    return xdr_cstring(xdrs, cstring);
}

The intent is probably buffer reuse in alloc_cstring(), however setting cstr_allocated = 0 also disables CSTRING::free():

static bool alloc_cstring(XDR* xdrs, CSTRING* cstring)
{
    ...
    if (cstring->cstr_length > cstring->cstr_allocated && cstring->cstr_allocated)
        cstring->free(xdrs);                        // ^^^^^^^^^^^^^^^^^^^^^^^^^^ cstr_allocated == 0 ensures free() is not called
    ...
}
void CSTRING::free(XDR* xdrs)
{
    if (cstr_allocated)          // guards the delete[]
    {
        delete[] cstr_address;
        ...
    }
    cstr_address = NULL;
    cstr_allocated = 0;
}

Proposed fix

Remove the zeroing. The limit argument can be read directly from the still-intact cstr_allocated:

diff --git a/src/remote/protocol.cpp b/src/remote/protocol.cpp
index fafd20dfba..0b0eb7322b 100644
--- a/src/remote/protocol.cpp
+++ b/src/remote/protocol.cpp
@@ -986,11 +986,7 @@ static inline bool_t xdr_cstring_const(XDR* xdrs, CSTRING_CONST* cstring)
 static inline bool_t xdr_response(XDR* xdrs, CSTRING* cstring)
 {
        if (xdr_is_client(xdrs) && xdrs->x_op == XDR_DECODE && cstring->cstr_allocated)
-       {
-               ULONG limit = cstring->cstr_allocated;
-               cstring->cstr_allocated = 0;
-               return xdr_cstring_with_limit(xdrs, cstring, limit);
-       }
+               return xdr_cstring_with_limit(xdrs, cstring, cstring->cstr_allocated);

        return xdr_cstring(xdrs, cstring);
 }

With this fix, the buffer should still be reused if large enough in the first place:

if (cstring->cstr_length > cstring->cstr_allocated && cstring->cstr_allocated)
{
cstring->free(xdrs);
}

We have applied this patch locally and in our testcase that fixed the leak. Production verification still outstanding.

Background

We run a long running production application on Windows that connects to Firebird repeatedly throughout the day. After migrating from fbclient 3.0.4 to 3.0.10 we observed sustained memory growth in the application process. A heap inspection showed 64 KB private memory blocks packed with the strings "Symmetric" and "Arc4", traceable to fbclient.dll. Reverting to fbclient 3.0.4 eliminated the problem immediately.

At the time we started working on a standalone reproduction but never reached a clean enough result to report. Recently we had to move to 3.0.14 due to security fixes and resumed the investigation.

Reproduction and evidence

Minimal reproduction

Pure connect/disconnect loop using isql against a remote Firebird server (localhost).
No queries are executed. Two databases are alternated to avoid any single-connection caching effect. (May not be required, single DB reconnects not tested.)

/* auto-generated, repeated ~4000 times, feed into isql */
CONNECT 'localhost:C:\db\DBA1.fdb' USER 'SYSDBA' PASSWORD 'masterkey';
CONNECT 'localhost:C:\db\DBA2.fdb' USER 'SYSDBA' PASSWORD 'masterkey';

Version results:

fbclient Leak?
3.0.4 No
3.0.7 No
3.0.8 Yes — first affected version (bisection)
3.0.10 Yes
3.0.14 Yes

Strategy

We instrumented several constructors / destructors with counters, we added free-poisoning for freed blocks, and tried several other things, stack dumps etc.

Signal 1 — heap dump

A live scan of the isql address space taken mid-run shows two 64 KB regions packed wall-to-wall with 48-byte blocks containing the string "Symmetric" at a fixed stride of exactly 0x30 (48) bytes. With free-poison active (freed block bodies overwritten with 0xDE), these cannot be freed-block residue — they are live, unreleased allocations. The "Arc4" string appears in the same regions at matching offsets.

Signal 2 — pool histogram (4000 connects)

bucket[ 6] (32..63 B): allocs=84063  frees=80007  net_live=+4056

Approximately one 48-byte allocation leaks per connect. All other size buckets are near-zero net and consistent with one-time startup costs.

Signal 3 — live-block pool walker

A pool walker dumps all live bucket-6 block bodies at process exit. After 4000 connects the leaked blocks all contain the same 17-byte payload:

00 09 53 79 6D 6D 65 74 72 69 63 01 04 41 72 63 34 DE DE...
       S  y  m  m  e  t  r  i  c        A  r  c  4

This is a Firebird clumplet:

  • tag 00, length 09, value "Symmetric";
  • tag 01, length 04, value "Arc4"

— exactly the server wire-crypt key announcement sent in p_resp.p_resp_data during the SRP handshake.
Both strings live in the same 48-byte block; their appearance at matching strides in the heap dump is the same single allocation viewed twice.

All tracked per-connection objects are balanced

Object ctor dtor
rem_port 4000 4000
ClntAuthBlock 4000 4000
InternalCryptKey 4000 4000
KnownServerKey 8000 8000
Crypt::Arc4 plugin instance 4000 4000
FactoryParameter 16000 16000

The leak is therefore not in any named object which has lead us to the raw UCHAR[] buffer allocated by alloc_cstring() for p_resp_data.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions