Skip to content

fix: add depth limit to EventSerializer to prevent hangs on complex objects#1577

Open
BKDDFS wants to merge 6 commits into
langfuse:mainfrom
BKDDFS:fix/serializer-depth-limit
Open

fix: add depth limit to EventSerializer to prevent hangs on complex objects#1577
BKDDFS wants to merge 6 commits into
langfuse:mainfrom
BKDDFS:fix/serializer-depth-limit

Conversation

@BKDDFS
Copy link
Copy Markdown

@BKDDFS BKDDFS commented Mar 26, 2026

Summary

EventSerializer.default() recursively traverses __dict__ and __slots__ of arbitrary objects without a depth limit. When @observe() captures function arguments containing objects like google.genai.Client (which hold aiohttp sessions, connection pools, and threading locks), json.dumps can block indefinitely.

Root cause

_serialize() in create_span_attributes calls json.dumps(obj, cls=EventSerializer). The serializer's __dict__ fallback recurses into every attribute of every object. For objects like aiohttp.ClientSessionTCPConnector → internal locks, this either hangs (lock contention / blocking attribute access) or enters extremely deep recursion.

Fix

Add a _MAX_DEPTH = 20 depth counter to EventSerializer:

  • default() increments/decrements _depth on each call
  • When _depth >= _MAX_DEPTH, returns <TypeName> placeholder instead of recursing
  • Serialization logic moved to _default_inner() — no behavior changes
  • encode() resets _depth at the start of each serialization

Workaround (current)

@observe(capture_input=False) — disables automatic input capture entirely.

Tests

3 new tests:

  • test_deeply_nested_object_does_not_hang — objects with connection pools/locks
  • test_max_depth_returns_type_name__dict__ deep nesting truncation
  • test_deeply_nested_slots_object_is_truncated__slots__ deep nesting truncation

All 21 tests pass (18 existing + 3 new).

Disclaimer: Experimental PR review

Greptile Summary

This PR fixes a real and reproducible hang in EventSerializer by introducing a _MAX_DEPTH = 20 counter that short-circuits recursion when serializing deeply nested objects (e.g. google.genai.Client holding aiohttp session internals). The core mechanism — incrementing _depth in default() and decrementing it in a finally block — is correct and well-structured.\n\nKey changes:\n- default() is now a thin depth-guard that delegates to _default_inner(), preserving all existing serialization behavior unchanged.\n- encode() resets _depth = 0 before each top-level call, which is a good defensive measure.\n- Three new tests cover the main scenarios: hang prevention, __dict__ depth truncation, and __slots__ depth truncation.\n\nIssues found:\n- The __slots__ branch at line 153–156 calls self.default({...}) for the intermediate dict it constructs. Because default() also increments _depth, each __slots__ object consumes 2 depth units per level instead of 1, cutting the effective limit from 20 to ~10 for __slots__-based objects. Changing that single call to _default_inner({...}) would make behavior consistent.\n- The test_deeply_nested_slots_object_is_truncated assertion bound (_MAX_DEPTH + 5 = 25) is too loose to catch regressions in truncation depth for the slots case.

Confidence Score: 4/5

Safe to merge — the hang is fixed and no existing behavior is regressed; the remaining issues are a minor depth inconsistency for __slots__ objects and a loose test assertion.

The fix correctly solves the stated problem (infinite recursion / hang on complex objects) and all 21 tests pass. The only issues are a double-depth-counting side-effect in the __slots__ path (effective limit ~10 instead of 20) and an overly permissive test assertion — neither affects the primary fix or causes correctness failures in normal usage.

langfuse/_utils/serializer.py lines 153–156 (__slots__ handling) and the corresponding test assertion at tests/test_serializer.py:267.

Important Files Changed

Filename Overview
langfuse/_utils/serializer.py Adds _MAX_DEPTH = 20 depth limiter to prevent infinite recursion/hangs; logic is correct for __dict__-based objects but __slots__ objects incur double depth consumption (consuming 2 units per level instead of 1), halving their effective limit to ~10.
tests/test_serializer.py Adds 3 new tests covering hang prevention, __dict__ truncation, and __slots__ truncation; the slots test has an overly loose assertion (_MAX_DEPTH + 5) that doesn't tightly verify the truncation depth.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["encode(obj)"] --> B["_depth = 0\nseen.clear()"]
    B --> C["default(obj)"]
    C --> D{{"_depth >= _MAX_DEPTH?"}}
    D -- Yes --> E["return '<TypeName>'"]
    D -- No --> F["_depth += 1\n_default_inner(obj)"]
    F --> G{{"Type of obj?"}}
    G -- "datetime / UUID / bytes / etc." --> H["Return primitive"]
    G -- "dict" --> I["self.default(k)\nself.default(v)\nfor each item"]
    G -- "list / Sequence" --> J["self.default(item)\nfor each item"]
    G -- "__slots__" --> K["self.default(dict of slots)\n⚠️ extra depth unit"]
    G -- "__dict__" --> L{{"id(obj) in seen?"}}
    L -- Yes --> M["return TypeName (circular ref)"]
    L -- No --> N["seen.add(id)\nself.default(v)\nfor each attr\nseen.remove(id)"]
    G -- "other" --> O["return '<TypeName>'"]
    I --> P["_depth -= 1\n(finally)"]
    J --> P
    K --> P
    N --> P
    H --> P
    M --> P
    O --> P
    P --> Q["return result to encode()"]
    Q --> R["super().encode(result)"]
Loading

Comments Outside Diff (1)

  1. langfuse/_utils/serializer.py, line 153-156 (link)

    P2 __slots__ path consumes double the depth budget per object level

    Because the __slots__ branch calls self.default({...}) for the intermediate dict, each __slots__ object consumes 2 depth units (one for the object itself, one for the constructed dict), whereas a __dict__-based object consumes only 1. In practice this halves the effective depth limit for __slots__ objects: they truncate at approximately 10 levels of nesting rather than the intended 20.

    Tracing for a chain of SlotLevel objects:

    • default(SlotLevel_N)_depth = 2*(N-1)
    • default({"child": SlotLevel_{N+1}})_depth = 2*(N-1)+1
    • Cutoff: 2*(N-1) ≥ 20 → N ≥ 11 (truncates at level 11, not 21)

    A lightweight fix is to call _default_inner directly for the constructed dict rather than routing it back through default():

    if hasattr(obj, "__slots__"):
        return self._default_inner(
            {slot: getattr(obj, slot, None) for slot in obj.__slots__}
        )

    This avoids the redundant depth check and gives __slots__ objects the same effective depth limit as __dict__ objects.

Reviews (1): Last reviewed commit: "fix: add depth limit to EventSerializer ..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

(2/5) Greptile learns from your feedback when you react with thumbs up/down!

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 26, 2026

CLA assistant check
All committers have signed the CLA.

Comment thread tests/test_serializer.py Outdated
@BKDDFS BKDDFS force-pushed the fix/serializer-depth-limit branch from a895aed to 087052e Compare March 26, 2026 14:23
…bjects

EventSerializer.default() recursively traverses __dict__ and __slots__
of arbitrary objects without a depth limit. When @observe() captures
function arguments containing objects like google.genai.Client (which
hold aiohttp sessions, connection pools, and threading locks),
json.dumps blocks indefinitely on the second invocation.

Add a _MAX_DEPTH=20 counter that returns a <TypeName> placeholder
when exceeded, preventing infinite recursion into complex object graphs
while preserving all existing serialization behavior.
@BKDDFS BKDDFS force-pushed the fix/serializer-depth-limit branch from 087052e to 43114a6 Compare March 26, 2026 14:29
@BKDDFS
Copy link
Copy Markdown
Author

BKDDFS commented Apr 2, 2026

Hi @hassiebp, could you check my PR, please?

@hassiebp hassiebp self-requested a review April 7, 2026 15:08
@BKDDFS
Copy link
Copy Markdown
Author

BKDDFS commented Apr 15, 2026

friendly bump @hassiebp

@BKDDFS
Copy link
Copy Markdown
Author

BKDDFS commented May 15, 2026

Another friendly bump @hassiebp / @marcklingen. This is open for ~50 days now with no review.

The hang causes real impact in our prod: we ship @observe(capture_input=False) as a workaround on every endpoint that uses clients like google.genai.Client / aiohttp-backed clients, which means traces lose input data entirely on those flows. Switching to a depth-limited serializer would let us re-enable input capture.

Happy to:

  • rebase to resolve the conflict
  • address Greptile's __slots__ double-depth nit (one-line fix: route the slots dict through _default_inner instead of default)

if either is blocking the review.

@hassiebp
Copy link
Copy Markdown
Contributor

@claude review

@hassiebp
Copy link
Copy Markdown
Contributor

@BKDDFS Thanks a lot for your contribution! This looks good to me - could you kindly resolve the merge conflicts for this to get merged?

Comment on lines +44 to +54
self._depth = 0

def default(self, obj: Any) -> Any:
if self._depth >= self._MAX_DEPTH:
return f"<{type(obj).__name__}>"

self._depth += 1
try:
return self._default_inner(obj)
finally:
self._depth -= 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The new depth guard in default() (lines 44-48) returns f"<{type(obj).__name__}>" for any object — including primitive strings — before any type analysis. Because the dict branch in _default_inner routes both keys and values through self.default() ({self.default(k): self.default(v) for k, v in obj.items()}), at the truncation boundary a normal string key like 'a' is replaced with the literal '<str>', silently corrupting user trace data. Fix: either move the depth check inside _default_inner after the primitive type checks, or change the dict branch to {k: self.default(v) for k, v in obj.items()} (matching the __dict__ branch convention on line 165).

Extended reasoning...

What the bug is

The depth guard added at the top of default() (lines 46-48) is unconditional — it runs before any type dispatch:

def default(self, obj: Any) -> Any:
    if self._depth >= self._MAX_DEPTH:
        return f"<{type(obj).__name__}>"
    ...

This means at the boundary, any value passed to default() — including plain str and int primitives — gets replaced by its type-name placeholder. Combined with the pre-existing dict branch in _default_inner (line 142-143):

if isinstance(obj, dict):
    return {self.default(k): self.default(v) for k, v in obj.items()}

which routes both keys and values through self.default(), the result is that dict keys at the truncation boundary are silently rewritten to "<str>" (or "<int>", etc.) instead of being preserved.

Step-by-step proof

Consider a chain of 22 nested {"a": {...}} dicts encoded by EventSerializer:

  1. encode(D_0) resets _depth=0, calls default(D_0).
  2. default(D_k) enters with _depth=k, checks k >= 20 (false for k<20), increments to k+1, calls _default_inner(D_k).
  3. The dict branch returns {self.default("a"): self.default(D_{k+1})} — both at _depth=k+1.
  4. At k=19: _default_inner calls self.default("a") while _depth=20. The guard _depth >= _MAX_DEPTH (20 >= 20) fires first, returning "<str>" instead of dispatching to the isinstance(obj, str) branch which would have returned "a".

Verified live output:

{"a": {"a": {"a": ... (19 levels) ... {"<str>": "<dict>"}}}}

The last surviving key — the most informative one in the trace — is the one that gets clobbered.

Why existing tests miss it

test_max_depth_returns_type_name uses a Level class with __dict__. That path goes through the __dict__ branch at line 165:

result = {k: self.default(v) for k, v in vars(obj).items()}

Note: keys are not routed through self.default() here. So the test never exercises the corrupted-key path that the raw-dict branch creates.

Impact

This is a regression introduced by this PR. Before the change, default("a") always hit the isinstance(obj, str) branch in _default_inner and returned "a" unchanged. Now, for any traced input/output containing a deeply nested dict (a common shape: nested JSON payloads, message threads, agent state), the key name at the truncation boundary is silently replaced with "<str>". Worse, "<str>" is indistinguishable from a legitimate string a user might have placed there. The __slots__ branch is also affected, because line 153-156 wraps slot names in a synthetic dict that gets routed through default() — so slot names get the same treatment.

How to fix

Two simple options:

  1. Move the depth check into _default_inner after the primitive type checks. Primitives (str, int, float, None, bytes, etc.) are leaves and cannot recurse, so they should never be subject to the depth guard.
  2. Match the __dict__ branch convention: change line 142-143 to {k: self.default(v) for k, v in obj.items()}. Keys are already required to be JSON-encodable (strings/ints), so routing them through default() was never necessary.

Comment on lines +44 to +54
self._depth = 0

def default(self, obj: Any) -> Any:
if self._depth >= self._MAX_DEPTH:
return f"<{type(obj).__name__}>"

self._depth += 1
try:
return self._default_inner(obj)
finally:
self._depth -= 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The __slots__ branch at lines 153-156 calls self.default({...}) on the synthesized dict, which re-enters the new depth-tracking wrapper and increments _depth a second time. As a result, every __slots__ object consumes 2 depth units per nesting level instead of 1, so the effective _MAX_DEPTH for __slots__-based objects (e.g. pydantic v1 BaseModels, many attrs/slotted dataclasses) is silently halved from 20 to ~10. One-line fix: call self._default_inner({...}) in the slots branch so it consumes depth equivalently to the __dict__ branch (which already builds its result dict directly without re-routing through default).

Extended reasoning...

What the bug is

The PR adds depth tracking by wrapping the existing serialization logic in a new default() method that increments _depth, calls _default_inner, and decrements in a finally. The pre-existing __slots__ branch at langfuse/_utils/serializer.py:153-156 was not updated to match the new wrapper-vs-inner split, so it still does:

if hasattr(obj, "__slots__"):
    return self.default(
        {slot: getattr(obj, slot, None) for slot in obj.__slots__}
    )

That self.default(...) call re-enters the wrapper and bumps _depth a second time for the same logical nesting level. The __dict__ branch a few lines below avoids this by building its result directly: {k: self.default(v) for k, v in vars(obj).items()} — only the child traversal increments depth, not the level itself.

Step-by-step proof

Consider a chain SL_0 -> SL_1 -> ... -> SL_N where each SlotLevel has __slots__ = ['child']:

Step Call _depth on entry After increment
1 default(SL_0) 0 1
2 inner sees __slots__default({'child': SL_1}) 1 2
3 inner sees dict → default(SL_1) (via value) 2 3
4 inner sees __slots__default({'child': SL_2}) 3 4
... ... ... ...

So default(SL_i) is entered with _depth = 2i. The truncation check _depth >= _MAX_DEPTH (20) fires at i = 10 — i.e. truncation happens at ~10 levels for __slots__ objects, vs. 20 for __dict__ objects. This is exactly what the PR's own new test bakes in (depth <= _MAX_DEPTH // 2 + 3 = 13), and Greptile flagged the same issue independently.

Why existing code doesn't prevent it

Before this PR, the redundant self.default(...) call in the __slots__ branch was harmless — default() had no per-call side effect. The PR introduces the per-call _depth increment but only modifies the __dict__ branch's data flow (which already bypassed default() for its own dict construction). The __slots__ branch was overlooked.

Impact

  • The documented _MAX_DEPTH = 20 is effectively 10 for any object using __slots__ (pydantic v1 BaseModels, many attrs classes, slotted dataclasses, etc.).
  • When truncation does trigger in a __slots__ chain, the placeholder becomes "<dict>" (the synthesized wrapper dict's type) instead of "<SlotLevel>" (the actual object type) — less informative for debugging.
  • Severity is moderate: the primary fix (preventing hangs) still works, but the depth budget is silently halved for a common class of objects, and the truncation marker is misleading.

How to fix

One-line change in the slots branch — call the inner helper directly so it matches the __dict__ branch's accounting:

if hasattr(obj, "__slots__"):
    return self._default_inner(
        {slot: getattr(obj, slot, None) for slot in obj.__slots__}
    )

The PR author has already offered to make this change ("address Greptile __slots__ double-depth nit — one-line fix: route the slots dict through _default_inner instead of default"). Once applied, the companion test bound (tests/test_serializer.py:267, currently _MAX_DEPTH // 2 + 3 = 13) should be tightened to _MAX_DEPTH + 2 (or restructured like test_max_depth_returns_type_name) so it doesn't encode the buggy behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants