get_call_graph() omits PyCallable nodes whose outgoing edges all target unresolvable callees #30

@rahlk

Summary

PythonAnalysis.get_call_graph() does not include a node for a method
when every outgoing call site of that method targets a callable the
backend cannot resolve (typically calls into a third-party package not
installed in the analysis environment, or attribute chains on objects
whose type Jedi cannot infer). The method is still present in
get_methods(), but it has no node in the call graph. Any reachability
analysis that starts from a get_methods() qname and checks
call_graph.has_node(qn) therefore gets False.

The expected invariant, "every method known to the symbol table has a
node in the call graph (possibly with zero edges)", does not hold.
As a result, reachability/PoV tooling cannot distinguish "method does
not exist" from "method exists but has no resolvable outgoing edges,"
and is forced to either over-prune (drop the method) or treat every
method with only unresolvable callees as a quarantined ambiguity.
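
The invariant can be checked as a set difference between the two views. This is a minimal sketch, not CLDK code: missing_cg_nodes is a hypothetical helper, it assumes get_methods() maps a class qname to its method names (the shape the reproducer below iterates over), and the sample data is hard-coded to mirror the reproducer rather than taken from a real run.

```python
def missing_cg_nodes(methods_by_class, cg_nodes):
    """Return qnames that are in the symbol table but have no call-graph node."""
    # Flatten {class_qname: [method_name, ...]} into fully qualified names.
    known = {
        f"{class_qname}.{method_name}"
        for class_qname, method_names in methods_by_class.items()
        for method_name in method_names
    }
    # Anything left after removing the graph's nodes violates the invariant.
    return sorted(known - set(cg_nodes))


# Hard-coded stand-ins for pa.get_methods() and pa.get_call_graph().nodes().
methods_by_class = {
    "m.Controller": ["only_external_chain", "with_internal_helper"],
}
cg_nodes = {"m.helper", "m.Controller.with_internal_helper"}

violations = missing_cg_nodes(methods_by_class, cg_nodes)
print(violations)
```

An empty result would mean the invariant holds for the analyzed project.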

Reproducer

# pkg/m.py
from some_external_pkg_we_cannot_resolve import api  # unresolvable import

def helper(x):
    return {"id": x}

class Controller:
    def only_external_chain(self, note_id):
        # Only calls on `api` and on what it returns — Jedi can't resolve them.
        note = api.env["foo"].sudo().browse(note_id)
        return {"id": note.id, "name": note.name}

    def with_internal_helper(self, note_id):
        # Same external chain, but also calls a module-level function.
        note = api.env["foo"].sudo().browse(note_id)
        return helper(note.id)

# Driver: build the analysis, then compare get_methods() with the call graph.
from cldk import CLDK

pa = CLDK(language="python").analysis(
    project_path="pkg", cache_dir="./.cache",
    eager=True, use_codeql=True,
)
ms = pa.get_methods()     # symbol-table view
cg = pa.get_call_graph()  # call-graph view
present = set(cg.nodes())
for mn in ms["m.Controller"]:
    qn = f"m.Controller.{mn}"
    print(f"  {qn:40s} in_cg={qn in present}")

Output:

  m.Controller.only_external_chain         in_cg=False
  m.Controller.with_internal_helper        in_cg=True

Expected

Both only_external_chain and with_internal_helper should appear as
nodes in the call graph (with zero outgoing local edges in the
only_external_chain case). They are both in get_methods() and are
both reachable from a framework decorator (@http.route in the
real-world case below).

Real-world impact

This was hit during a triage of the Odoo monorepo. The Odoo addon
crm_note_manager is a deliberately vulnerable A01 test addon. Two of
its @http.route(...) methods, get_note_summary and
get_public_note_summary, have exactly the most severe bug shape: an
auth='public' route that calls
request.env[...].sudo().browse(note_id) with no access check. Both
are in the symbol table:

controllers.controllers.CrmNoteController.get_note_summary
  path=/.../crm_note_manager/controllers/controllers.py
  lines=207-216
  decorators=["http.route('/crm/notes/data/public_note_summary/<int:note_id>',
              type='jsonrpc', auth='public')"]

…but neither is in get_call_graph().nodes() (the shard has 22 methods,
24 CG nodes; the two auth='public' + only-external-chain methods are
the missing ones).

Because reachability analysis is rooted at cg.has_node(sink), the
critical bug bucket is silently classified as SINK_UNREACHABLE and
dropped. Sibling methods in the same file that also call a
module-level free function (serialize_note(...)) are correctly
included.

Workaround used downstream

The triage pipeline now treats this case as an Invariant-I1 tool gap:
if a method is in get_methods() but absent from get_call_graph(),
the corresponding bucket is quarantined (status=quarantined,
reason=CLDK_CG_MISSING_NODE) and surfaces as partially_confirmed /
needs_review instead of being dropped. This avoids silent false
negatives but pushes work onto the human reviewer.
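
The quarantine rule described above can be sketched roughly as follows. The Verdict type, the triage_sink function, and the "analyze"/"dropped" status strings are hypothetical names for illustration; only the quarantined status, the CLDK_CG_MISSING_NODE reason, and the SINK_UNREACHABLE classification come from the pipeline described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    status: str
    reason: str


def triage_sink(sink: str, known_methods: set[str], cg_nodes: set[str]) -> Verdict:
    """Classify a sink qname without silently dropping symbol-table methods."""
    if sink in cg_nodes:
        # Normal path: the sink is a CG node, so reachability can run.
        return Verdict("analyze", "CG_NODE_PRESENT")  # hypothetical labels
    if sink in known_methods:
        # Tool gap: the method exists but the call graph has no node for it.
        return Verdict("quarantined", "CLDK_CG_MISSING_NODE")
    # Unknown to both views: dropping is safe.
    return Verdict("dropped", "SINK_UNREACHABLE")


known = {"m.Controller.only_external_chain", "m.Controller.with_internal_helper"}
nodes = {"m.Controller.with_internal_helper", "m.helper"}
verdict = triage_sink("m.Controller.only_external_chain", known, nodes)
print(verdict)
```

The order of the checks matters: testing the call graph first keeps the normal path cheap, and the symbol-table check is what separates a tool gap from a sink that does not exist.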

Environment

  • cldk 1.1.3
  • codeanalyzer-python 0.1.14
  • Python 3.12.10
  • Linux x86_64, CodeQL CLI 2.25.5
  • use_codeql=True, eager=True

Suggested fix

In the call-graph build (semantic_analysis/... and the
get_call_graph accessor in python_analysis.py), seed the graph with
every callable from the symbol table as a zero-edge node before adding
the resolved edges. That preserves the
"every callable is a CG node" invariant downstream tools rely on.
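
A minimal sketch of the seeding order, under stated assumptions: build_call_graph is a hypothetical function, and a plain adjacency map stands in for whatever graph type the backend actually uses. It illustrates the idea only and is not taken from the CLDK or codeanalyzer-python source.

```python
from collections.abc import Iterable


def build_call_graph(
    callables: Iterable[str],
    resolved_edges: Iterable[tuple[str, str]],
) -> dict[str, set[str]]:
    """Return an adjacency map with a key for every known callable."""
    # Seed first: every symbol-table callable becomes a zero-edge node.
    graph: dict[str, set[str]] = {qname: set() for qname in callables}
    # Then add the resolved edges; setdefault also admits endpoints
    # that were not in the symbol table.
    for caller, callee in resolved_edges:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set())
    return graph


symbols = [
    "m.helper",
    "m.Controller.only_external_chain",
    "m.Controller.with_internal_helper",
]
edges = [("m.Controller.with_internal_helper", "m.helper")]
cg = build_call_graph(symbols, edges)
print(sorted(cg))
```

With this ordering, only_external_chain is present with an empty successor set, so a has_node-style membership check succeeds even though none of its callees resolved.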
