Skip to content

1699: get_parent_model_column_order calculation made only for unique rdomains and merged by#1711

Merged
SFJohnson24 merged 3 commits into
mainfrom
1699-qnam-optimize
Apr 30, 2026
Merged

1699: get_parent_model_column_order calculation made only for unique rdomains and merged by#1711
SFJohnson24 merged 3 commits into
mainfrom
1699-qnam-optimize

Conversation

@alexfurmenkov
Copy link
Copy Markdown
Collaborator

@alexfurmenkov alexfurmenkov commented Apr 28, 2026

No description provided.

@alexfurmenkov alexfurmenkov linked an issue Apr 28, 2026 that may be closed by this pull request
@alexfurmenkov alexfurmenkov marked this pull request as ready for review April 28, 2026 18:30
@alexfurmenkov alexfurmenkov changed the title get_parent_model_column_order calculation made only for unique rdomai… 1699: get_parent_model_column_order calculation made only for unique rdomains and merged by Apr 28, 2026
Copy link
Copy Markdown
Collaborator

@SFJohnson24 SFJohnson24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unfortunately not getting the hang up described in the issue locally-- that said, the changes you made dont appear to impact the performance. The original was O(n) for the number of supp-- rows since _get_parent_variable_names_list was called once per unique RDOMAIN value via set(rdomain_values) loop. Your refactor is also O(n) since the setdefault call inside the generator still iterates over all n rows of rdomain_values and the expensive _get_parent_variable_names_list call is still only made once per unique RDOMAIN value, just lazily rather than upfront. I may be off here if you were able to reproduce the hangup? I still believe the bottleneck is likely the is_contained_by as it is O(n*m) presently -- number of rows (64k in this case) x the length of the comparator list. It may not be the iloc but that adds extra overhead of an already expensive loop.

  • I think reducing n by using a distinct
  • we could also use a set on comparison_data before the loop so each is_in check is O(1) instead of O(m)

What do you think?

@alexfurmenkov
Copy link
Copy Markdown
Collaborator Author

The setdefault approach inside a generator does not actually cache the results of _get_parent_variable_names_list. In Python, function arguments are evaluated eagerly — meaning the second argument to setdefault is fully evaluated before setdefault is even called. So _get_parent_variable_names_list gets called once per row (n times total), not once per unique RDOMAIN value.

d = {}
def expensive(x):
    print(f"called with {x}")
    return x * 10

for v in ["A", "A", "A"]:
    d.setdefault(v, expensive(v))
# Output:
# called with A
# called with A
# called with A   ← 3 times, not 1

So the original setdefault version is O(n rows), while the explicit set() loop version:

for rdomain in set(rdomain_values):
    rdomain_names_list[rdomain] = self._get_parent_variable_names_list(...)

is O(k unique RDOMAIN values), which is a real improvement, not just a cosmetic refactor.

Copy link
Copy Markdown
Collaborator

@SFJohnson24 SFJohnson24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR correctly preserves get_parent_model_column_order while optimizing it for performance, going from O(n) I/O calls down to O(u) by deduplication with set(). Neither dev was able to reproduce the hangup using the pilot data for both the full SDTMIG and just the one problematic rule. Should the hangup persist in v0.16 we will look to is_contain_by and work to optimize there

@SFJohnson24 SFJohnson24 merged commit f69dafb into main Apr 30, 2026
12 checks passed
@SFJohnson24 SFJohnson24 deleted the 1699-qnam-optimize branch April 30, 2026 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CORE-000783

2 participants