-
Notifications
You must be signed in to change notification settings - Fork 10
Fix loc/iloc and groupby column selection to preserve weights #266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add custom _MicroLocIndexer and _MicroILocIndexer classes that wrap
pandas indexers and return MicroDataFrame with filtered weights
- Add __getitem__ to MicroDataFrameGroupBy to support weighted operations
on column selections like groupby("g")["y"].sum()
- Add tests for loc, iloc, and groupby column selection
Fixes #265, Fixes #193
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
baogorek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @MaxGhenis , documentation was also broken in -us-data. I have a branch out if you want me to try to bring anything over.
For the fix, the values are looking good but I'm seeing NaN weights in certain circumstances:
from microdf import MicroDataFrame
In [8]: df = MicroDataFrame({'group': ['A', 'A', 'B', 'B'], 'value': [10, 20, 30, 40]}, weights=[1, 2, 3, 4])
In [9]: df.groupby("group").value.sum()
Out[9]:
group
A 50.0
B 250.0
dtype: float64
In [10]: df.groupby("group")['value'].sum()
Out[10]:
group
A 50.0
B 250.0
dtype: float64
In [11]: df.groupby("group").sum()
Out[11]:
weight value
group
A NaN 50.0
B NaN 250.0
In [13]: df.groupby("group")[["value"]].sum()
Out[13]:
weight value
group
A NaN 50.0
B NaN 250.0
I think we'd want to sum the weights, right?
…ights Aggregated results from groupby operations should be plain DataFrames, not MicroDataFrames, since the weights have already been applied during aggregation. This fixes the NaN weight column issue noted in review. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
@baogorek Thanks for the review! I've fixed the NaN weights issue. The problem was that groupby aggregation results were being returned as Now the output is clean: from microdf import MicroDataFrame
df = MicroDataFrame({'group': ['A', 'A', 'B', 'B'], 'value': [10, 20, 30, 40]}, weights=[1, 2, 3, 4])
df.groupby("group").sum()
# value
# group
# A 50.0
# B 250.0
df.groupby("group")[["value"]].sum()
# value
# group
# A 50.0
# B 250.0I've also added a test to verify aggregated results don't have a spurious weight column. |
- Add myst.yml configuration for MyST-based documentation - Update CI workflow to use myst build instead of jb build - Add Node.js setup step (required by MyST) - Remove jupyter_book from pyproject.toml (MyST installs via npm) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
I've also fixed the documentation build by migrating to Jupyter Book 2.0 / MyST:
All CI checks are now passing! ✅ |
baogorek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!!
Summary
MicroDataFrame.loc[]and.iloc[]to returnMicroDataFramewith properly filtered weights instead of plainpandas.DataFrame(fixes MicroDataFrame.loc[] returns plain DataFrame, losing weights #265)MicroDataFrame.groupby(col)["y"].sum()andgroupby(col)[["y"]].sum()to use weighted aggregation (fixes MicroDataFrame.groupby(col)[[cols]].aggfunc doesn't consider weights #193)Changes
Issue #265 Fix
Added custom
_MicroLocIndexerand_MicroILocIndexerclasses that wrap pandas' indexers. These ensure that when filtering rows, the result is aMicroDataFramewith properly filtered weights:Issue #193 Fix
Added
__getitem__method toMicroDataFrameGroupByclass. When selecting columns after groupby, it now returns properly configured groupby objects with weights:Test plan
test_loc_preserves_weights()- tests.loc[]filteringtest_iloc_preserves_weights()- tests.iloc[]filteringtest_groupby_column_selection()- tests groupby column selection🤖 Generated with Claude Code