Fix loc/iloc and groupby column selection to preserve weights #266

MaxGhenis · 2025-11-25T14:32:33Z

Summary

Fix MicroDataFrame.loc[] and .iloc[] to return MicroDataFrame with properly filtered weights instead of plain pandas.DataFrame (fixes MicroDataFrame.loc[] returns plain DataFrame, losing weights #265)
Fix MicroDataFrame.groupby(col)["y"].sum() and groupby(col)[["y"]].sum() to use weighted aggregation (fixes MicroDataFrame.groupby(col)[[cols]].aggfunc doesn't consider weights #193)

Changes

Issue #265 Fix

Added custom _MicroLocIndexer and _MicroILocIndexer classes that wrap pandas' indexers. These ensure that when filtering rows, the result is a MicroDataFrame with properly filtered weights:

df = MicroDataFrame({'x': [1, 2, 3]}, weights=[10, 20, 30])
subset = df.loc[df.x > 1]  # Now returns MicroDataFrame with weights [20, 30]
subset.x.sum()  # Returns 130 (weighted), not 5 (unweighted)

Issue #193 Fix

Added __getitem__ method to MicroDataFrameGroupBy class. When selecting columns after groupby, it now returns properly configured groupby objects with weights:

d = MicroDataFrame(dict(g=["a", "a", "b"], y=[1, 2, 3]), weights=[4, 5, 6])
d.groupby("g")["y"].sum()     # Now returns a=14, b=18 (weighted)
d.groupby("g")[["y"]].sum()   # Now returns a=14, b=18 (weighted)

Test plan

Added test_loc_preserves_weights() - tests .loc[] filtering
Added test_iloc_preserves_weights() - tests .iloc[] filtering
Added test_groupby_column_selection() - tests groupby column selection
All 26 tests pass
Code passes linting

🤖 Generated with Claude Code

- Add custom _MicroLocIndexer and _MicroILocIndexer classes that wrap pandas indexers and return MicroDataFrame with filtered weights - Add __getitem__ to MicroDataFrameGroupBy to support weighted operations on column selections like groupby("g")["y"].sum() - Add tests for loc, iloc, and groupby column selection Fixes #265, Fixes #193 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

baogorek

Hi @MaxGhenis , documentation was also broken in -us-data. I have a branch out if you want me to try to bring anything over.

For the fix, the values are looking good but I'm seeing NaN weights in certain circumstances:

from microdf import MicroDataFrame
In [8]: df = MicroDataFrame({'group': ['A', 'A', 'B', 'B'], 'value': [10, 20, 30, 40]}, weights=[1, 2, 3, 4])

In [9]: df.groupby("group").value.sum()
Out[9]: 
group
A     50.0
B    250.0
dtype: float64

In [10]: df.groupby("group")['value'].sum()
Out[10]: 
group
A     50.0
B    250.0
dtype: float64

In [11]: df.groupby("group").sum()
Out[11]: 
       weight  value
group               
A         NaN   50.0
B         NaN  250.0

In [13]: df.groupby("group")[["value"]].sum()
Out[13]: 
       weight  value
group               
A         NaN   50.0
B         NaN  250.0

I think we'd want to sum the weights, right?

…ights Aggregated results from groupby operations should be plain DataFrames, not MicroDataFrames, since the weights have already been applied during aggregation. This fixes the NaN weight column issue noted in review. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

MaxGhenis · 2025-11-25T20:21:12Z

@baogorek Thanks for the review! I've fixed the NaN weights issue.

The problem was that groupby aggregation results were being returned as MicroDataFrame, which automatically adds a weights column. Since weights have already been applied during aggregation, the result should be a plain pd.DataFrame instead.

Now the output is clean:

from microdf import MicroDataFrame
df = MicroDataFrame({'group': ['A', 'A', 'B', 'B'], 'value': [10, 20, 30, 40]}, weights=[1, 2, 3, 4])

df.groupby("group").sum()
#        value
# group       
# A       50.0
# B      250.0

df.groupby("group")[["value"]].sum()
#        value
# group       
# A       50.0
# B      250.0

I've also added a test to verify aggregated results don't have a spurious weight column.

- Add myst.yml configuration for MyST-based documentation - Update CI workflow to use myst build instead of jb build - Add Node.js setup step (required by MyST) - Remove jupyter_book from pyproject.toml (MyST installs via npm) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

MaxGhenis · 2025-11-25T20:26:31Z

I've also fixed the documentation build by migrating to Jupyter Book 2.0 / MyST:

Added docs/myst.yml configuration
Updated CI workflow to install mystmd via npm and use myst build --html
Added Node.js setup step (required by MyST)

All CI checks are now passing! ✅

baogorek

LGTM!!

MaxGhenis requested a review from baogorek November 25, 2025 14:32

baogorek requested changes Nov 25, 2025

View reviewed changes

MaxGhenis and others added 2 commits November 25, 2025 15:24

Fix CI: Install mystmd via npm before building docs

b023ee6

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

MaxGhenis requested a review from baogorek November 25, 2025 20:30

baogorek approved these changes Nov 26, 2025

View reviewed changes

MaxGhenis merged commit 835e695 into master Nov 26, 2025
8 checks passed

MaxGhenis deleted the fix-issues branch November 26, 2025 01:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix loc/iloc and groupby column selection to preserve weights #266

Fix loc/iloc and groupby column selection to preserve weights #266

Uh oh!

MaxGhenis commented Nov 25, 2025

Uh oh!

baogorek left a comment

Uh oh!

MaxGhenis commented Nov 25, 2025

Uh oh!

MaxGhenis commented Nov 25, 2025

Uh oh!

baogorek left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix loc/iloc and groupby column selection to preserve weights #266

Fix loc/iloc and groupby column selection to preserve weights #266

Uh oh!

Conversation

MaxGhenis commented Nov 25, 2025

Summary

Changes

Issue #265 Fix

Issue #193 Fix

Test plan

Uh oh!

baogorek left a comment

Choose a reason for hiding this comment

Uh oh!

MaxGhenis commented Nov 25, 2025

Uh oh!

MaxGhenis commented Nov 25, 2025

Uh oh!

baogorek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants