feat: add `GroupBy.__iter__` by tswast · Pull Request #1394 · googleapis/python-bigquery-dataframes

tswast · 2025-02-13T19:26:09Z

Note: this is a work in progress. We have two choices for the interface, and I find myself flip-flopping between the two:

1. Return an iterable of pandas objects, similar to `to_pandas_batches()`. To make sure all of a group's rows end up together, this would mean (a) creating a struct of all non-grouped fields, (b) `array_agg`-ing those structs, and (c) unpacking those arrays and structs into DataFrame objects locally.
2. Return an iterable of bigframes objects, each filtered to match the rows belonging to the corresponding group. This would involve running a query and iterating through the results to get all the key values, then, for each key value, returning a DataFrame (or Series) with the corresponding filter attached.
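For comparison, the first option (pack each group into an array of structs, then unpack locally) can be sketched as follows. This is an illustration, not code from this PR: steps (a) and (b) would run in BigQuery as STRUCT plus ARRAY_AGG, but here they are simulated in plain Python over a local pandas DataFrame so the example runs without a connection. The function name `iter_groups_packed` is hypothetical, and only a single grouping column is handled.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Tuple

import pandas as pd


def iter_groups_packed(
    df: pd.DataFrame, by: str
) -> Iterator[Tuple[Any, pd.DataFrame]]:
    """Hypothetical sketch of option (1): pack each group, unpack locally."""
    value_columns = [col for col in df.columns if col != by]

    # (a) + (b): build one "struct" (a dict) per row of non-grouped fields,
    # then aggregate the structs into one array (a list) per key value.
    packed: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in df.to_dict("records"):
        struct = {col: row[col] for col in value_columns}
        packed[row[by]].append(struct)

    # (c): unpack each array of structs into a local pandas DataFrame.
    for key, structs in packed.items():
        yield key, pd.DataFrame(structs, columns=value_columns)
```

The sketch makes the drawback visible: every group is materialized as a single array value, so one very large group means one very large row to download.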

After internal discussion, we settled on (2) because the groups could potentially be quite large.
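A minimal sketch of the chosen approach (option 2), assuming a single grouping column and using a local pandas DataFrame as a stand-in for a bigframes one. The function name `iter_groups_filtered` is hypothetical and this is not the PR's implementation. With pandas the boolean mask is evaluated eagerly; with a deferred bigframes DataFrame the filter would presumably just be recorded, so no group has to be downloaded up front.

```python
from typing import Any, Iterator, Tuple

import pandas as pd


def iter_groups_filtered(
    df: pd.DataFrame, by: str
) -> Iterator[Tuple[Any, pd.DataFrame]]:
    """Hypothetical sketch of option (2): fetch distinct keys, filter per key."""
    # Step 1: one small query that returns only the distinct key values.
    # Missing keys are dropped and keys are sorted, matching pandas'
    # groupby defaults (dropna=True, sort=True).
    keys = df[by].dropna().drop_duplicates().sort_values().tolist()

    # Step 2: for each key, hand back the full frame with a filter attached.
    for key in keys:
        yield key, df[df[by] == key]
```

The trade-off is that each group likely costs its own query when it is finally computed, in exchange for never holding a whole group inside one aggregated value.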

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

- Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
- Ensure the tests and linter pass
- Code coverage does not decrease (if any source code was changed)
- Appropriate docs were updated (if necessary)

Fixes internal bug 383638782 🦕

review-notebook-app · 2025-02-14T22:04:13Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

tswast · 2025-09-18T18:12:17Z

bigframes/core/groupby/dataframe_group_by.py

            )
        )

+    def __iter__(self) -> Iterable[Tuple[blocks.Label, df.DataFrame]]:


This is very similar to the `SeriesGroupBy` implementation. I'll see if I can refactor.

Refactor complete. :-)
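For reference, the signature above (`Iterable[Tuple[blocks.Label, df.DataFrame]]`) suggests the method follows pandas' `GroupBy.__iter__` contract; that is an assumption based on the signature and bigframes' pandas-compatible API. The contract, shown here with plain pandas: iterating a DataFrame groupby yields `(key, DataFrame)` pairs, multi-column groupings yield tuple keys, and a Series groupby yields `(key, Series)` pairs, which is why the DataFrame and Series versions can share most of their logic.

```python
import pandas as pd

df = pd.DataFrame({"k": ["b", "a", "b"], "j": [1, 1, 2], "v": [1, 2, 3]})

# DataFrame groupby: (key, sub-DataFrame) pairs, keys sorted by default.
frame_pairs = [(key, group["v"].tolist()) for key, group in df.groupby("k")]

# Grouping by several columns: each key is a tuple of the key values.
multi_keys = [key for key, _ in df.groupby(["k", "j"])]

# Series groupby: (key, sub-Series) pairs.
series_pairs = [(key, s.tolist()) for key, s in df.groupby("k")["v"]]
```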

tswast · 2025-09-18T21:01:01Z

e2e failures:

FAILED tests/system/large/blob/test_function.py::test_blob_image_resize_to_series
FAILED tests/system/large/functions/test_managed_function.py::test_managed_function_df_where_mask_series
FAILED tests/system/large/blob/test_function.py::test_blob_pdf_chunk[True] - ...
FAILED tests/system/large/functions/test_managed_function.py::test_managed_function_with_connection[bq_connection]
FAILED tests/system/large/functions/test_managed_function.py::test_managed_function_series_apply_array_output
FAILED tests/system/large/functions/test_managed_function.py::test_managed_function_df_apply_axis_1_aggregates
FAILED tests/system/large/functions/test_managed_function.py::test_managed_function_array_output
FAILED tests/system/large/functions/test_managed_function.py::test_managed_function_dataframe_apply_axis_1
FAILED tests/system/large/functions/test_managed_function.py::test_managed_function_df_where_mask
FAILED tests/system/large/blob/test_function.py::test_blob_image_resize_to_folder
FAILED tests/system/large/blob/test_function.py::test_blob_pdf_chunk[False]
FAILED tests/system/large/functions/test_managed_function.py::test_managed_function_options
FAILED tests/system/large/functions/test_managed_function.py::test_managed_function_series_combine
FAILED tests/system/large/blob/test_function.py::test_blob_pdf_extract[False]
FAILED tests/system/large/functions/test_managed_function.py::test_managed_function_dataframe_apply_axis_1_array_output
FAILED tests/system/large/blob/test_function.py::test_blob_image_blur_to_folder
FAILED tests/system/large/functions/test_managed_function.py::test_managed_function_series_apply_args

These appear unrelated to this change.

tswast · 2025-09-18T21:02:03Z

doctest failure:

__ [doctest] third_party.bigframes_vendored.pandas.core.frame.DataFrame.join ___
[gw4] linux -- Python 3.12.7 /tmpfs/src/github/python-bigquery-dataframes/.nox/doctest/bin/python
4666             >>> df1.join(df2, how="inner")
4667                col1  col2 col3  col4
4668             11  bar     2  foo     3
4669             <BLANKLINE>
4670             [1 rows x 4 columns]
4671 
4672 
4673         Another option to join using the key columns is to use the on parameter:
4674 
4675             >>> df1.join(df2, on="col1", how="right")
UNEXPECTED EXCEPTION: TypeError('Cannot coerce string and Int64 to a common type.')
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/doctest.py", line 1368, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest third_party.bigframes_vendored.pandas.core.frame.DataFrame.join[11]>", line 1, in <module>
  File "/tmpfs/src/github/python-bigquery-dataframes/bigframes/core/log_adapter.py", line 195, in wrapper
    raise e
  File "/tmpfs/src/github/python-bigquery-dataframes/bigframes/core/log_adapter.py", line 180, in wrapper
    return method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmpfs/src/github/python-bigquery-dataframes/bigframes/dataframe.py", line 3694, in join
    return self._join_on_key(
           ^^^^^^^^^^^^^^^^^^
  File "/tmpfs/src/github/python-bigquery-dataframes/bigframes/core/log_adapter.py", line 195, in wrapper
    raise e
  File "/tmpfs/src/github/python-bigquery-dataframes/bigframes/core/log_adapter.py", line 180, in wrapper
    return method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmpfs/src/github/python-bigquery-dataframes/bigframes/dataframe.py", line 3756, in _join_on_key
    combined_df = left._perform_join_by_index(right, how=how)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmpfs/src/github/python-bigquery-dataframes/bigframes/core/log_adapter.py", line 195, in wrapper
    raise e
  File "/tmpfs/src/github/python-bigquery-dataframes/bigframes/core/log_adapter.py", line 180, in wrapper
    return method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmpfs/src/github/python-bigquery-dataframes/bigframes/dataframe.py", line 3786, in _perform_join_by_index
    block, _ = self._block.join(
               ^^^^^^^^^^^^^^^^^
  File "/tmpfs/src/github/python-bigquery-dataframes/bigframes/core/blocks.py", line 2584, in join
    return join_mono_indexed(
           ^^^^^^^^^^^^^^^^^^
  File "/tmpfs/src/github/python-bigquery-dataframes/bigframes/core/blocks.py", line 3066, in join_mono_indexed
    combined_expr, (get_column_left, get_column_right) = left_expr.relational_join(
                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmpfs/src/github/python-bigquery-dataframes/bigframes/core/array_value.py", line 486, in relational_join
    if not bigframes.dtypes.can_compare(ltype, rtype):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmpfs/src/github/python-bigquery-dataframes/bigframes/dtypes.py", line 362, in can_compare
    coerced_type = coerce_to_common(type1, type2)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmpfs/src/github/python-bigquery-dataframes/bigframes/dtypes.py", line 896, in coerce_to_common
    raise TypeError(f"Cannot coerce {etype1} and {etype2} to a common type.")
TypeError: Cannot coerce string and Int64 to a common type.
/tmpfs/src/github/python-bigquery-dataframes/third_party/bigframes_vendored/pandas/core/frame.py:4675: UnexpectedException

This seems unrelated to the current change.

feat: add GroupBy.__iter__

0a1ec39

product-auto-label bot added the size: m and api: bigquery labels Feb 13, 2025

tswast added 2 commits February 14, 2025 10:49

Merge remote-tracking branch 'origin/main' into b329865893-groupby-iter

96fb73b

iterate over keys

1975c8a

match by key

91e9ade

product-auto-label bot added the size: l label and removed the size: m label Feb 14, 2025

tswast added 2 commits September 17, 2025 21:27

Merge remote-tracking branch 'origin/main' into b329865893-groupby-iter

cd38940

implement it

d68b56d

tswast marked this pull request as ready for review September 18, 2025 17:58

tswast requested review from a team as code owners September 18, 2025 17:58

tswast requested a review from GarrettWu September 18, 2025 17:58

Merge branch 'main' into b329865893-groupby-iter

fc11a1c

blunderbuss-gcf bot assigned shuoweil Sep 18, 2025

tswast commented Sep 18, 2025


refactor

d6ba77f

product-auto-label bot added the size: xl label and removed the size: l label Sep 18, 2025

revert notebook change

b4214cf

product-auto-label bot added the size: l label and removed the size: xl label Sep 18, 2025

Merge branch 'main' into b329865893-groupby-iter

c13ea37

tswast assigned GarrettWu and unassigned shuoweil Sep 18, 2025

Merge branch 'main' into b329865893-groupby-iter

ac07946

shuoweil approved these changes Sep 19, 2025


tswast merged commit c56a78c into main Sep 19, 2025
21 of 25 checks passed

tswast deleted the b329865893-groupby-iter branch September 19, 2025 14:28

release-please bot mentioned this pull request Sep 18, 2025

chore(main): release 2.22.0 #2099

Merged

tswast merged 11 commits into main from b329865893-groupby-iter


3 participants
