fix: ml.model_selection.train_test_split index to match in unordered mode #2283
bigframes/ml/model_selection.py
Outdated
- results.append(joined_df_train[columns])
- results.append(joined_df_test[columns])
+ results.append(joined_df_train[columns].cache())
+ results.append(joined_df_test[columns].cache())
This is a lot of .cache() calls. Ideally, I think the caching should happen inside the block.split method. That way the ordering is locked in, but only a single table is cached in total, which should be a lot faster.
I can move the caches outside of the loop, which removes some queries. But otherwise (caching anywhere in block.split) it doesn't work. Do you have any insight into why that is?
Hmm, really? I would expect caching anywhere around this area to work:
python-bigquery-dataframes/bigframes/core/blocks.py
Lines 901 to 913 in b487cf1
No. No matter where I put it within block.split, it doesn't work. Only calling cache() on the end results helps.
screen/6A2RFRNf9m96Qvo
Could it be a bug in some deeper code?
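For readers following the thread, here is a toy sketch of the general problem being discussed: when a lazy, unordered result is evaluated more than once, each evaluation may return rows in a different order, so train/test slices taken from separate evaluations can end up with indexes that do not match. Materializing the result once before slicing avoids that. Everything here (`LazyUnorderedTable`, `split_rows`, `project`) is hypothetical and written in plain Python; it does not use or reproduce bigframes internals, and it does not explain why caching inside block.split did not help in this PR.

```python
import random


class LazyUnorderedTable:
    """Hypothetical stand-in for a lazy query result with no guaranteed order.

    Each call to evaluate() returns the same rows, but possibly in a
    different order, mimicking an unordered query being re-executed.
    """

    def __init__(self, rows, seed):
        self._rows = list(rows)
        self._rng = random.Random(seed)

    def evaluate(self):
        rows = self._rows[:]
        self._rng.shuffle(rows)  # order differs between evaluations
        return rows


def split_rows(rows, n_train):
    """Slice already-materialized rows into train and test parts."""
    return rows[:n_train], rows[n_train:]


def project(rows, position):
    """Return (index, value) pairs for one column position."""
    return [(row[0], row[position]) for row in rows]


# Rows are (index, x_value, y_value).
table = LazyUnorderedTable(
    [(i, f"x{i}", f"y{i}") for i in range(100)], seed=0
)

# Without materializing: X and y each trigger their own evaluation,
# so the two train slices are cut from differently ordered rows.
x_train_uncached, _ = split_rows(project(table.evaluate(), 1), 80)
y_train_uncached, _ = split_rows(project(table.evaluate(), 2), 80)
uncached_x_index = [idx for idx, _ in x_train_uncached]
uncached_y_index = [idx for idx, _ in y_train_uncached]

# Materializing once: both projections are cut from the same snapshot,
# so the train indexes of X and y line up.
snapshot = table.evaluate()
x_train_cached, x_test_cached = split_rows(project(snapshot, 1), 80)
y_train_cached, y_test_cached = split_rows(project(snapshot, 2), 80)
cached_x_index = [idx for idx, _ in x_train_cached]
cached_y_index = [idx for idx, _ in y_train_cached]

print("uncached indexes match:", uncached_x_index == uncached_y_index)
print("cached indexes match:", cached_x_index == cached_y_index)
```

In this sketch a single snapshot serves both projections, which is the "only a single table is cached" idea from the review comment; whether the same effect can be achieved inside block.split is exactly the open question in the thread.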
Fixes b/462105877