-
Notifications
You must be signed in to change notification settings - Fork 86
support revision/branches and transformation can return None to skip … #769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
for more information, see https://pre-commit.ci
| item = self.transform(item) | ||
|
|
||
| return item | ||
| return item if item else self.__next__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes more sense to have this in __next__ method, and not in __getitem__.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the suggestion
|
Hi @mpeex, can you share more details on when If this is your specific use case, one option can be to override the class MyDS(StreamingDataset):
def __next__(self, index):
item = super().__next__(self)
return item if item else self.__next__()
ds = MyDs(...)Also, can you write some tests verifying the behaviour for |
|
hi @deependujha thanks for reviewing. The usecase is to filter samples while streaming using the transform function, for ex: returning None in any of the I will write some tests to verify the behaviour for HF dataset revision/branches. thanks for your time |
|
I think that skipping samples, whether that's via |
|
considering @philgzl comment, what is than the correct way to filter out samples if using |
@mpeex — since this appears to be specific to your filtering use case, one option is to use the approach suggested by @deependujha and override the
To fully avoid these caveats, I’d recommend filtering during the |
|
@mpeex, would you be open to opening a separate PR that only contains the HuggingFace revision/branch support? Let me know if that works for you, and thanks again for working through this 👍 |
|
Having a separate PR for HF revision/branches makes more sense. |
…sample
Before submitting
What does this PR do?
Fixes # (issue).
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun? oh ye
Make sure you had fun coding 🙃