Conversation

@CodingWithTim CodingWithTim commented Nov 8, 2024

Why are these changes needed?

Related issue number (if applicable)

Checks

  • I've run format.sh to lint the changes in this PR.
  • I've included any doc changes needed.
  • I've made sure the relevant tests are passing (if applicable).

@CodingWithTim
Collaborator Author

> python clean_chat_data.py --action-type upvote

Will grab all the upvotes.

@infwinston (Member) left a comment


Thanks @CodingWithTim, left some comments.

@CodingWithTim
Collaborator Author

@infwinston Suggestions integrated.

@infwinston (Member) left a comment


A critical thing should be fixed: `chunk_size` shouldn't be 1, otherwise we will introduce lots of overhead and it might even be slower than the sequential implementation.
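For illustration, a minimal sketch of why the chunk size matters with `multiprocessing.Pool`. The function names and the chunk-size heuristic here are hypothetical, not taken from the PR: with `chunksize=1`, every task costs one pickling/dispatch round trip, so grouping tasks into larger chunks amortizes that overhead.

```python
from multiprocessing import Pool


def process_file(path):
    # Hypothetical per-file work; stands in for the PR's per-file cleaning.
    return {"ct_invalid": 0, "result": path}


def pick_chunksize(n_tasks, n_procs, factor=4):
    # Heuristic similar in spirit to Pool.map's default: split the work
    # into roughly factor * n_procs chunks so each worker receives a few
    # large batches instead of thousands of single-item messages.
    chunksize, extra = divmod(n_tasks, n_procs * factor)
    return chunksize + 1 if extra else max(chunksize, 1)


if __name__ == "__main__":
    files = [f"file_{i}" for i in range(100)]
    n_procs = 2
    with Pool(processes=n_procs) as pool:
        # chunksize=1 would send one task per IPC round trip; the
        # computed chunk size amortizes dispatch overhead across tasks.
        results = list(
            pool.imap(process_file, files,
                      chunksize=pick_chunksize(len(files), n_procs))
        )
    print(len(results))
```

The trade-off: larger chunks reduce IPC overhead but make load balancing coarser, so a multiple of the worker count (rather than one giant chunk) is the usual middle ground.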

Comment on lines 158 to 167

```python
# Aggregate results from child processes
ct_invalid_conv_id = sum(
    [data["ct_invalid_conv_id"] for data in results if "ct_invalid_conv_id" in data]
)
ct_invalid = sum([data["ct_invalid"] for data in results if "ct_invalid" in data])
ct_network_error = sum(
    [data["ct_network_error"] for data in results if "ct_network_error" in data]
)
all_models = set([data["model"] for data in results if "model" in data])
chats = [data["result"] for data in results if "result" in data]
```

Can we merge into one for loop?
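A sketch of what the merged single-pass version could look like. It is standalone, with a made-up `results` list for demonstration; only the aggregation loop itself reflects the suggestion:

```python
# Hypothetical worker output; each dict may carry only some of the keys.
results = [
    {"ct_invalid": 1, "model": "vicuna"},
    {"ct_network_error": 2, "result": {"id": "abc"}, "model": "gpt-4"},
    {"ct_invalid_conv_id": 3},
]

# Single pass over results, replacing the five separate comprehensions.
ct_invalid_conv_id = ct_invalid = ct_network_error = 0
all_models = set()
chats = []
for data in results:
    ct_invalid_conv_id += data.get("ct_invalid_conv_id", 0)
    ct_invalid += data.get("ct_invalid", 0)
    ct_network_error += data.get("ct_network_error", 0)
    if "model" in data:
        all_models.add(data["model"])
    if "result" in data:
        chats.append(data["result"])
```

One traversal of `results` instead of five, with `dict.get(key, 0)` standing in for the `if key in data` membership checks on the counters.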

