Finish including Beam version details in MongoDB handshake #36961
Summary of Changes (Gemini Code Assist): This pull request finalizes the integration of MongoDB's wrapping client library specification within Apache Beam's MongoDB I/O connector. Its core purpose is to ensure that Beam's version information is consistently sent during the MongoDB connection handshake, so that server-side logs can identify connections made through Beam. Additionally, the changes include refactoring the …
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment …
    if self._range_is_not_splittable(start_pos, end_pos):
        return []

    with MongoClient(self.uri, **self.spec) as client:
This changed the lifetime of the client, making it persistent instead of transient and auto-closing.
I would prefer a fix that does not change the current behavior. Consider the following:
- Revert "Include Beam version details in MongoDB handshake" (#36949).
- In `__init__`, update `self.spec` so that, if it does not contain a key named `"driver"`, it sets `self.spec["driver"] = DriverInfo("Apache Beam", beam.__version__)`.
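A minimal sketch of the suggested `__init__` change, assuming a trimmed-down, hypothetical `BoundedMongoSourceSketch` class and a `beam_version` parameter standing in for `beam.__version__`. `pymongo.driver_info.DriverInfo` is real; the fallback namedtuple exists only so the sketch runs without pymongo installed.

```python
from collections import namedtuple

try:
    from pymongo.driver_info import DriverInfo
except ImportError:
    # Fallback so the sketch runs without pymongo; pymongo's DriverInfo is a
    # namedtuple with the same (name, version, platform) fields.
    DriverInfo = namedtuple(
        "DriverInfo", ["name", "version", "platform"], defaults=[None, None]
    )


class BoundedMongoSourceSketch:
    """Hypothetical, trimmed-down stand-in for the source class under review."""

    def __init__(self, uri=None, extra_client_params=None, beam_version="0.0.0"):
        self.uri = uri
        # Copy so the caller's dict is not mutated.
        self.spec = dict(extra_client_params or {})
        # Only identify Beam when the caller has not supplied their own "driver".
        if "driver" not in self.spec:
            self.spec["driver"] = DriverInfo("Apache Beam", beam_version)
        # Client creation elsewhere stays transient, e.g.:
        #   with MongoClient(self.uri, **self.spec) as client: ...


source = BoundedMongoSourceSketch("mongodb://localhost:27017", beam_version="2.0.0")
print(source.spec["driver"].name)  # Apache Beam
```

Copying `extra_client_params` is a choice made in this sketch so the caller's dict is left untouched; the suggestion itself only specifies the `"driver"` check. Because `"driver"` is set inside `self.spec`, the existing `MongoClient(self.uri, **self.spec)` call needs no extra keyword argument.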
Totally understand not wanting to make an implementation change on client management, but I would offer a strong motivation to do so: performance. Creating a new client for every operation has a much larger overhead than using a single persistent client across all operations. Can you share the current motivation for transient per-operation MongoClients?
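To make the trade-off concrete, a minimal sketch contrasting the two client lifetimes under discussion. The `factory` argument is a hypothetical hook (in the connector it would be something like `lambda: MongoClient(uri, **spec)`), and the fake client only counts constructions; it does not measure real connection or handshake overhead.

```python
class PerOperationClient:
    """Current pattern: a fresh client per operation, closed right after."""

    def __init__(self, factory):
        self._factory = factory

    def run(self, operation):
        client = self._factory()
        try:
            return operation(client)
        finally:
            client.close()


class PersistentClient:
    """Alternative: one lazily created client reused until close() is called."""

    def __init__(self, factory):
        self._factory = factory
        self._client = None

    def run(self, operation):
        if self._client is None:
            self._client = self._factory()
        return operation(self._client)

    def close(self):
        # The owner must call this explicitly; nothing closes the client automatically.
        if self._client is not None:
            self._client.close()
            self._client = None


class CountingFakeClient:
    """Fake client that only counts constructions and closes."""

    created = 0
    closed = 0

    def __init__(self):
        CountingFakeClient.created += 1

    def close(self):
        CountingFakeClient.closed += 1
```

The persistent variant creates one client for any number of operations but moves responsibility for closing it to its owner, which is exactly the lifetime change the reviewer flagged.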
Performance could be addressed separately in another change.
Also, a call like `MongoClient(self.uri, **self.spec, driver=...)` will crash if `self.spec` already has a `"driver"` key (`TypeError: ... got multiple values for keyword argument 'driver'`).
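The failure mode can be reproduced without MongoDB. In this sketch `make_client` is a hypothetical stand-in for `MongoClient(uri, **kwargs)`; merging everything into one dict before the call avoids the duplicate keyword, and placing `**spec` last lets a user-supplied `"driver"` take precedence.

```python
def make_client(uri, **kwargs):
    """Hypothetical stand-in for MongoClient(uri, **kwargs); returns the kwargs it received."""
    return kwargs


spec = {"driver": "user-supplied"}

# Passing "driver" both inside **spec and explicitly raises TypeError at call time.
error_message = None
try:
    make_client("mongodb://localhost:27017", **spec, driver="beam")
except TypeError as exc:
    error_message = str(exc)

# Merging into one dict first avoids the clash. Later keys win in a dict display,
# so listing **spec last keeps a user-supplied "driver".
kwargs = make_client("mongodb://localhost:27017", **{"driver": "beam", **spec})
print(kwargs["driver"])  # user-supplied
```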
Fair enough. I'll update this PR to revert the earlier change and apply your self.spec suggestion.
Thanks. Shall we also port this change to Sink?
beam/sdks/python/apache_beam/io/mongodbio.py
Line 779 in 7564e9e
    def __init__(self, uri=None, db=None, coll=None, extra_params=None):
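Porting the change to the sink could look like the following sketch, which factors the `"driver"` default into a shared helper that both source and sink could call. `with_beam_driver_info`, `MongoSinkSketch`, the `beam_version` parameter, and the attribute names are hypothetical; the other `__init__` parameters follow the signature quoted above.

```python
from collections import namedtuple

try:
    from pymongo.driver_info import DriverInfo
except ImportError:
    # Fallback so the sketch runs without pymongo (same fields as pymongo's DriverInfo).
    DriverInfo = namedtuple(
        "DriverInfo", ["name", "version", "platform"], defaults=[None, None]
    )


def with_beam_driver_info(params, beam_version):
    """Return a copy of params with Beam's DriverInfo under "driver" unless one is set."""
    merged = dict(params or {})
    merged.setdefault("driver", DriverInfo("Apache Beam", beam_version))
    return merged


class MongoSinkSketch:
    """Hypothetical stand-in for the sink, extended with a beam_version parameter."""

    def __init__(self, uri=None, db=None, coll=None, extra_params=None, beam_version="0.0.0"):
        self.uri = uri
        self.db = db
        self.coll = coll
        # Same rule as the source: keep a caller-supplied "driver", else identify Beam.
        self.spec = with_beam_driver_info(extra_params, beam_version)
```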
Assigning reviewers:
R: @jrmccluskey for label python.
Note: If you would like to opt out of this review, comment …
The PR bot will only process comments in the main thread (not review comments).
(Follow-up to #36949: reverts that change and reapplies the Beam version details via `self.spec`.)
This PR finishes incorporating MongoDB's wrapping client library specification for the connection handshake to allow library details to be included in the metadata written to mongos or mongod logs.
For example, this change would allow server-side logs such as the following:
For anyone hosting clusters that receive connections from different applications, this can help differentiate connections and simplify log analysis.
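For context, the handshake specification's wrapping-library rule has the driver append the wrapping library's name, version, and platform to its own values, separated by `|`. A simplified sketch of that merge, using a hypothetical `merge_wrapping_library` function and illustrative base values (not real version numbers); PyMongo performs the equivalent step internally when a `DriverInfo` is passed through the `driver` option.

```python
def merge_wrapping_library(metadata, name=None, version=None, platform=None):
    """Append a wrapping library's details to handshake metadata with "|" separators.

    Simplified illustration of the rule; not PyMongo's actual implementation.
    """
    merged = {"driver": dict(metadata["driver"]), "platform": metadata.get("platform")}
    if name:
        merged["driver"]["name"] = f"{merged['driver']['name']}|{name}"
    if version:
        merged["driver"]["version"] = f"{merged['driver']['version']}|{version}"
    if platform:
        base_platform = merged["platform"]
        merged["platform"] = f"{base_platform}|{platform}" if base_platform else platform
    return merged


base = {"driver": {"name": "PyMongo", "version": "4.0.0"}, "platform": "CPython 3.12"}
print(merge_wrapping_library(base, name="Apache Beam", version="2.0.0")["driver"])
# {'name': 'PyMongo|Apache Beam', 'version': '4.0.0|2.0.0'}
```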