update transform catalogue docs#38457
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the documentation for the Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new BatchElements example and documentation to the Java SDK, along with a change to the core transform's output mechanism. The review identifies a potential compilation error in the core SDK where outputWindowedValue is called on an incompatible interface. Other feedback includes recommendations to use modern Beam practices like parameter injection and improved pipeline option initialization in the examples, as well as a minor typo fix in the documentation.
| receiver.outputWindowedValue( | ||
| targetBatch.elements, | ||
| targetWindow.maxTimestamp(), | ||
| singleton(targetWindow), | ||
| PaneInfo.NO_FIRING); |
There was a problem hiding this comment.
The OutputReceiver interface in the Apache Beam Java SDK does not currently have an outputWindowedValue method. This method is available on DoFn.ProcessContext, but using ProcessContext is generally deprecated in favor of parameter injection. If the intention is to explicitly set the window and pane info (which is necessary if the batch is being emitted for a window other than the one currently in context), you may need to use ProcessContext for this specific DoFn, or ensure that the OutputReceiver interface is updated to support this method. As it stands, this code will likely fail to compile.
There was a problem hiding this comment.
|
|
||
| public class BatchElementsExample { | ||
| public static void main(String[] args) { | ||
| PipelineOptions options = PipelineOptionsFactory.create(); |
There was a problem hiding this comment.
It is recommended to use PipelineOptionsFactory.fromArgs(args).withValidation().create() instead of PipelineOptionsFactory.create(). This allows users to pass command-line arguments, such as --runner, to configure the pipeline execution when running the example.
| PipelineOptions options = PipelineOptionsFactory.create(); | |
| PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create(); |
| public void processElement(ProcessContext c) throws Exception { | ||
| List<String> batch = c.element(); | ||
|
|
||
| LOG.info("Batch Contents: {}", batch); | ||
|
|
||
| for (String element : batch) { | ||
| c.output(element); | ||
| } | ||
| } |
There was a problem hiding this comment.
In modern Beam, it is preferred to use parameter injection (e.g., @Element and OutputReceiver) instead of the ProcessContext object. Additionally, the throws Exception clause is unnecessary here as no checked exceptions are thrown in the method body.
public void processElement(@Element List<String> batch, OutputReceiver<String> receiver) {
LOG.info("Batch Contents: {}", batch);
for (String element : batch) {
receiver.output(element);
}
}…egation/batchelements.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #38457 +/- ##
============================================
+ Coverage 55.76% 57.07% +1.31%
- Complexity 2095 3636 +1541
============================================
Files 1099 1185 +86
Lines 172277 189597 +17320
Branches 1350 3751 +2401
============================================
+ Hits 96065 108208 +12143
- Misses 73817 77924 +4107
- Partials 2395 3465 +1070
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Assigning reviewers: R: @ahmedabu98 for label java. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
ahmedabu98
left a comment
There was a problem hiding this comment.
Overall LGTM, just left one suggestion
| # BatchElements | ||
|
|
||
| BatchElements transform groups individual elements into batches before processing them downstream. | ||
| The transform takes a `PCollection<T>` as input and produces a `PCollection<List<T>>`, where each output element is a batch containing multiple input elements. | ||
| Batch sizes are chosen dynamically between the configured minimum and maximum values by measuring the execution time of downstream operations. | ||
|
|
||
| Batching is performed per window. Each emitted batch belongs to the same window as its input elements. |
There was a problem hiding this comment.
Maybe include some motivation behind BatchElements? This is a good excerpt:
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.