9021 commits
541d9bc
Bump xgrammar from 0.1.21 to 0.1.32 in /sdks/python/container/ml/py31…
dependabot[bot] Apr 29, 2026
c09cfb6
Update Beam website to release 2.73.0
Amar3tto Apr 16, 2026
4217116
Update release date
Amar3tto Apr 29, 2026
d492624
Update changes
Amar3tto Apr 29, 2026
083e579
update gemini review to not run on draft prs (#38333)
derrickaw Apr 29, 2026
4678606
[runners-spark] Prep shared base for Spark 4 (#38324)
tkaymak Apr 30, 2026
e45ab70
fix conversion failures - block was at the wrong nesting level (#38336)
reuvenlax Apr 30, 2026
d956eb1
Update managed-io.md for release 2.73.0-RC2. (#38276)
jrmccluskey Apr 30, 2026
e3249fc
Adding release-2.73.0-postrelease to protected branches in .asf.yaml
Apr 30, 2026
64a1510
Merge pull request #38222 from apache/website-2-73
Amar3tto Apr 30, 2026
03bd44e
Revert "fix preCommit Spotless rsync install (#38160)" (#38196)
Abacn Apr 30, 2026
2e5a3a0
Bump pytest from 8.4.2 to 9.0.3 in /sdks/python/container/ml/py311 (#…
dependabot[bot] Apr 30, 2026
0a50f31
Do not set representsation for pythonsdk_any type when typehint is An…
Abacn Apr 30, 2026
50f36b2
Bump pytest from 8.4.2 to 9.0.3 in /sdks/python/container/py311 (#38174)
dependabot[bot] Apr 30, 2026
4aedd4e
Ignore container changes from dependabot (#38344)
damccorm Apr 30, 2026
73985e5
Clean up remaining references to Samza runner (#38326)
Abacn Apr 30, 2026
fffa3c0
Remove Ubuntu 20.04 runner pools (#38335)
Abacn Apr 30, 2026
62fd95d
Merge pull request #38332: only expand update graph if needed
reuvenlax Apr 30, 2026
9a5318e
Cross-build java-extensions-avro with Avro 1.12 (#38109)
cjordn Apr 30, 2026
e530809
Revert "remove processContext usage across examples (java and kotlin)…
Abacn May 1, 2026
dfe97e6
Update republish_released_docker_containers.yml to 2.73.0 (#38354)
Amar3tto May 1, 2026
b0cc432
Mark dynamically generated ml_preprocessing test as no_xdist for prec…
aIbrahiim May 1, 2026
42cfe34
Update go version
Amar3tto May 1, 2026
ece9b45
Update go version base image
Amar3tto May 1, 2026
9fdb366
fix: correct typo "occured" to "occurred" (#38087)
harshadkhetpal May 1, 2026
a4cb676
Fix unhandled exception in KafkaIO SDF (#37449) (#37553)
junaiddshaukat May 1, 2026
8fc2b6b
Move Python PreCommit middle versions to PostCommit (#38347)
Abacn May 1, 2026
ca70eaa
Add DiskProvisionedIops/ThroughputMibps pipeline options for the Java…
bambadiouf1 May 1, 2026
2575102
Install wget
Amar3tto May 1, 2026
bcb0726
Install wget
Amar3tto May 1, 2026
7920213
Merge pull request #38356 from apache/fix-playground-go
Amar3tto May 1, 2026
c782e1c
Revert "Fix unhandled exception in KafkaIO SDF (#37449) (#37553)" (#3…
johnjcasey May 1, 2026
c5d0ab1
Bump Java bytecode compatibility version to Java11 (#38267)
Abacn May 1, 2026
5938b31
Run on ubuntu-24.04
Amar3tto May 2, 2026
0884c46
Optimze away WindmillWatermarkHold::clear when the cached hold is emp…
arunpandianp May 4, 2026
d8d12c7
[runners-spark] Use robust constructor resolution in EncoderFactory (…
tkaymak May 4, 2026
49abdcf
add reshuffle as a first class yaml transform and a test (#38046)
derrickaw May 4, 2026
802331b
Merge pull request #38365 from apache/python-arm
Amar3tto May 4, 2026
a14d009
Improve logging in boot.go to facilitate future triaging (#38342)
shunping May 4, 2026
4baf3ca
Upgrade github action versions (#38202)
derrickaw May 4, 2026
5c1980f
Fix SDF bundle finalization timeout in streaming test (#38287)
aIbrahiim May 4, 2026
637231c
Add option to use asyncio for AsyncWrapper (#38262)
AMOOOMA May 4, 2026
7d3dbca
Add TableRowMatchers with strict type-aware equality for BigQuery (#3…
lalitx17 May 4, 2026
a5496c6
[ValueKind] Add to model (#38308)
ahmedabu98 May 4, 2026
010c52f
Allow Beam Python GCP extra to resolve with google-cloud-storage 3.x …
officialasishkumar May 4, 2026
268ae1a
[IcebergIO] Support hash distribution mode when writing rows (#38061)
ahmedabu98 May 4, 2026
21b033c
Refactor metadata propagation in ReduceFnRunner to support extensible…
stankiewicz May 5, 2026
110e759
Add missing dependency on model:fn-execution to runners:core-java
stankiewicz Apr 21, 2026
fd571c9
document asserts due to new state added
stankiewicz Apr 24, 2026
79f7db9
Fix propagation of metadata
stankiewicz Apr 24, 2026
674873d
Merge pull request #36962 from stankiewicz/model
stankiewicz May 5, 2026
dc7b5b0
[Gemini] Migrate all remaining uses of typing types with built-in equ…
jrmccluskey May 5, 2026
aa5797f
Add pipeline hash (#38357)
tarun-google May 5, 2026
16609ed
fix error prone. (#38372)
stankiewicz May 5, 2026
efe4e94
extend to yaml (#38371)
ahmedabu98 May 5, 2026
d6b98ba
Bump minimatch in /scripts/ci/pr-bot (#37732)
dependabot[bot] May 6, 2026
da391d0
Merge pull request #38230 from stankiewicz/drain_combiner
stankiewicz May 6, 2026
455075b
Upgrade to Avro 1.12 (#38373)
Abacn May 6, 2026
7235088
GCS client library migration in Java SDK - part 3 (#37900)
shunping May 6, 2026
6fc8fde
Fix WindowedValue.of() invocation. (#38380)
stankiewicz May 6, 2026
e59af80
Fix FHIR search method signature mismatch after google.golang.org/api…
bambadiouf1 May 6, 2026
0bd11fd
Go VR Flink test on Flink 2.0 (#37640)
Abacn May 6, 2026
3a66bee
Bump github.com/apache/thrift from 0.21.0 to 0.23.0 in /sdks (#38383)
dependabot[bot] May 6, 2026
8e0aa0c
update excluded path to relative path (#38378)
derrickaw May 7, 2026
1db67c7
Bump cloud.google.com/go/bigquery from 1.74.0 to 1.77.0 in /sdks (#38…
dependabot[bot] May 7, 2026
400b114
Add Staged Artifact validations for RunnerV2 (#37974)
tarun-google May 7, 2026
fb27273
Bump hashicorp/setup-terraform from 3 to 4 (#38389)
dependabot[bot] May 7, 2026
259e255
Bump actions/checkout from 4 to 6 (#38393)
dependabot[bot] May 7, 2026
c0414c0
Bump crazy-max/ghaction-import-gpg from 6.3.0 to 7.0.0 (#38387)
dependabot[bot] May 7, 2026
b4a1794
Bump cloud.google.com/go/bigtable from 1.42.0 to 1.47.0 in /sdks (#38…
dependabot[bot] May 7, 2026
712ea7e
Bump docker/login-action from 4.0.0 to 4.1.0 (#38394)
dependabot[bot] May 7, 2026
e533dd1
Bump docker/build-push-action from 7.0.0 to 7.1.0 (#38391)
dependabot[bot] May 7, 2026
991a8d5
Bump google.golang.org/api from 0.276.0 to 0.278.0 in /sdks (#38395)
dependabot[bot] May 7, 2026
c971ec4
Merge pull request #38381 from bambadiouf1/fix-fhir-search-sig
Amar3tto May 7, 2026
034d7bf
Bump actions/github-script from 8 to 9 (#38402)
dependabot[bot] May 7, 2026
032cbe2
Bump actions/download-artifact from 7 to 8 (#38401)
dependabot[bot] May 7, 2026
5bc2d6d
Bump actions/upload-artifact from 4 to 7 (#38400)
dependabot[bot] May 7, 2026
0972151
Bump google.golang.org/grpc from 1.80.0 to 1.81.0 in /sdks (#38399)
dependabot[bot] May 7, 2026
0975d7b
Bump github.com/lib/pq from 1.11.1 to 1.12.3 in /sdks (#38398)
dependabot[bot] May 7, 2026
bc90080
Bump github.com/nats-io/nats-server/v2 from 2.12.6 to 2.14.0 in /sdks…
dependabot[bot] May 7, 2026
2190c9f
Bump cloud.google.com/go/datastore from 1.22.0 to 1.23.0 in /sdks (#3…
dependabot[bot] May 7, 2026
b78e11e
Stabilize StorageApiDataTriggeredSchemaUpdateIT assertion (#38339)
aIbrahiim May 7, 2026
e9a2b60
Consolidate linting and type checking configs into pyproject.toml (#3…
jrmccluskey May 7, 2026
b30af9e
BatchElements transform for Java SDK (#38369)
ganesh-skumar May 7, 2026
422f630
[yaml] - Add huggingface model handler (#38110)
derrickaw May 7, 2026
02ac93e
Add DiskProvisionedIops/ThroughputMibps pipeline options for the Pyth…
bambadiouf1 May 7, 2026
22c43bf
Mitigate test broken after Avro Upgrade due to AVRO-4110 (#38405)
Abacn May 7, 2026
b7b0707
Bump cloud.google.com/go/pubsub from 1.50.1 to 1.50.2 in /sdks (#38415)
dependabot[bot] May 8, 2026
c75b09d
Bump cloud.google.com/go/storage from 1.62.0 to 1.62.1 in /sdks (#38414)
dependabot[bot] May 8, 2026
d273ee8
Bump github.com/go-sql-driver/mysql from 1.9.3 to 1.10.0 in /sdks (#3…
dependabot[bot] May 8, 2026
878042d
Bump github.com/aws/aws-sdk-go-v2 from 1.41.5 to 1.41.7 in /sdks (#38…
dependabot[bot] May 8, 2026
30e98f7
yaml_transform_test - add some debug logs and increase row count (#38…
derrickaw May 8, 2026
e1b3769
Fix test (#38418)
Amar3tto May 8, 2026
a619a58
fix rrio teardown executor cleanup path (#38417)
aIbrahiim May 8, 2026
a9cd017
Fix SystemError in _DeferredCall.get() under GC pressure (#38355)
arr2036 May 8, 2026
955b80a
Revert "Revert "Bump com.pswidersk.terraform-plugin from 1.0.0 to 1.1…
Abacn May 8, 2026
b60082d
Revert "[yaml] - Add huggingface model handler (#38110)" (#38421)
derrickaw May 8, 2026
81828fd
[Dataflow] Added Portable Runner alias to java runners (#38411)
TongruiLi May 8, 2026
a8e7ffa
Fix race conditions, error recovery, and exit handlers in job servers…
shunping May 9, 2026
9a6734b
Fix PreCommit Java PVR Prism Loopback workflow (#38431)
shunping May 11, 2026
81830f2
Bump cloud.google.com/go/profiler from 0.4.3 to 0.6.0 in /sdks (#38436)
dependabot[bot] May 11, 2026
2e6ff74
Bump golang.org/x/sys from 0.43.0 to 0.44.0 in /sdks (#38435)
dependabot[bot] May 11, 2026
8363afe
Bump cloud.google.com/go/spanner from 1.88.0 to 1.91.0 in /sdks (#38434)
dependabot[bot] May 11, 2026
5733cc8
Fix flaky BigQuery file loads by safely handling concurrent mkdirs (#…
shunping May 11, 2026
a86f2ec
upgrade test containers and fix issues (#38438)
derrickaw May 11, 2026
399d9d7
Introduce ValueKind to Java and add to WindowedValue (#38315)
ahmedabu98 May 11, 2026
13bbd5c
Reduce number of layers for Beam container images (#38440)
Abacn May 11, 2026
f1bbb63
Fix deadlock in AsyncWrapper reset_state() (#38427)
shunping May 11, 2026
f01b9dd
Make Beartype use the default behavior in is_consistent_with() (#38275)
jrmccluskey May 11, 2026
1d008ba
[runners-spark] Add Spark 4 runner (#38255)
tkaymak May 11, 2026
a72b781
Fix thread leak for LOOPBACK workers in external worker pool (#38432)
shunping May 11, 2026
fba639a
Bump urllib3 from 2.6.3 to 2.7.0 in /sdks/python/container/py312 (#38…
dependabot[bot] May 12, 2026
81769cb
Adds a new coder translator for Java SchemaCoder. (#37631)
acrites May 12, 2026
118f2a3
Bump github.com/nats-io/nats.go from 1.51.0 to 1.52.0 in /sdks (#38463)
dependabot[bot] May 12, 2026
fb2b8e8
Bump actions/upload-artifact from 4 to 7 (#38460)
dependabot[bot] May 12, 2026
abf0c71
Bump github.com/aws/aws-sdk-go-v2/config from 1.32.7 to 1.32.17 in /s…
dependabot[bot] May 12, 2026
9bee04c
Bump golang.org/x/net from 0.53.0 to 0.54.0 in /sdks (#38462)
dependabot[bot] May 12, 2026
84a63ed
Bump actions/checkout from 4 to 6 (#38464)
dependabot[bot] May 12, 2026
83706e1
Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#38461)
dependabot[bot] May 12, 2026
34c6e26
Make SubprocessServer shared cache purging idempotent (#38455)
shunping May 12, 2026
4d683c0
Exercise Spark 4 tests (#38453)
Abacn May 12, 2026
fbe61ea
[Python] Python] Bound the memory used for fnapi outbound data messag…
scwhittle May 12, 2026
cf5e517
update ruff to pyproject (#38474)
derrickaw May 12, 2026
1f3d665
Expose StorageWriteApiMaxRequestCallbackWaitTimeSec in BQ storage wri…
claudevdm May 12, 2026
89dde7c
Upgrade gcsio to 3.1.16 (#38419)
Abacn May 12, 2026
fa0eef9
Add retry in connecting manager in MultiProcessShared (#38456)
shunping May 12, 2026
ec65d64
[Prism] Fix gRPC deadline exceeded errors during bundle failure by pa…
shunping May 12, 2026
e7cb9f7
[Website] add drain update to docs (#38450)
stankiewicz May 13, 2026
79ed7cc
Remove legacy processContext usage across and replace it with argumen…
stankiewicz May 13, 2026
e6fed0a
Bump google.golang.org/api from 0.278.0 to 0.279.0 in /sdks (#38480)
dependabot[bot] May 13, 2026
7ded0e0
Setup Validates runner tests for Spark4 runner (#38478)
Abacn May 13, 2026
bd52fce
Moving to 2.75.0-SNAPSHOT on master branch.
May 13, 2026
13d36eb
update port for jdbc test (#38485)
derrickaw May 13, 2026
632f1e4
[runners-spark] Spark 4 follow-up: Announce in CHANGES.md and address…
tkaymak May 13, 2026
3dbd7c8
Sickbay two failed tests due to new schema coder urn. (#38497)
shunping May 14, 2026
f74b45a
Bump Java dev image for Dataflow (#38496)
Abacn May 14, 2026
10304c9
Fix gradle command
Amar3tto May 14, 2026
442d6ff
Merge pull request #38499 from apache/fix-spark4-batch
Amar3tto May 14, 2026
39632ef
Fix iceberg gcs dependency after gcsio 3.0 upgrade (#38488)
Abacn May 14, 2026
1d6cdc0
Isolate tests in multi_process_shared with unique temp path. (#38498)
shunping May 14, 2026
5b5cd65
Bump @protobufjs/utf8 from 1.1.0 to 1.1.1 in /sdks/typescript (#38475)
dependabot[bot] May 14, 2026
9a1a4b1
[Iceberg] pin hadoop to 3.3.6 (#38500)
ahmedabu98 May 14, 2026
7b4ae89
Set torch upper bound
shunping May 15, 2026
45790bb
Merge pull request #38505 from shunping/set-torch-upper-bound
Amar3tto May 15, 2026
84a2168
Bump google.golang.org/grpc from 1.81.0 to 1.81.1 in /sdks (#38506)
dependabot[bot] May 15, 2026
5d2ff20
Enable multi-release attribute for Spark job server jar (#38449)
Abacn May 15, 2026
6e23bc1
Fix empty stream name encountered in StorageApiFinalizeWrtiesDoFn (#3…
Abacn May 15, 2026
6ae46e3
Bump jackson_version - Fix GHSA-72hv-8253-57qq (#37969)
stankiewicz May 15, 2026
4ac7598
Fix Dataflow archetype missing API client dependency (#38509)
aIbrahiim May 15, 2026
b2d0ef6
Fix pom involving maven-archetype and opentelemetry (#38518)
Abacn May 15, 2026
357fd26
Revert #37631 and #38497 on HEAD (#38516)
Abacn May 15, 2026
0496ea7
Fix typo in README website description (#38469)
arpitjain099 May 15, 2026
4c4a2c1
Revert "Fix Dataflow archetype missing API client dependency (#38509)…
Abacn May 15, 2026
1ca1faf
Fix SubprocessServer cache thread-safety and test isolation (#38501)
shunping May 17, 2026
233ebd8
[Dataflow Streaming] Add a job setting to limit value size in windmil…
arunpandianp May 18, 2026
2ea4697
Fix non-unique job names in TextIOReadTest (#38242)
Subramanya-Veeregowda May 18, 2026
2c4d2c6
Rename run_pylint.sh (#38307)
jrmccluskey May 18, 2026
26c6ec7
add todo (#38517)
ahmedabu98 May 18, 2026
55eb624
Fix interactive environment clean up failure at atexit. (#38526)
shunping May 18, 2026
79ae776
Fix PipelineOptions deserialization NPE (#38531)
shunping May 19, 2026
0318a2f
add yaml agent development skill (#38382)
derrickaw May 19, 2026
01b0972
Bump protobufjs in /sdks/typescript (#38538)
dependabot[bot] May 19, 2026
c3335c8
Make sure session creation happens before starting agent (#38477)
damccorm May 19, 2026
91f50ce
Remove Java 8 variant of beam_PostCommit_Java_ValidatesRunner_Dataflo…
Abacn May 19, 2026
e83a1d5
Pin tensor_rt digest for PyTorch sentiment Dataflow benchmarks (#38374)
aIbrahiim May 19, 2026
b1f7115
Update Dataflow Dependency Java (#38542)
tarun-google May 20, 2026
6b55c14
Update Dataflow Proto (#38540)
tarun-google May 20, 2026
bb0bc77
Bump google.golang.org/api from 0.279.0 to 0.280.0 in /sdks (#38563)
dependabot[bot] May 20, 2026
ef703f6
Bump cloud.google.com/go/datastore from 1.23.0 to 1.24.0 in /sdks (#3…
dependabot[bot] May 20, 2026
f22400a
[Dataflow Streaming] Add experiment to use trigger state to know if a…
arunpandianp May 20, 2026
2105eaa
Disable grpc fork support on some test suites. (#38566)
shunping May 20, 2026
c33bc97
Fix race condition in UserPipelineTracker.clear() and various problem…
shunping May 20, 2026
582bb45
[Java SDK] Infer Beam logical types for JSR-310 and UUID fields (#38194)
sachinnn99 May 20, 2026
ba63482
introduce private method to remove clones (#38245)
aaaZayne May 20, 2026
2375fed
Refresh Iceberg partition specs periodically (#38408)
AtharvUrunkar May 20, 2026
290e372
WriteToJson - force num_shards (#38484)
derrickaw May 20, 2026
4f2411e
Revert "[Java Portable SDK] Configure JVM so that it exits upon OutOf…
scwhittle May 21, 2026
32ec9bb
Bump com.gradle.common-custom-user-data-gradle-plugin (#38574)
dependabot[bot] May 21, 2026
b9b9e78
Bump github.com/nats-io/nats-server/v2 from 2.14.0 to 2.14.1 in /sdks…
dependabot[bot] May 21, 2026
930b94c
Fix test hang in subprocess expansion service on port bind failure (#…
shunping May 21, 2026
ad93dd5
Install wget and use repo token (#38340)
Amar3tto May 21, 2026
65e8b65
huggingface model handler for yaml - retry (#38451)
derrickaw May 21, 2026
707b48d
Fix table row inference benchmark using wrong model path (#38569)
aIbrahiim May 21, 2026
898c1a8
Fix flaky MatrixPowerTest.test_basics by reading all generated shards…
shunping May 21, 2026
0c9b272
Update transform catalogue docs (#38457)
ganesh-skumar May 21, 2026
b6aaf42
Initial skeleton for the Delta Lake source (#38571)
chamikaramj May 21, 2026
60620e7
Enforce Google Maven Mirror on CI environments (#38586)
shunping May 21, 2026
1c0c024
Adds backlog reporting support for non-fnapi based SDF's. (#38346)
acrites May 21, 2026
2203c8d
[Prism] Fix DEADLINE_EXCEEDED errors caused by worker failures (#38523)
shunping May 21, 2026
9d307e5
Suppress log spams in gcsio 3.0 (#38588)
Abacn May 22, 2026
8f6e271
Bump golang.org/x/net from 0.54.0 to 0.55.0 in /sdks (#38595)
dependabot[bot] May 22, 2026
70d8f7d
Revert "huggingface model handler for yaml - retry (#38451)" (#38601)
derrickaw May 22, 2026
63501f3
Fix data_race condition in WindmillStreamSenderTest (#38589)
parveensania May 22, 2026
ef5f89f
[Iceberg] Fix manifest bounds being padded with trailing 0x00 bytes (…
dejii May 22, 2026
489efaa
Bump docker/build-push-action from 7.1.0 to 7.2.0 (#38597)
dependabot[bot] May 22, 2026
889ac54
Fix cancelled run handling in flaky test prefetcher (#38579)
aIbrahiim May 22, 2026
a3e33da
increase timeout (#38604)
derrickaw May 22, 2026
978c375
Revert "Bump golangci/golangci-lint-action from 3 to 8 (#36291)" (#38…
Abacn May 22, 2026
590c9c2
Report source lineage from HadoopFormatIO (#37265)
shnapz May 22, 2026
679adb8
Pipe Valuekind through DoFn and output builders (#38490)
ahmedabu98 May 22, 2026
7272ab5
[Python] Add type_overrides parameter to BigQuery I/O for custom BigQ…
enzomaruffa May 22, 2026
d901481
Update Python Dependencies - manual trigger (#38610)
github-actions[bot] May 23, 2026
db15fff
Count cancelled runs in last-5 flaky check (#38614)
aIbrahiim May 23, 2026
361c2c4
Reduce python expansion service startup time (#38611)
shunping May 24, 2026
bb5c486
Update action.yml
stankiewicz May 25, 2026
94a9f82
Bump docker/login-action from 4.1.0 to 4.2.0 (#38627)
dependabot[bot] May 25, 2026
8a30cc7
Bump github.com/aws/aws-sdk-go-v2/config in /sdks (#38629)
dependabot[bot] May 25, 2026
35c70e5
Bump docker/setup-buildx-action from 4.0.0 to 4.1.0 (#38628)
dependabot[bot] May 25, 2026
181424b
Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#38630)
dependabot[bot] May 25, 2026
551a5a0
Update cython requirement from <4,>=3.0 to >=3.2.5,<4 in /sdks/python…
dependabot[bot] May 26, 2026
5ad6df9
Fix flaky test_csv_to_json in Beam YAML SDK (#38617)
shunping May 26, 2026
1858d2e
Update go dependencies (#38530)
joesantos418 May 26, 2026
1b261f0
Fix Fn API data plane deadlock when outbound queue is full (#38581)
aIbrahiim May 26, 2026
4049f00
[yaml] add support for matchall (#38512)
derrickaw May 26, 2026
eb29f86
Update BEAM_DEV_SDK_CONTAINER_TAG to new version (#38699)
damccorm May 26, 2026
d553b78
[Dataflow Streaming] Add Operation::finishKey() and move processTimer…
arunpandianp May 27, 2026
9583fe9
[Cloud Spanner Change Streams] Fix inverted evaluation of cancelQuery…
scwhittle May 27, 2026
b91538b
Bump cloud.google.com/go/bigtable from 1.47.0 to 1.48.0 in /sdks (#38…
dependabot[bot] May 27, 2026
a7d2708
Bump google.golang.org/api from 0.280.0 to 0.281.0 in /sdks (#38704)
dependabot[bot] May 27, 2026
45b79d2
Remove redundant HTTP->HTTPS redirect in .htaccess (#38522)
liferoad May 27, 2026
852d072
Refactor _SharedCache to handle context vs non-context ownership (#38…
shunping May 27, 2026
d2a3cee
Enforce binary-only constraints during early requirements cache attem…
shunping May 28, 2026
5096e63
Bump docker/setup-qemu-action from 4.0.0 to 4.1.0 (#38718)
dependabot[bot] May 28, 2026
3cb68c9
Bump github.com/aws/aws-sdk-go-v2/service/s3 in /sdks (#38719)
dependabot[bot] May 28, 2026
758042f
Bump google.golang.org/api from 0.281.0 to 0.282.0 in /sdks (#38720)
dependabot[bot] May 28, 2026
789b278
Bump github.com/aws/smithy-go from 1.25.1 to 1.26.0 in /sdks (#38721)
dependabot[bot] May 28, 2026
54f7c7d
Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#38722)
dependabot[bot] May 28, 2026
0302bf5
Implemented MLTransform generate vocab Dataflow benchmark (#38215)
aIbrahiim May 28, 2026
cd8b009
[#38708] Add Flink 2.0 to SDK pipeline options validation lists (#38726)
durgaprasadml May 28, 2026
a049489
up timeout (#38732)
derrickaw May 28, 2026
3f4bcb0
remove playground annotations from BatchElementsExample until next re…
aIbrahiim May 28, 2026
bdef23f
Update CHANGES.md after release cut (#38608)
Abacn May 28, 2026
3e5a085
Support infer types involving dataclass fields (#38548)
Abacn May 28, 2026
a32774a
[Prism] Fix race condition in element manager (#38734)
shunping May 28, 2026
cbab4d7
Add staged package hashes (#38311)
tarun-google May 28, 2026
d8d4cfc
Updated All Externally Visible Logs/Docs from Runner V2 to Portable R…
TongruiLi May 29, 2026
3d68e9d
Bump github.com/aws/aws-sdk-go-v2 from 1.41.7 to 1.41.8 in /sdks (#38…
dependabot[bot] May 29, 2026
dc3b2cc
Bump github.com/aws/aws-sdk-go-v2/credentials in /sdks (#38737)
dependabot[bot] May 29, 2026
09a1e60
fix race condition in sdb-operator (#38698)
aIbrahiim May 29, 2026
5702872
Disable build isolation for pip by default. (#38700)
shunping May 29, 2026
07222b4
Bump github.com/aws/aws-sdk-go-v2/service/s3 in /sdks (#38741)
dependabot[bot] May 29, 2026
0a7df5c
Bump github.com/aws/aws-sdk-go-v2/config in /sdks (#38738)
dependabot[bot] May 29, 2026
d9f07c5
Bump setup gradle to allowlisted v6.1.0 (#38745)
aIbrahiim May 29, 2026
5a5ef0c
Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#38739)
dependabot[bot] May 29, 2026
6fba3cb
Implement MLTransform One-Hot Encoding benchmark pipeline (#38404)
aIbrahiim May 29, 2026
2b5c113
[yaml] - mongodb write normalization (#38376)
derrickaw May 29, 2026
f771827
Disable gRPC fork support in PortableRunnerTestWithSubprocesses (#38744)
shunping May 29, 2026
cbc2c21
Fix flaky TestDataSampler/GetSamplesForPCollectionsTooManySamples (#3…
shunping May 29, 2026
44a2ff6
Re-enable call-args check in yaml_io.py (#38730)
jrmccluskey May 29, 2026
c31f232
Fix Reshuffle handling in Prism to recursively remove nested sub-tran…
shunping May 29, 2026
The diff you're trying to view is too large. We only load the first 3000 changed files.
62 changes: 62 additions & 0 deletions .agent/skills/README.md
@@ -0,0 +1,62 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Apache Beam Skills

This directory contains skills that help the agent perform specialized tasks in the Apache Beam codebase. For more information, see the [Agent Skills Documentation](http://antigravity.google/docs/skills).

## Available Skills

| Skill | Description |
|-------|-------------|
| [beam-concepts](beam-concepts/SKILL.md) | Core Beam programming model (PCollections, PTransforms, windowing, triggers) |
| [ci-cd](ci-cd/SKILL.md) | GitHub Actions workflows, debugging CI failures, triggering tests |
| [contributing](contributing/SKILL.md) | PR workflow, issue management, code review, release cycles |
| [gradle-build](gradle-build/SKILL.md) | Build commands, flags, publishing, troubleshooting |
| [io-connectors](io-connectors/SKILL.md) | 51+ I/O connectors, testing patterns, usage examples |
| [java-development](java-development/SKILL.md) | Java SDK development, building, testing, project structure |
| [license-compliance](license-compliance/SKILL.md) | Apache 2.0 license headers for all new files |
| [python-development](python-development/SKILL.md) | Python SDK environment setup, testing, building pipelines |
| [runners](runners/SKILL.md) | Direct, Dataflow, Flink, Spark runner configuration |

## How Skills Work

1. **Discovery**: The agent scans skill descriptions to find relevant ones
2. **Activation**: When a skill matches the task, the agent reads the full `SKILL.md`
3. **Execution**: The agent follows the skill's instructions

## Skill Structure

Each skill folder contains:
- `SKILL.md` - Main instruction file with YAML frontmatter

```yaml
---
name: skill-name
description: Concise description for when to use this skill
---
# Skill Content
Detailed instructions...
```
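The frontmatter layout above can be checked mechanically. The following is a minimal sketch, not a YAML parser and not how any agent actually loads skills: `FrontmatterSketch` is a hypothetical helper that reads only flat `key: value` pairs between the first two `---` lines and skips `#` comment lines (such as a license header).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; handles flat "key: value" pairs only.
final class FrontmatterSketch {
  // Returns the fields found between the opening and closing "---" lines.
  // Returns an empty map when the text does not start with "---".
  static Map<String, String> parse(String text) {
    Map<String, String> fields = new LinkedHashMap<>();
    String[] lines = text.split("\n");
    if (lines.length == 0 || !lines[0].trim().equals("---")) {
      return fields;
    }
    for (int i = 1; i < lines.length; i++) {
      String line = lines[i].trim();
      if (line.equals("---")) {
        break; // end of frontmatter
      }
      if (line.isEmpty() || line.startsWith("#")) {
        continue; // skip blank lines and comments
      }
      int colon = line.indexOf(':');
      if (colon > 0) {
        fields.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
      }
    }
    return fields;
  }
}
```

A skill file would pass this check when the returned map contains both `name` and `description`.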

## Adding New Skills

1. Create a new folder under `.agent/skills/`
2. Add a `SKILL.md` with YAML frontmatter (`name`, `description`)
3. Write clear, actionable instructions in the markdown body
118 changes: 118 additions & 0 deletions .agent/skills/adding-new-metadata/SKILL.md
@@ -0,0 +1,118 @@
---
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

name: adding-new-metadata
description: Guide on how to add and propagate new metadata fields in Apache Beam's WindowedValue, extending protos, windmill persistence, and runner interfaces to avoid metadata loss.
---

# Adding New Metadata to WindowedValue

This skill provides a comprehensive guide on adding new metadata (e.g., CDC metadata, drain mode flags, OpenTelemetry trace context) to Apache Beam's `WindowedValue` and ensuring it propagates correctly through the execution engine. Failing to propagate metadata in all necessary places will result in metadata loss during pipeline execution.

## 1. Extending the Proto Model

When adding new metadata that must cross worker boundaries or be serialized by the Fn API, the proto definitions must be updated.

* **Key Files:** `model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto`
* **Action:** Add the new metadata field to the appropriate message (`ElementMetadata`).
* **Note:** Document the new field in the proto. The field's type may differ from its type in `WindowedValue`; see the OpenTelemetry trace context for an example.

## 2. WindowedValue Interface and Implementations

The `WindowedValue` is the core container for elements flowing through a Beam pipeline. It holds the value, timestamp, windows, pane info, and any additional metadata.

### Core Interface Updates
* **Key File:** `sdks/java/core/src/main/java/org/apache/beam/sdk/values/WindowedValue.java`
* **Action:** Add getter methods for your new metadata.

### Concrete Implementations
You must update **all** concrete implementations of `WindowedValue` to store and return the new metadata. If you miss one, metadata will be silently dropped.
* `ValueInGlobalWindow`
* `ValueInSingleWindow`
* `ValueInEmptyWindows` (often used inside runners, like Dataflow's worker package)
* **Action:** Update the constructors, factory methods (`of()`), and fields in these classes, as well as their coders.
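How a missed implementation drops metadata silently can be shown with a minimal, self-contained sketch. `Element`, `SingleWindowElement`, and `GlobalWindowElement` are hypothetical stand-ins, not Beam classes, and the sketch assumes the new getter has a default implementation; the real `WindowedValue` hierarchy may differ.

```java
// Hypothetical stand-in for the WindowedValue interface; not the Beam API.
interface Element<T> {
  T getValue();

  // New metadata getter. Assumption: it has a default, so older
  // implementations keep compiling even when they were not updated.
  default boolean causedByDrain() {
    return false;
  }
}

// Updated implementation: stores the metadata and returns it.
final class SingleWindowElement<T> implements Element<T> {
  private final T value;
  private final boolean causedByDrain;

  SingleWindowElement(T value, boolean causedByDrain) {
    this.value = value;
    this.causedByDrain = causedByDrain;
  }

  @Override
  public T getValue() {
    return value;
  }

  @Override
  public boolean causedByDrain() {
    return causedByDrain;
  }
}

// Forgotten implementation: compiles, but always reports the default.
final class GlobalWindowElement<T> implements Element<T> {
  private final T value;

  GlobalWindowElement(T value) {
    this.value = value;
  }

  @Override
  public T getValue() {
    return value;
  }
}
```

Nothing fails at compile time or run time for the forgotten class; the flag simply reads as `false`, which is why every implementation has to be audited by hand.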

### OutputBuilder vs. Context Output
* **IMPORTANT:** Do **not** add new arguments to legacy methods like `context.outputWindowedValue(...)` or `WindowedValue.of(value, timestamp, windows, pane)`. This causes brittleness and breaks the API for every new metadata field.
* **Action:** Modify `OutputBuilder` (`sdks/java/core/src/main/java/org/apache/beam/sdk/values/OutputBuilder.java`) to accept the new metadata (e.g., `.withDrainMode(...)`, `.withTraceContext(...)`). Use the builder pattern when constructing outputs so that metadata (for example, offsets and record IDs) propagates without further signature changes.
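The reason a builder scales better than extra positional arguments can be sketched as follows. `SketchOutput` and `SketchOutputBuilder` are hypothetical classes written for this illustration; only the method names `withDrainMode` and `withTraceContext` come from the text above, and the real `OutputBuilder` signatures may differ.

```java
import java.time.Instant;

// Hypothetical immutable output; stands in for a WindowedValue.
final class SketchOutput<T> {
  final T value;
  final Instant timestamp;
  final boolean drainMode;
  final String traceContext;

  SketchOutput(T value, Instant timestamp, boolean drainMode, String traceContext) {
    this.value = value;
    this.timestamp = timestamp;
    this.drainMode = drainMode;
    this.traceContext = traceContext;
  }
}

// Hypothetical builder: each new metadata field adds one optional method
// and leaves every existing call site unchanged.
final class SketchOutputBuilder<T> {
  private final T value;
  private Instant timestamp = Instant.EPOCH; // defaults apply when a field is not set
  private boolean drainMode = false;
  private String traceContext = null;

  SketchOutputBuilder(T value) {
    this.value = value;
  }

  SketchOutputBuilder<T> withTimestamp(Instant timestamp) {
    this.timestamp = timestamp;
    return this;
  }

  SketchOutputBuilder<T> withDrainMode(boolean drainMode) {
    this.drainMode = drainMode;
    return this;
  }

  SketchOutputBuilder<T> withTraceContext(String traceContext) {
    this.traceContext = traceContext;
    return this;
  }

  SketchOutput<T> build() {
    return new SketchOutput<>(value, timestamp, drainMode, traceContext);
  }
}
```

A caller that only cares about drain mode writes `new SketchOutputBuilder<>("x").withDrainMode(true).build()` and is unaffected when another metadata field is added later.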

## 3. Windmill Persistence (Dataflow Streaming Engine, Runner v1)

For the Dataflow streaming runner, metadata must survive serialization to and from the Windmill backend.

* **Serialization (Sink):**
    * **File:** `runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillSink.java`
    * **Action:** Extract the metadata from the `WindowedValue` and add it to the already-created `ElementMetadata` proto builder.
* **Deserialization (Reader):**
    * **Files:** `runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/UngroupedWindmillReader.java` and `WindowingWindmillReader.java`
    * **Action:** Extract the metadata from the `ElementMetadata` proto and reconstruct the `WindowedValue` using the updated factory methods or builders that include the metadata. This work is incremental, since much of the existing metadata is already extracted from the proto.
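The sink and reader changes form a round trip, which can be sketched without any Beam or Windmill dependency. Everything here is hypothetical: `SketchElem` stands in for `WindowedValue`, a `Map<String, String>` stands in for the `ElementMetadata` proto, and the key `caused_by_drain` is invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory element; stands in for WindowedValue.
final class SketchElem {
  final String value;
  final boolean causedByDrain;

  SketchElem(String value, boolean causedByDrain) {
    this.value = value;
    this.causedByDrain = causedByDrain;
  }
}

// Hypothetical codec; the map stands in for the ElementMetadata proto.
final class SketchWireCodec {
  static final String DRAIN_KEY = "caused_by_drain"; // invented key name

  // Sink side: copy each metadata field from the element into the wire metadata.
  static Map<String, String> toWire(SketchElem elem) {
    Map<String, String> meta = new HashMap<>();
    meta.put(DRAIN_KEY, Boolean.toString(elem.causedByDrain));
    return meta;
  }

  // Reader side: rebuild the element from the payload plus the wire metadata.
  // A missing key falls back to the default, as with data written by older code.
  static SketchElem fromWire(String payload, Map<String, String> meta) {
    boolean drain = Boolean.parseBoolean(meta.getOrDefault(DRAIN_KEY, "false"));
    return new SketchElem(payload, drain);
  }
}
```

If only one side is updated, the round trip still succeeds but returns the default, so a test that writes and then reads a non-default value is the simplest way to catch the omission.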

## 4. Propagation Across Core Classes

Metadata must be explicitly copied or forwarded whenever a `WindowedValue` is transformed, buffered, or processed.

### DoFn Runners (Java Core)
You must ensure that when a DoFn processes an element and outputs a new element, the appropriate metadata from the *input* is propagated to the *output* (unless explicitly changed by the logic).
* `runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java`
* `runners/core-java/src/main/java/org/apache/beam/runners/core/StatefulDoFnRunner.java`
* `runners/core-java/src/main/java/org/apache/beam/runners/core/LateDataDroppingDoFnRunner.java`
* `runners/core-java/src/main/java/org/apache/beam/runners/core/ProcessFnRunner.java`
* `sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java`

**Action:** When these runners call `outputWindowedValue()`, they should extract the metadata from the input or current context and attach it using the `OutputBuilder` or the new `WindowedValue` interfaces.
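Input-to-output propagation can be sketched with a toy runner. `Tagged` and `SketchRunner` are hypothetical and far simpler than `SimpleDoFnRunner`; the sketch only shows that the output is built from the input's metadata rather than from defaults.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical element with one metadata field; stands in for WindowedValue.
final class Tagged<T> {
  final T value;
  final boolean causedByDrain;

  Tagged(T value, boolean causedByDrain) {
    this.value = value;
    this.causedByDrain = causedByDrain;
  }
}

// Hypothetical runner that applies a user function to each element.
final class SketchRunner<I, O> {
  private final Function<I, O> fn;
  private final List<Tagged<O>> outputs = new ArrayList<>();

  SketchRunner(Function<I, O> fn) {
    this.fn = fn;
  }

  void processElement(Tagged<I> input) {
    O result = fn.apply(input.value);
    // Copy metadata from the input instead of constructing the output with defaults.
    outputs.add(new Tagged<>(result, input.causedByDrain));
  }

  List<Tagged<O>> outputs() {
    return outputs;
  }
}
```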

### Grouping and Reducing
* `runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java`
* `runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnContextFactory.java`
* **Action:** Ensure that during GroupByKey/Combine operations, if metadata needs to be preserved (e.g., `CausedByDrain`), it is correctly passed into the `ReduceFnContextFactory` and propagated when outputting the grouped results.

### Splittable DoFns (SDF)
* `runners/core-java/src/main/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvoker.java`
* `sdks/java/core/src/main/java/org/apache/beam/sdk/util/construction/SplittableParDoNaiveBounded.java`

### Timers
If metadata needs to survive timer firings (e.g., knowing an `@OnTimer` fired because of a system drain), it must be added to the timer data structures. This area is largely uncharted: it has only been implemented for `CausedByDrain` metadata, which comes from the backend rather than from persisted metadata. Persisting all `WindowedValue` metadata across timers requires more work; some pointers:
* `runners/core-java/src/main/java/org/apache/beam/runners/core/TimerInternals.java` and implementations (e.g., `WindmillTimerInternals.java` in Dataflow).
* **Action:** Add the field to `TimerData`, next to `CausedByDrain`. Propagate it when setting the timer and expose it when the timer fires so it bubbles up.
* Eventually, metadata from the timer lands in a `WindowedValue`, so it can be exposed to users. Keep field names, types, and getters as close to those of `WindowedValue` as possible, since a common interface may eventually be introduced.
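Carrying a field through a timer can be sketched as follows. `SketchTimerData` and `SketchTimerInternals` are hypothetical simplifications; the real `TimerData` and `TimerInternals` carry namespaces, time domains, and timestamps that are omitted here.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical timer record; the metadata field sits next to the timer id.
final class SketchTimerData {
  final String timerId;
  final boolean causedByDrain;

  SketchTimerData(String timerId, boolean causedByDrain) {
    this.timerId = timerId;
    this.causedByDrain = causedByDrain;
  }
}

// Hypothetical timer store: metadata is captured when the timer is set
// and handed back unchanged when the timer fires.
final class SketchTimerInternals {
  private final Queue<SketchTimerData> pending = new ArrayDeque<>();

  void setTimer(String timerId, boolean causedByDrain) {
    pending.add(new SketchTimerData(timerId, causedByDrain));
  }

  // Returns the next timer in insertion order, or null when none is pending.
  SketchTimerData fireNext() {
    return pending.poll();
  }
}
```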

## 5. Exposing Metadata to the User (DoFn Signatures)

If users need to access the metadata in their `DoFn` (e.g., `@ProcessElement public void process(ProcessContext c, CausedByDrain drain) { ... }`), you must update the reflection and bytecode generation logic.

* **Files:**
    * `sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java`
    * `sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignature.java`
    * `sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnInvoker.java`
    * `sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java`
* **Action:** Add logic to detect the new parameter type in the DoFn method signature. Generate bytecode using ByteBuddy to extract the property from the `WindowedValue` or context and pass it as an argument during method invocation.
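The idea of detecting a parameter type and supplying the matching argument can be sketched with plain `java.lang.reflect`. This is a stand-in for illustration only: Beam generates bytecode with ByteBuddy rather than calling `Method.invoke`, and `DrainFlag`, `SketchInvoker`, `WithFlag`, and `WithoutFlag` are hypothetical names.

```java
import java.lang.reflect.Method;

// Hypothetical metadata type that a user method may request as a parameter.
enum DrainFlag { NORMAL, CAUSED_BY_DRAIN }

final class SketchInvoker {
  // Finds the named method, inspects its declared parameter types, and
  // fills each argument slot from the element or its metadata.
  static Object invoke(Object fn, String methodName, String element, DrainFlag flag) {
    for (Method m : fn.getClass().getDeclaredMethods()) {
      if (!m.getName().equals(methodName)) {
        continue;
      }
      Class<?>[] types = m.getParameterTypes();
      Object[] args = new Object[types.length];
      for (int i = 0; i < types.length; i++) {
        if (types[i] == String.class) {
          args[i] = element;
        } else if (types[i] == DrainFlag.class) {
          args[i] = flag; // new parameter type recognized here
        } else {
          throw new IllegalArgumentException("Unsupported parameter type: " + types[i]);
        }
      }
      try {
        m.setAccessible(true);
        return m.invoke(fn, args);
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException(e);
      }
    }
    throw new IllegalArgumentException("No method named " + methodName);
  }
}

// A user function that asks for the metadata, and one that does not.
final class WithFlag {
  String process(String element, DrainFlag flag) {
    return element + ":" + flag;
  }
}

final class WithoutFlag {
  String process(String element) {
    return element;
  }
}
```

Existing user functions that do not declare the new parameter keep working unchanged, which is the property the signature analysis has to preserve.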

## Checklist for Adding New Metadata

1. [ ] Define the metadata in `beam_fn_api.proto` (if applicable).
2. [ ] Add getters to the `WindowedValue` interface.
3. [ ] Update `ValueInGlobalWindow`, `ValueInSingleWindow`, `ValueInEmptyWindows` to store the metadata.
4. [ ] Update `OutputBuilder` to accept the metadata.
5. [ ] Update `WindmillSink` to serialize the metadata to the backend.
6. [ ] Update `UngroupedWindmillReader` and `WindowingWindmillReader` to deserialize the metadata.
7. [ ] Update `WindmillKeyedWorkItem`.
8. [ ] Update `SimpleDoFnRunner`, `StatefulDoFnRunner`, and `FnApiDoFnRunner` to propagate the metadata from input to output.
9. [ ] Update `ReduceFnRunner` and `OutputAndTimeBoundedSplittableProcessElementInvoker` for complex transform propagation.
10. [ ] If required by timers, update `TimerData` and `TimerInternals`.
11. [ ] If exposed to the user, update `DoFnSignatures` and `ByteBuddyDoFnInvokerFactory`.
12. [ ] Update other runners (Flink, Spark) to ensure they propagate the new `WindowedValue` fields correctly in their specific operators/runners.