
feat: cherry-pick upstream provider improvements#4

Open
BorisTyshkevich wants to merge 36 commits into altinity from feature/cherry-pick-upstream-improvements

@BorisTyshkevich
Collaborator

Summary

Cherry-picks 18 high-priority commits from transferia/transferia:main that affect the active providers (ClickHouse, PostgreSQL, MySQL, Kafka), plus infrastructure improvements.

Bug Fixes

  • MySQL: Fix IPv6 handling in the MySQL connector
  • ClickHouse: Fix ClickHouse TOAST logging
  • PostgreSQL: Fill columnValues for TOAST updates

Feature Improvements

  • ClickHouse: Retry temporary-table errors in the async CH sink (5-minute timeout)
  • ClickHouse: Use nil for unknown required columns (insert_null_as_default)
  • MySQL: Fix datetime timezone mismatch between snapshot and replication
  • ClickHouse: Alter column types during CH schema migration
  • ClickHouse: Remove AddNewColumns from the CH destination (use IsSchemaMigrationDisabled instead)

PostgreSQL Improvements

  • PostgreSQL: Inline bulk INSERT for the PG destination (major performance improvement)
  • PostgreSQL: Preserve pg_dump item order
  • PostgreSQL: Fix pg_dump table filter
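A bulk INSERT replaces one round trip per row with a single multi-row statement. A minimal sketch of that idea, assuming numbered placeholders; the upstream commit may instead inline values as literals, and buildBulkInsert is a hypothetical helper, not the PG destination's API.

```go
package main

import (
	"fmt"
	"strings"
)

// buildBulkInsert renders one INSERT statement for all rows, using PostgreSQL
// numbered placeholders ($1, $2, ...) and a flat argument slice.
// Identifiers are assumed to be already quoted/validated by the caller.
func buildBulkInsert(table string, columns []string, rows [][]any) (string, []any, error) {
	if len(rows) == 0 {
		return "", nil, fmt.Errorf("no rows")
	}
	var sb strings.Builder
	fmt.Fprintf(&sb, "INSERT INTO %s (%s) VALUES ", table, strings.Join(columns, ", "))
	args := make([]any, 0, len(rows)*len(columns))
	for i, row := range rows {
		if len(row) != len(columns) {
			return "", nil, fmt.Errorf("row %d has %d values, want %d", i, len(row), len(columns))
		}
		if i > 0 {
			sb.WriteString(", ")
		}
		sb.WriteByte('(')
		for j, v := range row {
			if j > 0 {
				sb.WriteString(", ")
			}
			args = append(args, v)
			fmt.Fprintf(&sb, "$%d", len(args)) // placeholder index = position in args
		}
		sb.WriteByte(')')
	}
	return sb.String(), args, nil
}
```

For two rows of (id, name) this yields INSERT INTO t (id, name) VALUES ($1, $2), ($3, $4). A placeholder-based version would also have to split large batches to stay under PostgreSQL's limit of 65535 bind parameters per statement.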

Kafka Improvements

  • Kafka: One-partition Kafka source (reader refactor)
  • Kafka: Improve the per-partition Kafka source

Infrastructure & Observability

  • ClickHouse: Add CH async streamer logging
  • Debezium: Sanitize debezium parameters
  • MySQL: Remove debug log line

Infrastructure Refactors

  • Serializer: Refactor serializers and optimize batching (buffer pooling, parallel serialize+write)
  • Abstract: Separate sampleable and checksumable interfaces (interface segregation)

Test plan

  • Build verification (go build ./...)
  • providers wave: PASS
  • storage-canon wave: PASS
  • e2e-core wave: PASS
  • optional-queues wave: PASS (Kafka validation)

🤖 Generated with Claude Code

bvt123 and others added 30 commits February 26, 2026 12:50
(cherry picked from commit 8c1c78cd60d2d6c50a8c04d523260eaf30a15161)
…e execution semantics

This commit consolidates the CH-only cleanup stream into a coherent test-system migration and stability pass. It normalizes suite topology around the new layered model, restores matrix tooling, and hardens local test execution so failures reflect real regressions instead of runner/logging artifacts.

What changed
- Rebuilt test layout around authoritative layers:
  - core: tests/e2e-core/{pg2ch,mysql2ch,mongo2ch}
  - optional: tests/e2e-optional/{kafka2ch,eventhub2ch,kinesis2ch,airbyte2ch,oracle2ch,ch2ch}
  - supporting layers: tests/evolution, tests/resume, tests/large
- Restored and aligned matrix/test orchestration assets:
  - tests/e2e-core/matrix/{cdc_local_suite.yaml,cdc_optional_suite.yaml,core2ch.yaml,sources.yaml,README.md}
  - Makefile targets for wave-based execution, optional gates, cache controls, and strict rerun behavior.
- Reintroduced/normalized suite content under e2e-core/e2e-optional and supporting layers (including ch2ch and the stdout/dev scope constraints already agreed on for this branch).
- Refreshed helper surface used by the new system:
  - tests/helpers/coordinator_backend.go and related helper updates
  - compare storage and metering helper additions
- Canon and runner stability fixes:
  - removed noisy raw JSON stdout from canon validator sink close path
  - adjusted test worker logger usage to avoid forced debug churn in execution paths
  - updated logger behavior to respect explicit LOG_LEVEL in gotest context
  - Makefile now supports GOTESTSUM_FORMAT with sensible local/CI behavior
  - test targets export LOG_LEVEL/YT_LOG_LEVEL for deterministic output
- Updated docs to reflect the new test model and commands:
  - tests/README.md
  - layer-specific README files across core/optional/evolution/resume/large

Why
- The previous state mixed legacy and new orchestration, causing duplicated intent, brittle execution order, and hard-to-diagnose failures.
- Canon and gotestsum output parsing produced synthetic failures in noisy runs despite successful package exits.
- The new structure makes retained provider scope explicit, keeps optional lanes separate, and improves maintainability and CI signal quality.

Validation performed
- make test-cdc-full FORCE=1 RERUN_FAILS=0 -> PASS
- make test-cdc-optional FORCE=1 RERUN_FAILS=0 -> PASS
- targeted canon repro/verification for postgres after runner noise fix -> PASS
- govulncheck run (post-fix environment check):
  - reachable code vulnerabilities: 0
  - module-level-only findings remain in aws-sdk-go; the vulnerable functions are not called from current code paths

Notes
- This commit intentionally captures the current branch state to preserve momentum in the cleanup stream.
- vendor_patched remains untouched as requested.
- Optional smoke suites that are intentionally blocked (eventhub/airbyte/oracle local smoke wiring) remain skipped by design and documented.
Hardened CI/local parity for the post-cleanup matrix by addressing ClickHouse 25.12 behavior changes and auth propagation gaps that surfaced under full forced runs (core + optional waves).

Key fixes included in this commit:

- clickhouse error classification: recognize cluster-not-found code 701 in distributed DDL fallback checks, with dedicated unit coverage.

- clickhouse recipe env propagation: forward prefixed RECIPE_CLICKHOUSE_PASSWORD to avoid empty-password auth failures in prefixed test recipes.

- e2e credential wiring: set ClickHouse password in manual ChSource constructions used by mongo2ch snapshot_flatten and kinesis2ch replication checks.

- pg2ch replication assertions: adapt replication/replication_ts checks to ClickHouse 25.12 semantics (FINAL CLEANUP table setting and row convergence behavior).

- testcontainer startup resilience: increase ClickHouse startup timeout in recipe waits to reduce transient readiness flakiness under matrix load.

- workflow stability guardrail: force serial package execution for the clickhouse provider package in generic CI test matrices to avoid container/reaper contention.

Validation executed locally:

- make build

- make test

- make test-cdc-full FORCE=1

- make test-cdc-optional FORCE=1

- go generate ./...

- golangci-lint (new-from-rev)

- govulncheck ./...
Introduce a reusable CI workflow and stream-specific callers for altinity (prod) and dev (integration), with optional e2e execution gated by the CI_RUN_OPTIONAL repository variable and a workflow_dispatch override.

Add a dedicated dev Docker publish workflow for GHCR and an optional manual promotion workflow that retags vetted dev digests as DockerHub prod tags.
Use golang:1.24.13-alpine3.22 in the Dockerfile so GHCR dev image builds satisfy the go 1.24.13 requirement in go.mod.
Publish linux/amd64 only for the dev stream and add a job timeout to avoid prolonged multi-arch hangs, keeping the prod DockerHub workflow unchanged.
… references

- Remove YDB debezium emitter/receiver and tests
- Remove YTSaurus logging, KV wrapper, and recipe helpers
- Remove Greenplum and OpenSearch connection code
- Remove S3 example (s3sqs2ch) and docs references
- Remove Elasticsearch and Delta docs references
- Clean up error codes for removed providers
- Update .mapping.json to remove deleted file entries
- Add .claude/ and reports/ to .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable linux/amd64 and linux/arm64 builds for the dev image stream, and disable provenance/SBOM emission to avoid unknown/unknown attestation manifests in the GHCR UI.
Cherry-picked from transferia/main: 72d663f

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cherry-picked from transferia/main: 6aaf95e

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cherry-picked from transferia/main: c7f81d6

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cherry-picked from transferia/main: b67ceeb

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…s_default)

Cherry-picked from transferia/main: a86ec0f

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cherry-picked from transferia/main: 8cc6675

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cherry-picked from transferia/main: f5dc0fe

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…maMigrationDisabled)

Cherry-picked from transferia/main: 3432ce4

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cherry-picked from transferia/main: fbd3058

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cherry-picked from transferia/main: 427dbf6

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cherry-picked from transferia/main: dc6a632

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cherry-picked from transferia/main: 2084631

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cherry-picked from transferia/main: 51b23c5

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cherry-picked from transferia/main: 7f53a0a

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cherry-picked from transferia/main: 6035694

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cherry-picked from transferia/main: a5d7034

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds buffer pooling, consolidates batch serializers, fixes JSON escaping,
and fixes the Parquet file structure.

Cherry-picked from transferia/main: 5c3f2ed

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Splits the monolithic SampleableStorage interface into focused interfaces:
- SizeableStorage for TableSizeInBytes
- Sampleable for LoadRandomSample
- AccessCheckable for TableAccessible
- ChecksumableStorage for full checksum methods

Cherry-picked from transferia/main: 211782b
Includes fixes for API compatibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
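The interface split above lets callers depend on one capability and detect it at runtime. A minimal sketch of that segregation; the method names come from the commit message, but TableID and every signature here are simplified assumptions, and ChecksumableStorage is omitted because its method set is not listed.

```go
package main

// TableID is a hypothetical, simplified table identifier.
type TableID struct{ Namespace, Name string }

// SizeableStorage covers only table size reporting.
type SizeableStorage interface {
	TableSizeInBytes(table TableID) (uint64, error)
}

// Sampleable covers only random sampling.
type Sampleable interface {
	LoadRandomSample(table TableID, push func(row []any) error) error
}

// AccessCheckable covers only access checks.
type AccessCheckable interface {
	TableAccessible(table TableID) bool
}

// sizeIfSupported depends only on the capability it needs and discovers it
// with a type assertion; ok is false when storage does not implement it.
func sizeIfSupported(storage any, table TableID) (size uint64, ok bool, err error) {
	s, ok := storage.(SizeableStorage)
	if !ok {
		return 0, false, nil
	}
	size, err = s.TableSizeInBytes(table)
	return size, true, err
}

// sizeOnly is a hypothetical storage that implements just one capability.
type sizeOnly struct{}

func (sizeOnly) TableSizeInBytes(TableID) (uint64, error) { return 42, nil }
```

With the monolithic interface, sizeOnly would have had to stub out sampling, access checks, and checksums; after the split it implements only what it supports.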
bvt123 and others added 6 commits February 27, 2026 13:02
- Add explicit permissions blocks to CI workflows to limit GITHUB_TOKEN scope
- Add bounds checking for strconv.Atoi to int32/int8 conversions in pglogrepl

Fixes:
- Workflow permissions: ci-dev.yml, ci-prod.yml, reusable-ci.yml
- Integer conversion bounds: pglogrepl.go lines 449, 496, 504, 692

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
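The integer-conversion fix guards narrowing from int to int32/int8. A minimal sketch of one standard way to do that: parse with an explicit bit size so out-of-range input is rejected. This swaps in strconv.ParseInt for the Atoi-plus-bounds-check the commit describes; parseInt32 and parseInt8 are hypothetical helpers, not pglogrepl functions.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseInt32 converts s to int32, rejecting values outside the int32 range
// instead of silently wrapping as int32(n) after strconv.Atoi would.
func parseInt32(s string) (int32, error) {
	n, err := strconv.ParseInt(s, 10, 32) // bitSize 32 enforces the range
	if err != nil {
		return 0, fmt.Errorf("parse int32 %q: %w", s, err)
	}
	return int32(n), nil
}

// parseInt8 does the same for int8.
func parseInt8(s string) (int8, error) {
	n, err := strconv.ParseInt(s, 10, 8) // bitSize 8 enforces the range
	if err != nil {
		return 0, fmt.Errorf("parse int8 %q: %w", s, err)
	}
	return int8(n), nil
}
```

On overflow ParseInt returns an error wrapping strconv.ErrRange, so callers can distinguish malformed input from out-of-range input with errors.Is.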
- Add timeout-minutes: 30 to e2e-tests and generic-tests jobs
- Limit gotestsum --rerun-fails to 2 retries (was unlimited)
- Prevents infinite retry loops when testcontainers fail to start

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- ci-prod.yml and ci-dev.yml need contents:read to call reusable-ci.yml
- Add timeout-minutes: 30 to generic-tests, e2e-core, e2e-optional jobs
- Limit gotestsum retries to 2 in reusable-ci.yml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Merge tests/e2e-core/ and tests/e2e-optional/ into single tests/e2e/
- Delete non-aligned tests (kafka2kafka, mysql2kafka, pg2pg)
- Remove duplicate kafka2ch from e2e-optional
- Update Makefile: LAYER=e2e-core -> LAYER=e2e
- Update CI workflows to use tests/e2e paths
- Update matrix YAML files (cdc_local_suite, cdc_optional_suite, core2ch, sources)
- Fix MV column alias mismatch in kafka2ch/replication_mv
- Switch from external Zookeeper to ClickHouse built-in Keeper
- Disable optional e2e tests by default in CI
- Update AGENTS.md and tests/README.md documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add ChSinkMigrationOptions struct and MigrationOptions field to
ChDestination to support automatic column addition during schema
migration. This fixes the CI build failure in evolution tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Align storage test directory naming with canon tests (both now use
"postgres" instead of "pg") so Makefile test-layer command works
correctly for storage tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
