[FLINK-38959][postgres] Update split state's table schemas info and infer schema change event based on pgoutput plugin's relation message. by loserwang1024 · Pull Request #4316 · apache/flink-cdc

loserwang1024 · 2026-03-16T07:15:17Z

As discussed in #4233, and with thanks for the hard work by @linjianchang on #4259 and @zml1206 on #4233, I have another design idea that leverages PostgreSQL’s native capabilities more fully.

PostgreSQL pgoutput's Relation message in logical replication

As Section 55.5.3 (Logical Replication Protocol Message Flow) of the PostgreSQL 16 documentation says:

"Every DML message contains a relation OID, identifying the publisher's relation that was acted on. Before the first DML message for a given relation OID, a Relation message will be sent, describing the schema of that relation. Subsequently, a new Relation message will be sent if the relation's definition has changed since the last Relation message was sent for it."

When DDL is executed in PostgreSQL, no corresponding message is generated in the logical replication stream. However, before the pgoutput sender sends the first DML message under the new schema, it first sends a Relation message.

The Relation message includes the table's schema.
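A minimal sketch of how a consumer could react to Relation messages, assuming it keeps the last column list per relation OID. The class `RelationTrackerSketch` and its method names are hypothetical and are not part of pgoutput, Debezium, or this PR; column types are omitted for brevity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: remembers the last Relation message seen per relation OID. */
final class RelationTrackerSketch {
    // Last column list announced for each relation OID (OIDs are unsigned 32-bit, so use long).
    private final Map<Long, List<String>> lastSeen = new HashMap<>();

    /**
     * Records a Relation message and reports whether it carries new information.
     * Returns true for the first message of an OID, or when the columns differ
     * from the previously recorded ones.
     */
    boolean onRelation(long relationOid, List<String> columns) {
        List<String> previous = lastSeen.put(relationOid, List.copyOf(columns));
        return previous == null || !previous.equals(columns);
    }

    public static void main(String[] args) {
        RelationTrackerSketch tracker = new RelationTrackerSketch();
        // First Relation message for the table, sent before its first DML message.
        System.out.println(tracker.onRelation(16384L, List.of("id", "name")));
        // A repeated message with an identical definition.
        System.out.println(tracker.onRelation(16384L, List.of("id", "name")));
        // A Relation message sent after ALTER TABLE ... ADD COLUMN age.
        System.out.println(tracker.onRelation(16384L, List.of("id", "name", "age")));
    }
}
```

Comparing against the stored definition is a defensive choice in this sketch, so that a repeated message with an unchanged definition does not produce a spurious schema change.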

How does Debezium use it?

Depending on the decoder plug-in, schema updates take two completely different paths:
● pgoutput: Sends a Relation message before the first DML event that uses a new schema, so applySchemaChangesForTable can be used to update the schema proactively. shouldSchemaBeSynchronized() returns false, so synchronizeTableSchema() is a no-op for DML events.
● decoderbufs: No Relation message is sent, so shouldSchemaBeSynchronized() returns true (the default value). The schema is synchronized reactively by comparing the columns in each message with the in-memory schema.
Thus, with pgoutput, we can send schema change events on demand without comparing each message.
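The difference between the two paths can be modelled with a small sketch. This is a simplified, hypothetical model and does not use Debezium's actual classes; only the method name `shouldSchemaBeSynchronized` is borrowed from the description above, and everything else (`SchemaSyncPathsSketch`, `onRelation`, `onDml`, the comparison counter) is an assumption made for illustration.

```java
import java.util.List;

/** Hypothetical model of the two schema-update paths; not Debezium's real implementation. */
final class SchemaSyncPathsSketch {
    enum Plugin { PGOUTPUT, DECODERBUFS }

    private final Plugin plugin;
    private List<String> schema;  // in-memory column list for a single table
    private int comparisons;      // number of per-message schema comparisons performed

    SchemaSyncPathsSketch(Plugin plugin, List<String> initialSchema) {
        this.plugin = plugin;
        this.schema = List.copyOf(initialSchema);
    }

    /** Only the reactive (non-pgoutput) path needs per-message synchronization. */
    boolean shouldSchemaBeSynchronized() {
        return plugin != Plugin.PGOUTPUT;
    }

    /** Proactive path: a Relation message replaces the schema before the DML arrives. */
    void onRelation(List<String> columns) {
        schema = List.copyOf(columns);
    }

    /** Reactive path: compare each DML message's columns with the in-memory schema. */
    void onDml(List<String> messageColumns) {
        if (!shouldSchemaBeSynchronized()) {
            return; // pgoutput: the schema was already updated by onRelation
        }
        comparisons++;
        if (!schema.equals(messageColumns)) {
            schema = List.copyOf(messageColumns);
        }
    }

    List<String> schema() { return schema; }

    int comparisons() { return comparisons; }
}
```

In this model the pgoutput path performs zero comparisons regardless of how many DML messages arrive, while the reactive path performs one per message.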

What does CDC need to do?

Therefore, in my opinion:

● Extend PostgresSchema and pass in a dispatcher. When a Relation message is received, put the schema into the event queue as a special message.
● When the Postgres RecordEmitter receives the schema event, it:
  ● Updates the schemas in the split for persistence (in the current master branch, since PG CDC has no schema change events, the information in schemas is never updated).
  ● Compares the table changes and sends the schema DDL events in YAML jobs.

In this way, we can avoid intrusive changes to PostgreSQL through triggers, and there is also no need to compare schemas for every message. More importantly, it updates the schemas stored in the split, so schema consistency can still be guaranteed after state recovery or restart.
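The proposed flow can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's `PostgresPipelineRecordEmitter`: the class `SchemaAwareEmitterSketch`, its methods, and the string-based events are all hypothetical, and schemas are reduced to column-name lists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical emitter: updates split-state schemas and emits a change event on a schema record. */
final class SchemaAwareEmitterSketch {
    // Stand-in for the schemas stored in the split, which are checkpointed with it.
    private final Map<String, List<String>> splitSchemas = new HashMap<>();
    // Stand-in for the downstream collector.
    private final List<String> output = new ArrayList<>();

    /** Handles the special schema message that was enqueued when a Relation message arrived. */
    void processSchemaRecord(String tableId, List<String> newColumns) {
        // Step 1: always refresh the split state so a restart resumes with the latest schema.
        List<String> old = splitSchemas.put(tableId, List.copyOf(newColumns));
        // Step 2: emit a schema change only when a previously known schema actually differs.
        if (old != null && !old.equals(newColumns)) {
            output.add("SCHEMA_CHANGE " + tableId + " " + old + " -> " + newColumns);
        }
    }

    /** Handles an ordinary data change record; no schema comparison is needed here. */
    void processDataRecord(String tableId, String payload) {
        output.add("DATA " + tableId + " " + payload);
    }

    Map<String, List<String>> splitSchemas() { return splitSchemas; }

    List<String> output() { return output; }
}
```

Because the schema record is placed in the same queue ahead of the DML that follows it, the schema change is emitted before any row that uses the new schema, and data records never trigger a comparison.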

Although this only works for pgoutput, I think that is enough, because pgoutput is the only official replication plugin.

loserwang1024 · 2026-03-17T05:37:59Z

@ruanhang1993 @leonardBang @linjianchang @zml1206 @LYanquan @yuxiqian , Would you like to help me review this PR?

leonardBang · 2026-03-17T05:57:12Z

My pleasure, will review soon

...-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/utils/SchemaChangeUtil.java

Daishuyuan · 2026-03-17T08:41:57Z

+     * with memoization. Available operations: rename column, add column at last, drop column, alter
+     * column type. Recursion depth bounded by total column count.
+     */
+    private static List<SchemaChangeEvent> inferMinimalSchemaChanges(


Thanks for the update. I have one question about the DP design.

It looks like the algorithm does not try to preserve the existing suffix in a head-insert case. For example:

before: [b, c]
after: [a, b, c]

the inferred result becomes Rename(b -> a), Rename(c -> b), Add(c), instead of treating this as adding a new leading column.

Is this intentional because the supported operation set only allows trailing AddColumnEvent, or is this just a limitation of the current implementation?

If this is intentional, could we document it more explicitly? Otherwise, it seems the "minimal" result here is only minimal under the current restricted cost model, but not necessarily the most natural schema change sequence.

I am still learning this part of the codebase, so I would really appreciate any correction or guidance.

loserwang1024 · 2026-03-17

@Daishuyuan
As far as I understand, PostgreSQL does not support inserting a column at the head of a table. ALTER TABLE ... ADD COLUMN only appends the new column to the end of the column list:

https://www.postgresql.org/docs/current/sql-altertable.html

So in a case like:

before: [b, c]
after: [a, b, c]

a true “head-insert” is not something PostgreSQL would normally produce through native schema evolution on the same table.

The reason I used this algorithm is that I am trying, as much as possible, to preserve the original DDL semantics that the source database can actually produce. I’ll add this explanation near the beginning of the method to make the assumption clearer.

In the current design, the main ambiguous case is the last-column scenario: for example, distinguishing a rename of the last column from a drop + add of a last column with the same name (before any data is written). It is uncommon in production for a user to make no data modification between the drop and the add of such a last column.

So yes, this behavior is intentional under the current design constraints, rather than an attempt to find the most “natural” edit sequence in the abstract.
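To illustrate the restricted operation set discussed in this thread, here is a simplified sketch. It is not the PR's `inferMinimalSchemaChanges`: it ignores column types, assumes every operation costs 1, and the class name `RestrictedDiffSketch` and the string representation of operations are hypothetical. It allows keep, rename, and drop at any position, but add only at the end.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical memoized diff over column names where ADD is only allowed at the end. */
final class RestrictedDiffSketch {
    private final List<String> before;
    private final List<String> after;
    private final Integer[][] memo;

    private RestrictedDiffSketch(List<String> before, List<String> after) {
        this.before = before;
        this.after = after;
        this.memo = new Integer[before.size() + 1][after.size() + 1];
    }

    /** Minimal number of operations that turn before[i..] into after[j..]. */
    private int cost(int i, int j) {
        if (i == before.size()) return after.size() - j;  // only trailing adds remain
        if (j == after.size()) return before.size() - i;  // drop every leftover column
        if (memo[i][j] != null) return memo[i][j];
        boolean same = before.get(i).equals(after.get(j));
        int keepOrRename = (same ? 0 : 1) + cost(i + 1, j + 1);
        int drop = 1 + cost(i + 1, j);
        return memo[i][j] = Math.min(keepOrRename, drop);
    }

    /** Reconstructs one minimal operation sequence, preferring rename over drop on ties. */
    static List<String> diff(List<String> before, List<String> after) {
        RestrictedDiffSketch d = new RestrictedDiffSketch(before, after);
        List<String> ops = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < before.size() && j < after.size()) {
            boolean same = before.get(i).equals(after.get(j));
            int keepOrRename = (same ? 0 : 1) + d.cost(i + 1, j + 1);
            if (keepOrRename <= 1 + d.cost(i + 1, j)) {
                if (!same) ops.add("Rename(" + before.get(i) + " -> " + after.get(j) + ")");
                i++;
                j++;
            } else {
                ops.add("Drop(" + before.get(i) + ")");
                i++;
            }
        }
        for (; i < before.size(); i++) ops.add("Drop(" + before.get(i) + ")");
        for (; j < after.size(); j++) ops.add("Add(" + after.get(j) + ")");
        return ops;
    }

    public static void main(String[] args) {
        // Head-insert case from the review question.
        System.out.println(diff(List.of("b", "c"), List.of("a", "b", "c")));
        // Dropping a middle column, which PostgreSQL can produce natively.
        System.out.println(diff(List.of("a", "b", "c"), List.of("a", "c")));
    }
}
```

Under this cost model the head-insert case yields Rename(b -> a), Rename(c -> b), Add(c), matching the result described in the question, while a middle-column drop is recognized as a single Drop(b).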

Daishuyuan · 2026-03-19

Thanks for the detailed explanation; this makes the assumption much clearer.

I understand the intent now: the algorithm is designed to preserve DDL semantics that the source database can actually produce, rather than infer the most abstract edit sequence.

Appreciate the clarification.

Copilot

Pull request overview

This PR enhances the PostgreSQL CDC connectors to leverage pgoutput Relation messages for updating split-state table schemas and inferring schema change events without relying on intrusive database triggers.

Changes:

Introduces a Relation-aware Debezium PostgresSchema extension that dispatches Relation-based schema updates into the change-event queue.
Adds schema-diff inference utilities and updates the Postgres pipeline record emitter to emit CDC SchemaChangeEvents derived from Relation messages.
Expands source-connector and pipeline-connector test coverage for schema evolution + state snapshot schema persistence.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 8 comments.

File	Description
flink-connector-postgres-cdc/src/main/java/org/apache/flink/cdc/connectors/postgres/source/schema/SchemaDispatcher.java	Adds dispatcher abstraction for pushing schema updates into the queue.
flink-connector-postgres-cdc/src/main/java/org/apache/flink/cdc/connectors/postgres/source/schema/RelationAwarePostgresSchema.java	Extends Debezium `PostgresSchema` to dispatch Relation-based schema updates.
flink-connector-postgres-cdc/src/main/java/org/apache/flink/cdc/connectors/postgres/source/schema/PostgresSchemaRecord.java	Adds a custom `SourceRecord` wrapper to carry Debezium `Table` schemas.
flink-connector-postgres-cdc/src/main/java/org/apache/flink/cdc/connectors/postgres/source/reader/PostgresSourceRecordEmitter.java	Adds a Postgres-specific emitter hook for schema-change extraction.
flink-connector-postgres-cdc/src/main/java/org/apache/flink/cdc/connectors/postgres/source/fetch/PostgresSourceFetchTaskContext.java	Installs Relation-aware schema + dispatcher and handles table-id extraction for schema records.
flink-connector-postgres-cdc/src/main/java/org/apache/flink/cdc/connectors/postgres/source/fetch/CDCPostgresDispatcher.java	Enqueues Relation-driven schema records into Debezium queue.
flink-connector-postgres-cdc/src/main/java/io/debezium/connector/postgresql/PostgresObjectUtils.java	Adds schema bootstrap from split-state schemas (no refresh).
flink-connector-postgres-cdc/src/main/java/org/apache/flink/cdc/connectors/postgres/source/config/PostgresSourceConfigFactory.java	Adds schema-change enablement flag to config factory.
flink-connector-postgres-cdc/src/main/java/org/apache/flink/cdc/connectors/postgres/source/config/PostgresSourceConfig.java	Carries `schemaChangeEnabled` into runtime config.
flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/reader/IncrementalSourceRecordEmitter.java	Refactors schema-change parsing into an overridable method.
flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/meta/split/SourceSplitSerializer.java	Fixes deserialization to respect per-entry `useCatalogBeforeSchema`.
flink-cdc-pipeline-connector-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/utils/SchemaChangeUtil.java	Adds minimal-edit schema diff inference (add/drop/rename/alter-type).
flink-cdc-pipeline-connector-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/source/reader/PostgresPipelineRecordEmitter.java	Emits inferred `SchemaChangeEvent`s for Relation schema updates.
flink-cdc-pipeline-connector-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/source/PostgresDataSourceOptions.java	Adds `schema-change.enabled` pipeline option.
flink-cdc-pipeline-connector-postgres/src/main/java/org/apache/flink/cdc/connectors/postgres/factory/PostgresDataSourceFactory.java	Wires new option into source builder (currently problematic).
flink-connector-postgres-cdc/src/test/.../PostgresSourceReaderTest.java	Adds state-update test verifying schema persistence across snapshotState.
flink-connector-postgres-cdc/src/test/.../IncrementalSourceStreamFetcherTest.java	Adds ordering test ensuring schema record is enqueued around DML.
flink-connector-postgres-cdc/src/test/.../PostgresScanFetchTaskTest.java	Updates factory method signature usage.
flink-connector-postgres-cdc/src/test/java/.../PostgresTestBase.java	Adds optional slot-name plumbing for tests.
flink-cdc-pipeline-connector-postgres/src/test/.../PostgresPipelineITCaseTest.java	Adds end-to-end schema evolution IT case.

github-actions bot added build base postgres-cdc-connector postgres-pipeline-connector labels Mar 16, 2026

loserwang1024 force-pushed the poc-schema-change-event branch 2 times, most recently from 1ea71bb to 51c2077 Compare March 17, 2026 04:13

leonardBang self-requested a review March 17, 2026 05:57

zml1206 reviewed Mar 17, 2026

Daishuyuan reviewed Mar 17, 2026

yuxiqian reviewed Mar 17, 2026

loserwang1024 force-pushed the poc-schema-change-event branch from 51c2077 to 199aa53 Compare March 17, 2026 12:55

leonardBang requested a review from Copilot March 19, 2026 03:30

Copilot AI reviewed Mar 19, 2026

Commit 45d255e: [FLINK-38959][postgres] Update split state's table schemas info and infer schema change event based on pgoutput plugin's relation message.

loserwang1024 force-pushed the poc-schema-change-event branch from 199aa53 to 45d255e Compare March 19, 2026 04:08

loserwang1024 wants to merge 1 commit into apache:master from loserwang1024:poc-schema-change-event

6 participants
