What did you do?
Checked the job console output and collected the artifacts under /tmp/tidb_cdc_test.
Found two likely causes that make the Pulsar next-gen jobs flaky when running many cases sequentially (tests/integration_tests/run.sh):
tests/integration_tests/_utils/run_pulsar_consumer starts cdc_pulsar_consumer in the background, but tests/integration_tests/_utils/stop_tidb_cluster does not kill it. The consumer process therefore leaks across cases, accumulating background processes and resource pressure within a single job.
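The cleanup side of this can be sketched as a small addition to stop_tidb_cluster (a sketch only, assuming the consumer binary is named cdc_pulsar_consumer as in the report; the [_] character class keeps pkill -f from matching its own command line):

```shell
# Hypothetical cleanup step for stop_tidb_cluster: kill any leftover
# background consumer started by run_pulsar_consumer.
# The [_] character class prevents pkill -f from matching this very
# command line; "|| true" keeps "set -e" scripts alive when no process
# matched.
pkill -9 -f 'cdc_pulsar[_]consumer' || true
echo "pulsar consumer cleanup done"
```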
Next-gen integration scripts generate the TiDB config with section-sensitive keys in the wrong place: appending socket = ... after the [instance] header turns it into instance.socket, and writing max-server-connections under [instance] turns it into instance.max-server-connections. TiDB reports these as invalid configuration options and ignores them.
Example from CI artifacts: config file ... contained invalid configuration options: run-auto-analyze, server-memory-quota, analyze-always-skip-wide-columns, instance.max-server-connections, instance.socket.
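The root of the second cause is TOML's section sensitivity: a key appended at end-of-file belongs to whichever [section] header was opened last, so top-level keys must be written before any section header. A minimal sketch of generating the file in one ordered pass (file name and values are illustrative, not the real script's):

```shell
# Build the config in one ordered pass: top-level keys BEFORE any
# [section] header. Appending 'socket = ...' after '[instance]' would
# instead produce the rejected key 'instance.socket'.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# top-level keys must come first
socket = "/tmp/tidb.sock"
max-server-connections = 4096

[instance]
ddl_slow_threshold = 300
EOF
# sanity check: 'socket' must appear before the first section header
grep -n . "$cfg"
```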
What did you expect to see?
Pulsar next-gen integration jobs should reliably start TiDB/PD/TiKV for every case, and background processes started by test cases should be cleaned up between cases.
What did you see instead?
Pulsar next-gen jobs fail with repeated MySQL connect errors and finally "Failed to start TiDB" while verifying TiDB health (tests/integration_tests/_utils/check_tidb_health).
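For context, this health verification amounts to polling TiDB until it answers; a rough sketch of the shape of such a check (the retry count, status endpoint, and message are assumptions, not the real check_tidb_health):

```shell
# Rough sketch of a TiDB health poll (not the real script): hit the
# status port until it responds, give up after N attempts.
check_tidb_health_sketch() {
    local host=$1 port=$2 retries=${3:-60}
    local i
    for i in $(seq 1 "$retries"); do
        if curl -sf "http://$host:$port/status" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
    done
    echo "Failed to start TiDB" >&2
    return 1
}
```

In the failing jobs, a loop like this would exhaust its retries because TiDB never comes up, producing the repeated connect errors followed by the final failure message.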
TiDB prints invalid configuration option warnings, and in some cases PD reports keyspace pre-alloc / region split timeouts, which can cascade into TiDB bootstrap failures.
In the same PR run, other next-gen integration suites (e.g. Kafka/Storage, MySQL light) can pass, suggesting this is not a universal next-gen environment outage.
Versions of the cluster
Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):
From CI logs (next-gen):
Release Version: v9.0.0-beta.2.pre-1255-g5f3cbfe
Kernel Type: Next Generation
Upstream TiKV version (execute tikv-server --version):
From CI logs:
Release Version: 8.5.4+branch-HEAD
Edition: Cloud Storage Engine
Git Commit Hash: a5d8e11ebb420ff2ae27ea5f251b6893cbe62a64
TiCDC version (execute cdc version):
PR build in CI (pingcap/ticdc#4264), commit aa290841c41294377843c6631e009dc993ae7a8d