We sometimes hit the following error when trying to export a partition:
Code: 999. DB::Exception: Received from localhost:9000. Coordination::Exception. Coordination::Exception: Coordination error: Operation timeout, path /clickhouse/tables/shard0/source_ee53e34b_c9f6_11f0_9209_4369e6456e8f/exports/5_default.s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f. (KEEPER_EXCEPTION)
(query: ALTER TABLE source_ee53e34b_c9f6_11f0_9209_4369e6456e8f EXPORT PARTITION ID '5' TO TABLE s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f)
Most recently, this happened when exporting the same partitions from two different nodes of a cluster. Setup:
[clickhouse1] CREATE TABLE source_ee53e34b_c9f6_11f0_9209_4369e6456e8f ON CLUSTER sharded_cluster (
p UInt8,
i UInt64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/shard0/source_ee53e34b_c9f6_11f0_9209_4369e6456e8f', '{replica}') ORDER BY tuple() PARTITION BY p;
[clickhouse1] CREATE TABLE s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f ON CLUSTER sharded_cluster (
p UInt8,
i UInt64
) ENGINE =
S3(
'[masked]:Secret(name='minio_uri')/root/data/export_part/tmp_ee53e35b_c9f6_11f0_9209_4369e6456e8f/',
'[masked]:Secret(name='minio_root_user')',
'[masked]:Secret(name='minio_root_password')',
filename='s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f',
format='Parquet',
compression='auto',
partition_strategy='hive'
)
PARTITION BY p;
[clickhouse1] INSERT INTO source_ee53e34b_c9f6_11f0_9209_4369e6456e8f (p, i) SELECT 2, rand64() FROM numbers(3);
[clickhouse1] INSERT INTO source_ee53e34b_c9f6_11f0_9209_4369e6456e8f (p, i) SELECT 3, rand64() FROM numbers(3);
[clickhouse1] INSERT INTO source_ee53e34b_c9f6_11f0_9209_4369e6456e8f (p, i) SELECT 4, rand64() FROM numbers(3);
[clickhouse1] INSERT INTO source_ee53e34b_c9f6_11f0_9209_4369e6456e8f (p, i) SELECT 5, rand64() FROM numbers(3);
[clickhouse2] INSERT INTO source_ee53e34b_c9f6_11f0_9209_4369e6456e8f (p, i) SELECT 1, rand64() FROM numbers(3);
[clickhouse2] INSERT INTO source_ee53e34b_c9f6_11f0_9209_4369e6456e8f (p, i) SELECT 2, rand64() FROM numbers(3);
[clickhouse2] INSERT INTO source_ee53e34b_c9f6_11f0_9209_4369e6456e8f (p, i) SELECT 3, rand64() FROM numbers(3);
[clickhouse2] INSERT INTO source_ee53e34b_c9f6_11f0_9209_4369e6456e8f (p, i) SELECT 4, rand64() FROM numbers(3);
[clickhouse2] INSERT INTO source_ee53e34b_c9f6_11f0_9209_4369e6456e8f (p, i) SELECT 5, rand64() FROM numbers(3);
We export the partitions on node 1:
ALTER TABLE source_ee53e34b_c9f6_11f0_9209_4369e6456e8f EXPORT PARTITION ID '1' TO TABLE s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f;
ALTER TABLE source_ee53e34b_c9f6_11f0_9209_4369e6456e8f EXPORT PARTITION ID '2' TO TABLE s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f;
ALTER TABLE source_ee53e34b_c9f6_11f0_9209_4369e6456e8f EXPORT PARTITION ID '3' TO TABLE s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f;
ALTER TABLE source_ee53e34b_c9f6_11f0_9209_4369e6456e8f EXPORT PARTITION ID '4' TO TABLE s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f;
ALTER TABLE source_ee53e34b_c9f6_11f0_9209_4369e6456e8f EXPORT PARTITION ID '5' TO TABLE s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f;
When we then run the same exports on node 2 with export_merge_tree_partition_force_export enabled, one of the exports fails with the error:
SET allow_experimental_export_merge_tree_part = 1;
SET export_merge_tree_partition_force_export = 1;
ALTER TABLE source_ee53e34b_c9f6_11f0_9209_4369e6456e8f EXPORT PARTITION ID '1' TO TABLE s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f;
ALTER TABLE source_ee53e34b_c9f6_11f0_9209_4369e6456e8f EXPORT PARTITION ID '2' TO TABLE s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f;
ALTER TABLE source_ee53e34b_c9f6_11f0_9209_4369e6456e8f EXPORT PARTITION ID '3' TO TABLE s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f;
ALTER TABLE source_ee53e34b_c9f6_11f0_9209_4369e6456e8f EXPORT PARTITION ID '4' TO TABLE s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f;
ALTER TABLE source_ee53e34b_c9f6_11f0_9209_4369e6456e8f EXPORT PARTITION ID '5' TO TABLE s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f;
2025.11.25 13:04:56.565544 [ 4220 ] {} <Error> TCPHandler: Code: 999. Coordination::Exception: Coordination error: Operation timeout, path /clickhouse/tables/shard0/source_ee53e34b_c9f6_11f0_9209_4369e6456e8f/exports/5_default.s3_ee53e35c_c9f6_11f0_9209_4369e6456e8f. (KEEPER_EXCEPTION), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000133d959f
1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000c88438e
2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000c883e40
3. DB::Exception::Exception<char const*, String const&>(int, FormatStringHelperImpl<std::type_identity<char const*>::type, std::type_identity<String const&>::type>, char const*&&, String const&) @ 0x000000000ed5172b
4. Coordination::Exception::fromPath(Coordination::Error, String const&) @ 0x000000000ed50e68
5. zkutil::ZooKeeper::existsWatch(String const&, Coordination::Stat*, std::function<void (Coordination::WatchResponse const&)>) @ 0x000000001a5901da
6. zkutil::ZooKeeper::exists(String const&, Coordination::Stat*, std::shared_ptr<Poco::Event> const&) @ 0x000000001a58ca9f
7. DB::StorageReplicatedMergeTree::exportPartitionToTable(DB::PartitionCommand const&, std::shared_ptr<DB::Context const>) @ 0x0000000018d3e67b
8. DB::MergeTreeData::alterPartition(std::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::vector<DB::PartitionCommand, std::allocator<DB::PartitionCommand>> const&, std::shared_ptr<DB::Context const>) @ 0x000000001935525e
9. DB::InterpreterAlterQuery::executeToTable(DB::ASTAlterQuery const&) @ 0x0000000017f43b91
10. DB::InterpreterAlterQuery::execute() @ 0x0000000017f4058d
11. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, std::unique_ptr<DB::ReadBuffer, std::default_delete<DB::ReadBuffer>>&, std::shared_ptr<DB::IAST>&, std::shared_ptr<DB::ImplicitTransactionControlExecutor>) @ 0x000000001840a2d2
12. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x000000001840254b
13. DB::TCPHandler::runImpl() @ 0x0000000019b3818a
14. DB::TCPHandler::run() @ 0x0000000019b5a1d9
15. Poco::Net::TCPServerConnection::start() @ 0x000000001f084fc7
16. Poco::Net::TCPServerDispatcher::run() @ 0x000000001f085459
17. Poco::PooledThread::run() @ 0x000000001f04ba87
18. Poco::ThreadImpl::runnableEntry(void*) @ 0x000000001f049e81
19. ? @ 0x0000000000094ac3
20. ? @ 0x0000000000126850
I'm not sure whether export_merge_tree_partition_force_export plays any role in this issue, because we have hit the same error on regular EXPORT PARTITION queries where no additional settings are set.
Cluster structure:
<sharded_cluster>
<shard>
<replica>
<host>clickhouse1</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>clickhouse2</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>clickhouse3</host>
<port>9000</port>
</replica>
</shard>
</sharded_cluster>
Logs: