Crash or LOGICAL_ERROR with experimental correlated subqueries using EXISTS and QUALIFY #1441

@CarlosFelipeOR

I checked the Altinity Stable Builds lifecycle table, and the Altinity Stable Build version I'm using is still supported.

Type of problem

Bug report - something's broken

Describe the situation

The ClickHouse server crashes with SIGABRT on debug and sanitizer builds, or returns STD_EXCEPTION (code 1001) on release builds, when executing a query that combines a correlated subquery inside EXISTS with a QUALIFY clause while allow_experimental_correlated_subqueries is enabled.
The failure originates from a std::out_of_range exception thrown on a vector access in DB::CrossJoinResult::next() during query pipeline execution (see the stack trace under "Actual behavior").
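For readers unfamiliar with this exception type, the sketch below shows the general mechanism only. It assumes the failing access is bounds-checked in the style of std::vector::at(), which the __throw_out_of_range frame in the stack trace suggests but does not prove. The helper try_read_column is hypothetical and is not ClickHouse code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from the ClickHouse codebase): reads the element
// at `index` through the bounds-checked accessor std::vector::at().
// Returns true and stores the value in `out` on success.
// Returns false if at() threw std::out_of_range (index >= columns.size()).
bool try_read_column(const std::vector<int> & columns, std::size_t index, int & out)
{
    try
    {
        out = columns.at(index);
        return true;
    }
    catch (const std::out_of_range &)
    {
        // In the reported failure nothing catches the exception this close to
        // the access, so it propagates up to the pipeline executor.
        return false;
    }
}
```

Calling try_read_column with an index equal to the vector's size returns false, which mirrors an access one past the last available column.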

This issue:

  • Causes SIGABRT (server crash) on debug and sanitizer builds (msan, tsan, asan, ubsan)
  • Returns error code 1001 (STD_EXCEPTION) on release builds; the server stays up
  • Is a pre-existing upstream bug: it has appeared in upstream ClickHouse CI since at least July 2024 (version 24.7), with 51+ occurrences across versions 24.7, 24.9, 24.10, 24.11, 25.3, 25.8, 26.1, and 26.2
  • Was detected in Altinity Antalya 25.8.16 by the AST fuzzer (amd_msan) CI job
  • Has no upstream GitHub issue tracking it, as far as a search could find

How to reproduce the behavior

Environment

  • Version: 25.8.16.20001.altinityantalya (also reproducible on upstream 24.7+)
  • Build type: Any (crashes on debug/sanitizer, returns error on release)

Steps

  1. Create a simple test table:
CREATE TABLE test (i1 Int64, i2 Int64) ENGINE = MergeTree ORDER BY i1;
INSERT INTO test VALUES (1, 2), (3, 4), (5, 6);
  2. Enable the required experimental settings:
SET allow_experimental_correlated_subqueries = 1;
SET allow_experimental_analyzer = 1;
  3. Run the crashing query:
SELECT 1 FROM test AS t1 WHERE exists((
    SELECT 76, 8, (t2.i2 = t1.i1) AND 13, *
    FROM test AS t2 WHERE materialize(36) AND 13
)) QUALIFY 3;

Expected behavior

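The query should either return a valid result or fail with a user-facing error (e.g., NOT_IMPLEMENTED, SYNTAX_ERROR). An internal std::out_of_range surfacing as STD_EXCEPTION (code 1001), or as a logical-error abort on debug and sanitizer builds, should never be reachable by a user query.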


Actual behavior

On release builds

The server returns an error but stays up:

Received exception from server (version 25.8.16):
Code: 1001. DB::Exception: Received from localhost:9000. DB::Exception: std::out_of_range: vector. (STD_EXCEPTION)

On debug/sanitizer builds (CI)

The server crashes with SIGABRT:

2026.02.23 18:53:01.967492 [ 2567016 ] {3c1ed47d-62a2-4023-a2be-435d1b246812} <Fatal> : Logical error: 'std::exception. Code: 1001, type: std::out_of_range, e.what() = vector (version 25.8.16.20001.altinityantalya (altinity build)), Stack trace:

0. ./contrib/llvm-project/libcxx/include/__exception/exception.h:113: std::logic_error::logic_error(char const*) @ 0x000000005a22e123
1. std::out_of_range::out_of_range[abi:ne190107](char const*) @ 0x0000000009fa974e
2. std::__throw_out_of_range[abi:ne190107](char const*) @ 0x0000000009fa969c
3. ./contrib/llvm-project/libcxx/include/vector:1001: _GLOBAL__sub_I_StorageSystemDisks.cpp @ 0x000000002d3643d0
4. ./contrib/llvm-project/libcxx/include/vector:1458: DB::CrossJoinResult::next()::$_1::operator()(std::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>> const&) const @ 0x000000003baf1f8e
5. ./ci/tmp/build/./src/Interpreters/HashJoin/HashJoin.cpp:989: DB::CrossJoinResult::next() @ 0x000000003baedee5
6. ./ci/tmp/build/./src/Processors/Transforms/JoiningTransform.cpp:224: DB::JoiningTransform::readExecute(DB::Chunk&) @ 0x000000004b69d058
7. ./ci/tmp/build/./src/Processors/Transforms/JoiningTransform.cpp:205: DB::JoiningTransform::transform(DB::Chunk&) @ 0x000000004b69a1b1
8. ./ci/tmp/build/./src/Processors/Transforms/JoiningTransform.cpp:136: DB::JoiningTransform::work() @ 0x000000004b698af6
9. ./ci/tmp/build/./src/Processors/Executors/ExecutionThreadContext.cpp:53: DB::ExecutionThreadContext::executeTask() @ 0x000000004ac8acac
10. ./ci/tmp/build/./src/Processors/Executors/PipelineExecutor.cpp:351: DB::PipelineExecutor::executeStepImpl(unsigned long, DB::IAcquiredSlot*, std::atomic<bool>*) @ 0x000000004ac4bb9c
11. ./ci/tmp/build/./src/Processors/Executors/PipelineExecutor.cpp:279: void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreads(std::shared_ptr<DB::IAcquiredSlot>)::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x000000004ac5087e
12. ./contrib/llvm-project/libcxx/include/__functional/function.h:716: ? @ 0x00000000253bbc10
13. ./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:117: ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'()::operator()() @ 0x00000000253d00b4
14. ./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:149: void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x00000000253cfe5f
15. ./contrib/llvm-project/libcxx/include/__functional/function.h:716: ? @ 0x00000000253b4227
16. ./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:117: void* std::__thread_proxy[abi:ne190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x00000000253c8264
17. start_thread @ 0x000000000009caa4
18. clone3 @ 0x0000000000129c6c
'.
2026.02.23 18:53:01.979782 [ 2567016 ] {3c1ed47d-62a2-4023-a2be-435d1b246812} <Fatal> : Stack trace (when copying this message, always include the lines below):

0. ./ci/tmp/build/./src/Common/StackTrace.cpp:389: StackTrace::StackTrace() @ 0x0000000025170827
1. ./ci/tmp/build/./src/Common/Exception.cpp:56: DB::abortOnFailedAssertion(String const&) @ 0x0000000024f5f8ad
2. ./ci/tmp/build/./src/Common/Exception.cpp:581: DB::getCurrentExceptionMessageAndPattern(bool, bool, bool) @ 0x0000000024f7615a
3. ./ci/tmp/build/./src/Common/Exception.cpp:520: DB::ExecutionStatus::fromCurrentException(String const&, bool) @ 0x0000000024f7939d
4. ./ci/tmp/build/./src/Processors/Executors/PipelineExecutor.cpp:153: DB::PipelineExecutor::execute(unsigned long, bool) @ 0x000000004ac494fd
5. ./ci/tmp/build/./src/Processors/Executors/PullingAsyncPipelineExecutor.cpp:76: void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000004aca76e6
6. ./contrib/llvm-project/libcxx/include/__functional/function.h:716: ? @ 0x00000000253b4227
7. ./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:117: void* std::__thread_proxy[abi:ne190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x00000000253c8264
8. start_thread @ 0x000000000009caa4
9. clone3 @ 0x0000000000129c6c

Changed settings at crash time:

allow_experimental_correlated_subqueries = true
allow_experimental_analyzer = true
correlated_subqueries_substitute_equivalent_expressions = false

Upstream CI history

Month     Upstream CI hits
2024-07   1
2024-09   4
2024-10   5
2024-11   1
2025-03   36 (spike during PR ClickHouse#75942 "Join to Subquery")
2025-07   1
2025-12   1
2026-02   2
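
As a quick consistency check, the monthly counts above can be summed and compared with the "51+ occurrences" figure quoted earlier. This is a minimal sketch: the array simply transcribes the table, and the names monthly_hits and total_upstream_hits are illustrative.

```cpp
#include <array>
#include <string_view>
#include <utility>

// Monthly upstream CI hits, transcribed from the table in this report.
constexpr std::array<std::pair<std::string_view, int>, 8> monthly_hits{{
    {"2024-07", 1},
    {"2024-09", 4},
    {"2024-10", 5},
    {"2024-11", 1},
    {"2025-03", 36},
    {"2025-07", 1},
    {"2025-12", 1},
    {"2026-02", 2},
}};

// Sums the per-month counts.
int total_upstream_hits()
{
    int total = 0;
    for (const auto & entry : monthly_hits)
        total += entry.second;
    return total;
}
```

The total is 51, which matches the occurrence count quoted in the summary; 36 of those hits fall in March 2025.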

Additional context

CI failure

  • Job: AST fuzzer (amd_msan)
  • Branch: antalya-25.8
  • Commit: 027f87165d1035cbd110054e534db3f0e7bb09b9
  • CI report: ci_run_report.html
  • Logs: json.html

Not a consistent failure

Out of 67 AST fuzzer runs on the antalya-25.8 branch, only 1 hit this crash (1.5% rate). All other fuzzer build types (debug, tsan, asan, ubsan) passed on the same commit.

Not caused by a backport

The bug first appeared upstream in July 2024 (version 24.7.1.1620) on PR ClickHouse#66346, predating the 25.8 branch. It is inherent to the experimental correlated subqueries feature in the shared codebase.
