fix: Add cleanup steps to prevent kuttl namespace deletion timeouts #789
Merged
KubernetesExecutor DAG task pods with a Vector sidecar do not shut down gracefully on SIGTERM. Vector runs as a background process (not PID 1) and never receives the signal, so pods wait out the full 300s terminationGracePeriodSeconds before being force-killed. Since kuttl v0.15.0 waits for namespace deletion to complete, this blocks the test run past kuttl's timeout.

Add a cleanup step to all KubernetesExecutor tests that deletes the AirflowCluster CR and force-deletes any remaining pods before kuttl tears down the namespace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
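The step files themselves are not reproduced on this page, so the following is only a sketch of what such a cleanup could look like as shell commands. The function name `cleanup_namespace`, the `airflowclusters` resource name, and the exact flags are assumptions; the 120s wait is the figure given in the PR summary.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the actual kuttl step in the PR may differ.
cleanup_namespace() {
  local ns="$1"

  # 1. Delete the AirflowCluster custom resource(s) in the namespace.
  kubectl delete airflowclusters --all -n "$ns" --wait=false

  # 2. Give pods up to 120s to terminate on their own. A timeout (or an
  #    already empty namespace) must not fail the step, hence '|| true'.
  kubectl wait --for=delete pod --all -n "$ns" --timeout=120s || true

  # 3. Force-delete whatever is still left.
  kubectl delete pods --all -n "$ns" --force --grace-period=0
}

# Example call (commented out so that sourcing this file has no side effects):
# cleanup_namespace "$NAMESPACE"
```

In a kuttl command step the test namespace should be available as `$NAMESPACE`, which is why the example call uses it. As the review notes, the `--force` in step 3 hides the slow shutdown and does not fix it.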
NickLarsenNZ (Member) approved these changes on May 8, 2026 and left a comment:
LGTM
The --force would normally mask the issue, but it's fine until the real fix is in.
Summary
Add a cleanup step to all tests that use the KubernetesExecutor, to prevent namespace deletion timeouts.
The step deletes the AirflowCluster CR, waits up to 120s for pods to terminate, then force-deletes any stragglers.
The step is conditional ([…].yaml.j2) for tests that parameterise the executor type and unconditional ([…].yaml) for tests that always use the KubernetesExecutor.
Root cause
The Vector sidecar container in KubernetesExecutor DAG task pods runs Vector as a background process (vector ... &), with bash as PID 1. When SIGTERM arrives during namespace deletion:
_STACKABLE_POST_HOOK in the base container (sleep 10; touch shutdown) fails because sleep itself gets killed by SIGTERM before creating the shutdown file.
The pod then waits out the full terminationGracePeriodSeconds (300s) before being force-killed.
This was not visible before the kuttl v0.11.1 → v0.20.0 bump (2026-04-22), because kuttl v0.11.1 fired namespace deletion and moved on without waiting.
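The signalling problem can be reproduced outside Kubernetes with a small sketch. It assumes that `sleep 30` stands in for Vector and an inner `bash -c` for the sidecar's entrypoint shell; neither is taken from the operator code. Because it does not run as PID 1 in a container, it is an analogy for the mechanism, not a reproduction of the pod.

```shell
#!/usr/bin/env bash
# Illustration only: 'sleep 30' stands in for Vector, the inner bash for the
# sidecar's entrypoint shell. Run outside a container, so bash is not PID 1.

pidfile=$(mktemp)

# Entrypoint-style wrapper: start the long-running process in the background,
# record its PID, then wait for it.
bash -c 'sleep 30 & echo $! > "$1"; wait' _ "$pidfile" &
wrapper=$!

# Wait until the wrapper has recorded the child's PID.
while [ ! -s "$pidfile" ]; do sleep 0.1; done
child=$(cat "$pidfile")

# Deliver SIGTERM to the wrapper only, the way a container runtime signals
# only the container's main process.
kill -TERM "$wrapper"
wait "$wrapper" 2>/dev/null || true

# Check whether the background child is still running.
if kill -0 "$child" 2>/dev/null; then
  child_survived=yes
else
  child_survived=no
fi
echo "child_survived=$child_survived"

# Clean up the orphaned child and the temp file.
kill -KILL "$child" 2>/dev/null || true
rm -f "$pidfile"
```

The signal goes to the wrapper's PID alone, so the backgrounded process is never told to stop. In the pod, this is why the sidecar keeps running until the grace period expires.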
Proper fix (operator-rs)
This PR is a workaround. The proper fix belongs in operator-rs (crates/stackable-operator/src/product_logging/framework.rs, around line 1444). There is already a commented-out alternative in the code (lines 1440–1443) that uses exec to make Vector PID 1 so it receives and handles SIGTERM directly.
This approach should be completed and enabled. Once that fix lands in operator-rs, these cleanup steps can be removed.
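The commented-out code from framework.rs is not shown on this page, so the sketch below only illustrates the general `exec` mechanism. `worker.sh` is a hypothetical stand-in for Vector: a program that shuts down cleanly on SIGTERM. The file names `ready` and `result` are likewise invented for the demonstration.

```shell
#!/usr/bin/env bash
# Illustration only: worker.sh is a hypothetical stand-in for Vector, i.e. a
# program that shuts down cleanly when it receives SIGTERM.

dir=$(mktemp -d)

cat > "$dir/worker.sh" <<'EOF'
#!/usr/bin/env bash
dir=$1
trap 'echo "graceful shutdown" > "$dir/result"; exit 0' TERM
touch "$dir/ready"            # tell the caller that the handler is installed
while :; do sleep 0.1; done
EOF

# Entrypoint-style wrapper that uses exec: the worker replaces the wrapper
# shell and keeps its PID, so a signal sent to that PID reaches the worker.
bash -c 'exec bash "$1" "$2"' _ "$dir/worker.sh" "$dir" &
pid=$!

# Wait until the worker has installed its SIGTERM handler.
while [ ! -e "$dir/ready" ]; do sleep 0.1; done

kill -TERM "$pid"
status=0
wait "$pid" || status=$?

result=$(cat "$dir/result" 2>/dev/null || true)
echo "exit status: $status, result: $result"

rm -rf "$dir"
```

With `exec`, no intermediate shell is left between the signal and the program, so the handler runs and the process exits on its own. Applied to the sidecar, that would let the pod terminate well before the grace period, assuming Vector itself handles SIGTERM as the PR text expects.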
Test plan
🤖 Generated with Claude Code