Conversation
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
📝 WalkthroughWalkthroughThe exec input plugin is updated to run in threaded mode ( ChangesExec Plugin Threaded Mode Deadlock Fix
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4482c18480
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| struct flb_input_plugin in_exec_plugin = { | ||
| .name = "exec", | ||
| .description = "Exec Input", | ||
| .flags = FLB_INPUT_THREADED, |
There was a problem hiding this comment.
Preserve conditional routing for exec inputs
Making every exec instance threaded disables an existing routing feature for this plugin: flb_input_log_append() explicitly skips conditional/per-record routes when flb_input_is_threaded(ins) is true (src/flb_input_log.c:1021-1025) and falls back to normal routing. Any exec input configured with routes.logs conditions or per-record routing will now send records to the default matched outputs instead of the conditional destinations, whereas it was non-threaded before this line. Please either keep exec non-threaded for configurations that use conditional routes or add threaded-input support for that routing path.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/integration/scenarios/internal_http_server/tests/test_internal_http_server_exec_deadlock_001.py (1)
49-59: ⚡ Quick winPrefer condition-based waiting over fixed sleep.
time.sleep(2)adds nondeterministic delay and can still be too short/long depending on CI load. Fold this intowait_for_conditionand let polling drive readiness.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/scenarios/internal_http_server/tests/test_internal_http_server_exec_deadlock_001.py` around lines 49 - 59, Remove the fixed sleep and fold that readiness check into the existing polling: delete the time.sleep(2) call and ensure wait_for_condition (used on service.service.wait_for_condition) performs the polling by calling service.request("/api/v1/uptime") until the lambda returns a non-None response that meets response["status_code"] == 200 and contains "uptime_sec" in response["body"]. Keep or adjust the existing timeout/interval parameters as needed but do not reintroduce any fixed sleeps.tests/integration/scenarios/internal_http_server/config/internal_http_server_exec_deadlock.yaml (1)
13-14: ⚡ Quick winBound the
curlexecution time to reduce test flakiness.Consider adding
--connect-timeoutand--max-timeso a stalled request doesn’t leave long-running child processes during regression runs.Proposed tweak
- command: curl -s http://127.0.0.1:${FLUENT_BIT_HTTP_MONITORING_PORT}/api/v1/metrics/prometheus + command: curl -s --connect-timeout 1 --max-time 2 http://127.0.0.1:${FLUENT_BIT_HTTP_MONITORING_PORT}/api/v1/metrics/prometheus🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/scenarios/internal_http_server/config/internal_http_server_exec_deadlock.yaml` around lines 13 - 14, The curl command in the YAML test config (the command field that uses ${FLUENT_BIT_HTTP_MONITORING_PORT}) can hang and cause flaky tests; update that command to include connection and overall timeouts (for example add --connect-timeout 2 and --max-time 5) so stalled requests fail fast instead of leaving long-running child processes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@tests/integration/scenarios/internal_http_server/tests/test_internal_http_server_exec_deadlock_001.py`:
- Around line 49-62: The test currently only checks HTTP responsiveness but
never proves the exec self-request path ran; update the test around the
wait_for_condition block to record the pre-exec value of the exec counter/metric
(e.g., the "exec.0" input/exec counter exposed by the service), trigger or allow
the exec self-request to run as already done, then after the wait assert that
the "exec.0" metric/counter increased (compare post-run value > pre-run value)
so the test fails if no exec activity occurred; use existing helpers like
service.request(...) or service.service.wait_for_condition to fetch the metric
and reference the "exec.0" metric name in your assertions.
---
Nitpick comments:
In
`@tests/integration/scenarios/internal_http_server/config/internal_http_server_exec_deadlock.yaml`:
- Around line 13-14: The curl command in the YAML test config (the command field
that uses ${FLUENT_BIT_HTTP_MONITORING_PORT}) can hang and cause flaky tests;
update that command to include connection and overall timeouts (for example add
--connect-timeout 2 and --max-time 5) so stalled requests fail fast instead of
leaving long-running child processes.
In
`@tests/integration/scenarios/internal_http_server/tests/test_internal_http_server_exec_deadlock_001.py`:
- Around line 49-59: Remove the fixed sleep and fold that readiness check into
the existing polling: delete the time.sleep(2) call and ensure
wait_for_condition (used on service.service.wait_for_condition) performs the
polling by calling service.request("/api/v1/uptime") until the lambda returns a
non-None response that meets response["status_code"] == 200 and contains
"uptime_sec" in response["body"]. Keep or adjust the existing timeout/interval
parameters as needed but do not reintroduce any fixed sleeps.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 093ab459-b522-4650-bc0c-60f48e29817d
📒 Files selected for processing (3)
plugins/in_exec/in_exec.ctests/integration/scenarios/internal_http_server/config/internal_http_server_exec_deadlock.yamltests/integration/scenarios/internal_http_server/tests/test_internal_http_server_exec_deadlock_001.py
Fixes an internal HTTP server self-request deadlock by running the
execinput collector on Fluent Bit's existing threaded input path. This keeps blockingpopen()/read work out of the engine event loop while leaving the monitoring HTTP server on its original event loop.Adds an integration regression test where
execrunscurlagainst/api/v1/metrics/prometheusand verifies the internal HTTP server remains responsive.Validation
timeout.
cmake --build build -j8tests/integration/run_tests.py --quiet scenarios/internal_http_server/tests/ test_internal_http_server_exec_deadlock_001.py::test_http_server_responsive_afte r_exec_self_requesttests/integration/run_tests.py --valgrind --valgrind-strict --quiet scenarios/internal_http_server/tests/ test_internal_http_server_exec_deadlock_001.py::test_http_server_responsive_afte r_exec_self_requesttests/integration/run_tests.py --quiet scenarios/internal_http_serverctest --test-dir build -R flb-it-http_server --output-on-failureGITHUB_EVENT_NAME=pull_request GITHUB_BASE_REF=master tests/ integration/.venv/bin/python .github/scripts/commit_prefix_check.pyFluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
Bug Fixes
Tests