Fix flaky OTel integration test with DNS health check (#61070) #61242
jason810496 merged 2 commits into apache:main from
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
aafd71e to a2f58d8
jason810496 left a comment
Nice! Thank you for the PR.
I will wait until CI passes.
89a1a8a to 996b2c9
@jason810496 Quick follow-up on the approach: Currently, if the collector isn't reachable after the timeout, the Would you prefer if I modified this to return a boolean and use |
+1 on returning a boolean. However, I'm not entirely sure about skipping the tests. If the collector is supposed to be running but isn't reachable, I feel it might be better to let the tests fail (or fail explicitly) so we don't overlook infrastructure issues.
@jason810496 |
Co-authored-by: Henry Chen <henryhenry0512@gmail.com>
jason810496 left a comment
Nice! LGTM. I will merge once CI passes again, as I just triggered a rebase onto the latest main on GitHub.
Thank you @jason810496 and @henry3260 for the guidance and reviews! Happy to see this stabilized.
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.
…1070) (#61242) * Fix flaky OTel integration test with DNS health check (#61070) * Update airflow-core/tests/integration/otel/test_otel.py Co-authored-by: Henry Chen <henryhenry0512@gmail.com> --------- (cherry picked from commit 8ac25dd) Co-authored-by: Abhishek Mishra <mishra.abhishek2808@gmail.com> Co-authored-by: Henry Chen <henryhenry0512@gmail.com>
…ache#61070) (apache#61242) * Fix flaky OTel integration test with DNS health check (apache#61070) * Update airflow-core/tests/integration/otel/test_otel.py Co-authored-by: Henry Chen <henryhenry0512@gmail.com> --------- (cherry picked from commit 8ac25dd) Co-authored-by: Abhishek Mishra <mishra.abhishek2808@gmail.com> Co-authored-by: Henry Chen <henryhenry0512@gmail.com>
…1070) (#61242) (#61286) * Fix flaky OTel integration test with DNS health check (#61070) * Update airflow-core/tests/integration/otel/test_otel.py --------- (cherry picked from commit 8ac25dd) Co-authored-by: Abhishek Mishra <mishra.abhishek2808@gmail.com> Co-authored-by: Henry Chen <henryhenry0512@gmail.com>
…pache#61242) * Fix flaky OTel integration test with DNS health check (apache#61070) * Update airflow-core/tests/integration/otel/test_otel.py Co-authored-by: Henry Chen <henryhenry0512@gmail.com> --------- Co-authored-by: Henry Chen <henryhenry0512@gmail.com>
* [v3-1-test] Add Keycloak token documentation to Security/API (#61228) (#61248) (cherry picked from commit bb04b5d) Co-authored-by: Bugra Ozturk <bugraoz93@users.noreply.github.com> * [v3-1-test] Fix language selector state not updating on change (#61060) (#61263) (cherry picked from commit 975cfe6) * [v3-1-test] Clarify template context for asset-triggered DAGs in airflow-core docs (#61258) (#61282) (cherry picked from commit f7aa502) Co-authored-by: Rachana Dutta <rupss2105@gmail.com> Co-authored-by: kevinhongzl <zhenlun.hong01@gmail.com> * [v3-1-test] Fix flaky OTel integration test with DNS health check (#61070) (#61242) (#61286) * Fix flaky OTel integration test with DNS health check (#61070) * Update airflow-core/tests/integration/otel/test_otel.py --------- (cherry picked from commit 8ac25dd) Co-authored-by: Abhishek Mishra <mishra.abhishek2808@gmail.com> Co-authored-by: Henry Chen <henryhenry0512@gmail.com> * [v3-1-test] Update pmc verification docs (#61271) (#61294) * Update Helm Chart release instructions for PMC Checks * Update KEY download instructions for PMC Checks * Update dev/README_RELEASE_HELM_CHART.md (cherry picked from commit c74b24a) * [v3-1-test] update version for release command (#61260) (#61328) (cherry picked from commit 7790482) Co-authored-by: Rahul Vats <43964496+vatsrahul1001@users.noreply.github.com> * CI: Upgrade important CI environment (#61327) * [v3-1-test] Fix JWT token generation with unset issuer/audience config (#61278) (#61331) * Fix JWT token generation with unset issuer/audience config (cherry picked from commit a440d1d) Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com> * [v3-1-test] Remove empty `apache_airflow_site.py` file (#61308) (cherry picked from commit d65ff01) Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Bugra Ozturk <bugraoz93@users.noreply.github.com> 
Co-authored-by: Guan-Ming (Wesley) Chiu <105915352+guan404ming@users.noreply.github.com> Co-authored-by: Shahar Epstein <60007259+shahar1@users.noreply.github.com> Co-authored-by: Rachana Dutta <rupss2105@gmail.com> Co-authored-by: kevinhongzl <zhenlun.hong01@gmail.com> Co-authored-by: Abhishek Mishra <mishra.abhishek2808@gmail.com> Co-authored-by: Henry Chen <henryhenry0512@gmail.com> Co-authored-by: Rahul Vats <43964496+vatsrahul1001@users.noreply.github.com> Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com> Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
Description
This PR fixes a flaky integration test: `test_scheduler_change_after_the_first_task_finishes` in `tests/integration/otel/test_otel.py`.
The Problem:
The test frequently failed in CI and local Breeze environments with an `AssertionError` (missing `task2` span) and a `urllib3.exceptions.NameResolutionError` for the host `breeze-otel-collector`.
This was caused by a race condition: the Airflow test components attempted to connect to the OpenTelemetry (OTel) collector before Docker's internal DNS had fully propagated or before the collector service was ready to accept connections. This resulted in dropped spans and failed assertions.
The Fix:
I implemented a robust health check mechanism, `wait_for_otel_collector()`, within the `TestOtelIntegration` class:
- It uses `socket.create_connection` to poll the collector's availability.
- It handles `socket.gaierror` (DNS resolution) and `ConnectionRefusedError` with a 60-second timeout.
- The `setup_class` method now calls this health check before any tests execute, ensuring the infrastructure is stable.
This is a targeted fix that addresses the root cause of the network flakiness and infrastructure timing issues without modifying core production code.
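For readers who want to see the shape of such a health check, here is a minimal sketch of the polling approach described above. It is not the code merged in this PR: it is written as a standalone function rather than a method on `TestOtelIntegration`, and the `host`, `port`, and `interval` parameters are assumptions made for illustration. It returns a boolean, in line with the review discussion, so the caller decides how to react.

```python
import socket
import time


def wait_for_otel_collector(
    host: str,
    port: int,
    timeout: float = 60.0,   # overall budget, matching the 60 seconds mentioned in the description
    interval: float = 1.0,   # assumed pause between attempts (hypothetical parameter)
) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection resolves the hostname and opens a TCP connection,
            # so it exercises both DNS and the listening socket.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except (socket.gaierror, ConnectionRefusedError, socket.timeout):
            # DNS not yet resolvable, port not yet listening, or attempt timed out.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller such as `setup_class` could then fail explicitly when the function returns `False`, for example with `pytest.fail("OTel collector not reachable")`, which keeps infrastructure problems visible instead of skipping the tests. The host would be `breeze-otel-collector`; the port depends on the collector configuration (4318 is the default OTLP/HTTP port, but that value is an assumption here).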
Related Issues
Was generative AI tooling used to co-author this PR?