fix(flagd): back off before stream reconnect #391
Conversation
Signed-off-by: looooown2006 <102905927+looooown2006@users.noreply.github.com>
Code Review
This pull request introduces a _shutdown_event using threading.Event in both GrpcResolver and grpc_watcher to manage reconnection delays and facilitate cleaner shutdowns. It refactors the listen method in GrpcResolver into smaller helper methods and adds comprehensive unit tests for the reconnection and backoff logic. The reviewer suggests using the initial backoff duration rather than the maximum backoff for reconnection attempts to prevent excessive delays during recovery from transient failures.
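For readers outside the codebase, here is a minimal sketch of the interruptible-wait pattern described above; the class and attribute names other than `threading.Event` are illustrative, not the provider's actual API:

```python
import threading


class StreamWorker:
    """Illustrative worker: backs off between reconnects, wakes immediately on shutdown."""

    def __init__(self, backoff_seconds: float) -> None:
        self._shutdown_event = threading.Event()
        self._backoff_seconds = backoff_seconds

    def _wait_before_reconnect(self) -> None:
        # Event.wait(timeout) returns as soon as set() is called, so shutdown
        # is never blocked for the full backoff period.
        self._shutdown_event.wait(self._backoff_seconds)

    def shutdown(self) -> None:
        self._shutdown_event.set()
```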
logger.debug(self.config.fatal_status_codes)

self.retry_grace_period = config.retry_grace_period
self.retry_backoff_max_seconds = config.retry_backoff_max_ms * 0.001
Using the maximum backoff duration (retry_backoff_max_ms) as a constant delay for every reconnect attempt can lead to significant delays in recovery from transient failures. It is generally better to use the initial backoff duration (retry_backoff_ms) or implement an exponential backoff strategy. Since gRPC already handles connection-level backoff internally, a small application-level delay is sufficient to prevent tight loops while allowing faster recovery.
Suggested change:
-    self.retry_backoff_max_seconds = config.retry_backoff_max_ms * 0.001
+    self.retry_backoff_seconds = config.retry_backoff_ms * 0.001
+    self.retry_backoff_max_seconds = config.retry_backoff_max_ms * 0.001
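If the exponential-backoff alternative mentioned above were adopted instead, one possible shape is a small helper like the following (hypothetical, not code from this PR):

```python
def next_backoff_seconds(attempt: int, initial_seconds: float, max_seconds: float) -> float:
    """Double the initial delay for each failed attempt, capped at the configured maximum."""
    return min(initial_seconds * (2 ** attempt), max_seconds)


# e.g. initial 1s, max 120s -> 1, 2, 4, 8, 16, ... then capped at 120
```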
)

def _wait_before_reconnect(self) -> None:
    self._shutdown_event.wait(self.retry_backoff_max_seconds)
As mentioned in the initialization, using the maximum backoff for every retry significantly impacts availability. Consider using the initial backoff duration instead. Note that you will also need to update the corresponding regression tests in tests/test_grpc_resolver.py.
Suggested change:
-    self._shutdown_event.wait(self.retry_backoff_max_seconds)
+    self._shutdown_event.wait(self.retry_backoff_seconds)
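To show where this wait sits in the overall flow, here is a rough sketch of a resolver-style listen loop; everything except `_wait_before_reconnect` and `_shutdown_event` (the helper names, logging, and message handling) is assumed for illustration rather than taken from `GrpcResolver`:

```python
import logging

logger = logging.getLogger(__name__)


def listen(self) -> None:
    # Keep recreating the stream until shutdown is requested.
    while not self._shutdown_event.is_set():
        try:
            for message in self._create_event_stream():  # hypothetical helper
                self._handle_message(message)  # hypothetical helper
        except Exception:
            logger.exception("event stream failed, will reconnect after backoff")
        # Application-level delay between one stream ending and the next being
        # created; returns early if shutdown is signalled in the meantime.
        self._wait_before_reconnect()
```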
return False

def _wait_before_reconnect(self) -> None:
    self._shutdown_event.wait(self.retry_backoff_max_seconds)
Using the maximum backoff duration (retry_backoff_max_ms) as a constant delay for every reconnect attempt can lead to significant delays in recovery from transient failures. It is generally better to use the initial backoff duration (retry_backoff_ms) or implement an exponential backoff strategy. Since gRPC already handles connection-level backoff internally, a small application-level delay is sufficient to prevent tight loops while allowing faster recovery. Note that you will also need to update the corresponding regression tests in tests/test_grpc_watcher.py.
Suggested change:
-    self._shutdown_event.wait(self.retry_backoff_max_seconds)
+    self._shutdown_event.wait(self.retry_backoff_seconds)
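For the regression tests mentioned in both comments, a unit test along these lines could pin the delay passed to the wait; this is sketched against the illustrative `StreamWorker` from the earlier snippet, not the provider's real `GrpcResolver` or `GrpcWatcher` classes:

```python
from unittest import mock


def test_wait_before_reconnect_uses_configured_backoff() -> None:
    worker = StreamWorker(backoff_seconds=2.0)
    # Swap the event for a mock so the test asserts on the duration instead of sleeping.
    worker._shutdown_event = mock.Mock()

    worker._wait_before_reconnect()

    worker._shutdown_event.wait.assert_called_once_with(2.0)
```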
Thanks for the review. I kept this tied to `retry_backoff_max_ms`. This is separate from gRPC's RPC-level retry behavior; the added wait only applies between one stream ending and the next stream being created. That seems to align with the issue/spec and the reconnect-loop scenario they call out.
Summary
Fixes the flagd provider stream reconnect loop so both gRPC stream implementations apply an application-level delay before creating the next stream after a stream error or normal stream completion.
Changes:
- `GrpcResolver` now waits `retry_backoff_max_ms` before recreating the RPC `EventStream`.
- `GrpcWatcher` now waits `retry_backoff_max_ms` before recreating the in-process `SyncFlags` stream.

Closes #390.
Verification
Note: I also tried the broader non-e2e pytest command, but the local run still initialized the Docker-backed e2e test-harness fixture and failed before running those cases because Docker Desktop's Linux engine was not running on this Windows machine.
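If only the unit tests touched by this change need to run locally, targeting the two files the review mentions, e.g. `pytest tests/test_grpc_resolver.py tests/test_grpc_watcher.py`, should avoid the Docker-backed e2e fixture entirely (paths as referenced in the review comments; the exact invocation may differ depending on the repo layout).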