Skip to content

chore: Handle Fallback From FDv2 to FDv1#339

Merged
aaron-zeisler merged 24 commits intomainfrom
aaronz/SDK-1829/fdv2-to-fdv1-fallback-handling
Apr 8, 2026
Merged

chore: Handle Fallback From FDv2 to FDv1#339
aaron-zeisler merged 24 commits intomainfrom
aaronz/SDK-1829/fdv2-to-fdv1-fallback-handling

Conversation

@aaron-zeisler
Copy link
Copy Markdown
Contributor

@aaron-zeisler aaron-zeisler commented Mar 31, 2026

The Goal

When the LaunchDarkly backend determines that a client should not be using FDv2, it signals this by including an x-ld-fd-fallback: true header in HTTP responses. This PR teaches the Android SDK to detect that header and seamlessly switch from FDv2 data sources to an FDv1 polling fallback, so the SDK continues to receive flag data without interruption.

JIRA ticket: https://launchdarkly.atlassian.net/browse/SDK-1829

The Approach

The implementation follows a layered approach, working from the bottom of the stack upward:

  1. Data model — The result types that flow between components gained an fdv1Fallback boolean, so any component that receives a server response can carry the fallback signal forward without losing it. This mirrors the js-core approach, where the fallback signal is carried on the data source result type rather than handled as a side channel.

  2. Header detection — The polling requestor and streaming synchronizer both read the x-ld-fd-fallback header and attach it to every result they produce — success, error, or otherwise. This means the signal is never silently dropped. Both js-core and java-core detect the header at the transport layer; the Android implementation follows the same pattern.

  3. Fallback synchronizer — A new standalone FDv1 polling synchronizer fetches flag data from the legacy /sdk/evalx endpoints and produces results in the FDv2 format. It's intentionally decoupled from the existing PollingDataSource / HttpFeatureFlagFetcher stack because those are tightly coupled to FDv1's DataSourceUpdateSink. This matches js-core's approach of implementing a dedicated FDv1 polling data source rather than reusing the existing one.

  4. Orchestration — The FDv2 data source's run loop checks for the fallback signal after processing each result. When detected, it blocks all FDv2 synchronizers, unblocks the FDv1 slot, and lets the normal synchronizer selection logic pick it up. No special "fallback mode" — it reuses the existing synchronizer cycling mechanism. The SourceManager.fdv1Fallback() implementation follows js-core's pattern of resetting the synchronizer index to -1 so the next scan starts from the beginning of the list.

Fundamental Changes

  • FDv2SourceResult and FDv2PayloadResponse now carry a fallback indicator alongside their existing data. All factory methods retain backward-compatible defaults.

  • ModeDefinition supports a separate list of FDv1 fallback synchronizer factories, kept distinct from regular synchronizers so they can be independently blocked/unblocked.

  • The FDv2DataSource constructor accepts an optional list of FDv1 fallback synchronizers. It handles wrapping them as blocked entries internally, so callers just pass plain factory lists.

  • SourceManager.fdv1Fallback() is now a real implementation (previously a stub). It synchronizes on the active source lock to prevent races with concurrent synchronizer selection.

  • The default mode table configures an FDv1 polling fallback for STREAMING, POLLING, and BACKGROUND modes. OFFLINE and ONE_SHOT don't have active synchronizers, so they have no fallback slot.

Some Things That Aren't In Scope

  • Recovery from FDv1 back to FDv2 — Once the SDK falls back to FDv1, it stays there for the lifetime of the data source. Recovery would require server-side coordination that isn't specified yet.
  • FDv1 streaming fallback — The spec allows for it, but only polling is implemented as the fallback synchronizer. This matches the js-core reference implementation.
  • Cache initializer — The mode table has TODO placeholders for cache initializers; that's a separate piece of work.
  • State debouncing — Network and lifecycle event debouncing (CONNMODE 3.5) is deferred to a future PR.

Requirements

  • I have added test coverage for new or changed functionality
  • I have followed the repository's pull request submission guidelines
  • I have validated my changes against all supported platform versions

Related issues

Provide links to any issues in this repository or elsewhere relating to this pull request.

Describe the solution you've provided

Provide a clear and concise description of what you expect to happen.

Describe alternatives you've considered

Provide a clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context about the pull request here.


Note

Medium Risk
Introduces new cross-cutting fallback behavior in the flag update pipeline (header propagation, mode tables, and synchronizer orchestration), which could affect initialization/state transitions and update delivery if mishandled.

Overview
Adds support for automatic FDv2→FDv1 fallback when the backend returns x-ld-fd-fallback: true, switching the active synchronizer to an FDv1 polling implementation.

This threads an fdv1Fallback signal through FDv2PayloadResponse/FDv2SourceResult, teaches both polling (DefaultFDv2Requestor/FDv2PollingBase) and streaming (FDv2StreamingSynchronizer) paths to detect the header, and updates FDv2DataSource/SourceManager to honor it by blocking FDv2 synchronizers and activating a dedicated FDv1PollingSynchronizer.

It also extends ModeDefinition/ResolvedModeDefinition and the default mode table (DataSystemComponents.makeDefaultModeTable/DataSystemBuilder) to include an optional FDv1 fallback slot for STREAMING/POLLING/BACKGROUND, plumbs cache dir via DataSourceBuildInputs, moves poll interval defaults to LDConfig, and adds unit/contract-test coverage for header detection and fallback switching.

Reviewed by Cursor Bugbot for commit 16b9f8f. Bugbot is set up for automated code reviews on this repo. Configure here.

Add an fdv1Fallback boolean to FDv2SourceResult and FDv2PayloadResponse
so that FDv2 data sources can signal when the server's x-ld-fd-fallback
header indicates a fallback to FDv1. Existing factory methods retain
their signatures and default to false; new overloads accept the flag
explicitly.

Made-with: Cursor
Extend ModeDefinition and ResolvedModeDefinition with a separate list
of FDv1 fallback synchronizer factories. These are kept distinct from
the regular synchronizer list so that the orchestration layer can
manage their blocked/unblocked state independently. The existing
two-argument constructor delegates to the new three-argument form
with an empty fallback list, preserving backward compatibility.

Made-with: Cursor
Read the x-ld-fd-fallback response header in the polling requestor and
streaming synchronizer, and propagate the fallback signal through every
FDv2SourceResult produced by these components. The header is checked
case-insensitively and carried on both success and error paths so the
orchestration layer can act on it regardless of how the response was
classified.

Made-with: Cursor
Implement a standalone FDv1 polling synchronizer that fetches flag data
from the legacy /sdk/evalx endpoints and produces FDv2SourceResult
values. This allows the FDv2 orchestration layer to fall back to FDv1
polling without depending on the existing PollingDataSource /
HttpFeatureFlagFetcher stack, which is tightly coupled to the FDv1
DataSourceUpdateSink. Uses LDUtil.isHttpErrorRecoverable() for
FDv2-style error classification (INTERRUPTED vs TERMINAL_ERROR).

Made-with: Cursor
Connect the fallback signal to the orchestration layer end-to-end.
The FDv2DataSource constructor now accepts an optional list of FDv1
fallback synchronizer factories, wrapping them as blocked entries
appended after the regular synchronizers. When a synchronizer result
carries fdv1Fallback=true, the run loop triggers SourceManager's
fdv1Fallback() method, which blocks all FDv2 synchronizers, unblocks
the FDv1 slot, and resets the scan index under the active source lock.
The builder configures an FDv1PollingSynchronizer in the default mode
table for STREAMING, POLLING, and BACKGROUND modes (not OFFLINE or
ONE_SHOT, which have no synchronizers to fall back from).

Made-with: Cursor
logger.info("Server signaled FDv1 fallback; switching to FDv1 polling synchronizer.");
sourceManager.fdv1Fallback();
running = false;
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes from code review:

  • Will running ever be false here (at the beginning of the if statement)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment to this block. There are a couple situations where running is false: during SHUTDOWN, and as a result of TERMINAL_ERROR.

}
}

private FDv2SourceResult doPoll() {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review notes:

  • Can we safely refactor the production FDv1 code to reuse this entire doPoll() function?
  • Similarly, can we reuse the existing code on lines 87-101?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't addressed this yet

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my latest push, I've refactored this new FDv1PollingSynchronizer by injecting a FeatureFetcher object. That's the same helper class that the existing PollingDataSource uses 👍

Remove single-argument overloads from FDv2SourceResult, FDv2PayloadResponse,
and ModeDefinition, keeping only the versions that include the fdv1Fallback
parameter. Update all call sites across production and test code.

Made-with: Cursor
…nstead of list

Aligns with java-core and js-core implementations, which both model FDv1
fallback as a single optional synchronizer rather than a list.

Made-with: Cursor
@aaron-zeisler
Copy link
Copy Markdown
Contributor Author

@tanderson-ld In the very last commit (titled "persist fdv1 fallback state ..."), I addressed your concern about staying in "FDv1 fallback mode" when the platform state changes. Interestingly, this type of protection isn't present in the js-core implementation. I'll bring this up w/ Ryan tomorrow to see if he's aware of that gap, or if he chose not to make those changes for some reason.

@aaron-zeisler aaron-zeisler marked this pull request as ready for review April 1, 2026 03:31
@aaron-zeisler aaron-zeisler requested a review from a team as a code owner April 1, 2026 03:31
Check for x-ld-fd-fallback during the initializer phase, not just
during synchronization. When an initializer response carries the
fallback signal, skip remaining initializers and transition directly
to the FDv1 synchronizer. Aligns with js-core's implementation.

Made-with: Cursor
@aaron-zeisler aaron-zeisler force-pushed the aaronz/SDK-1829/fdv2-to-fdv1-fallback-handling branch from b162d60 to dddc21f Compare April 1, 2026 16:43
…onses

The notModified() factory previously hardcoded fdv1Fallback to false,
silently discarding the x-ld-fd-fallback header on 304 responses. Now
the computed value is forwarded, aligning with the CSFDV2 spec and
js-core behavior.

Made-with: Cursor
Mirror how PollingDataSource delegates HTTP concerns to a shared
FeatureFetcher rather than managing its own OkHttp client and URI
construction. The synchronizer now calls fetcher.fetch() for the
raw JSON and handles only the FDv2 adaptation (ChangeSet conversion,
error classification, scheduling).

Made-with: Cursor
…mpty selector

The non-empty selector early return in runInitializers() bypassed the
fdv1Fallback check, so a polling initializer that returned a fully
current payload with x-ld-fd-fallback: true would silently proceed
to FDv2 synchronizers instead of switching to the FDv1 fallback.

Made-with: Cursor
…hronizer loop

The fdv1Fallback check required the synchronizer to still be running,
but TERMINAL_ERROR set running=false first, silently discarding the
fallback signal. Removed the running guard so a terminal error response
carrying x-ld-fd-fallback: true correctly transitions to the FDv1
synchronizer. Also added a test for the equivalent initializer path.

Made-with: Cursor
…chronizer

The synchronizer always owns its fetcher in production, and the test
mock's close() is already a no-op, so the flag was unnecessary.

Made-with: Cursor
Copy link
Copy Markdown
Contributor

@tanderson-ld tanderson-ld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing, but posting current comments at my end of day. Will resume tomorrow.

… FDv1PollingSynchronizer

Process all results and errors directly inside the FeatureFetcher
callback so the future carries a fully-formed FDv2SourceResult.
This eliminates the need to unwrap ExecutionException to inspect
application-level error types, keeping error classification at the
callback layer where it belongs.

Made-with: Cursor
initialDelayMillis,
pollIntervalMillis,
TimeUnit.MILLISECONDS);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shared single-thread executor risks deadlock with blocking poll

Medium Severity

FDv1PollingSynchronizer schedules its recurring poll task on inputs.getSharedExecutor() — the same 4-thread pool used by FDv2DataSource's main loop, condition timers, and other synchronizers. The doPoll() method blocks the executor thread via resultFuture.get() while waiting for the async HTTP callback. Combined with the main data source loop also occupying a thread on the same pool, this could lead to thread starvation or deadlock under load, especially during the transition period when both FDv2 and FDv1 components are active.

Fix in Cursor Fix in Web

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this problem is theoretical only. In practice, when the fallback synchronizer is in use, all the other initializers and synchronizers will be closed. In other words, only one synchronizer runs at a time.

Regarding the statement "... especially during the transition period when both FDv2 and FDv1 components are active": The old synchronizer is fully closed before the new one is even built. There's no window where both are running simultaneously. I'm not sure what "transition period" is referring to.

Copy link
Copy Markdown
Contributor

@tanderson-ld tanderson-ld Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe see if you can figure out why Cursor bugbot thinks and other synchronizers is possible at the same time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked Cursor to analyze this statement a few different ways. In each case, it seems like Cursor generally knew how the shared executor worked because of the comments in that class's definition. But when it made this comment on my PR, it hadn't taken one step further to understand exactly how many threads were being used, and how the threads are being juggled.

After writing this, it seems like adding a little bit more to the comments may help in the future.

Move the FDv1 fallback check to a single location before the result
type switch statement. This ensures the fallback signal is honored
regardless of selector state or result type, eliminating duplicated
logic and a potential path where tryCompleteStart could be skipped.

Made-with: Cursor
*
* @return a FDv1 polling synchronizer builder
*/
public static FDv1PollingSynchronizerBuilder fdv1PollingSynchronizer() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want this to be public since fdv1 fallback is an LD internal concern. I don't think you necessarily have to put this in this class, but it should not be public if so.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that last push, I've moved makeDefaultModeTable() from DataSystemBuilder to DataSystemComponents. Then I updated the code so FDv1PollingSynchronizerBuilderImpl now implements DataSourceBuilder directly, instead of extending this FDv1PollingSynchronizerBuilder (which I deleted).

initialDelayMillis,
pollIntervalMillis,
TimeUnit.MILLISECONDS);
}
Copy link
Copy Markdown
Contributor

@tanderson-ld tanderson-ld Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe see if you can figure out why Cursor bugbot thinks and other synchronizers is possible at the same time.

if (result.isFdv1Fallback()
&& sourceManager.hasFDv1Fallback()
&& !sourceManager.isCurrentSynchronizerFDv1Fallback()) {
logger.info("Server signaled FDv1 fallback; switching to FDv1 polling synchronizer.");
Copy link
Copy Markdown
Contributor

@tanderson-ld tanderson-ld Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make private const for this message and commonize with other log location.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

…default mode table into DataSystemComponents

Move makeDefaultModeTable() from DataSystemBuilder into DataSystemComponents
so the FDv1 fallback synchronizer can be fully package-private. Remove the
public FDv1PollingSynchronizerBuilder and fdv1PollingSynchronizer() factory
method to prevent external customization of the fallback path.

Made-with: Cursor
…vives custom mode overrides

The FDv1 fallback synchronizer now uses the background poll interval
(60 min) when in background mode instead of always using the foreground
rate (5 min). Customer mode overrides preserve the default FDv1 fallback
when the custom mode has initializers or synchronizers. Consolidate
DEFAULT_POLL_INTERVAL_MILLIS into LDConfig as a single source of truth.

Made-with: Cursor
@tanderson-ld
Copy link
Copy Markdown
Contributor

Looks good to me, just that one bugbot null check needed and benchtesting results.

// TODO: Arrays.asList(cacheInitializer) — add once implemented
Collections.<DataSourceBuilder<Initializer>>emptyList(),
Collections.singletonList(pollingSynchronizer),
fdv1FallbackPollingSynchronizerForeground
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shared FDv1 fallback builder instance across STREAMING and POLLING modes

Low Severity

The same fdv1FallbackPollingSynchronizerForeground builder instance is shared between the STREAMING and POLLING mode definitions. Since FDv1PollingSynchronizerBuilderImpl is mutable (it has a settable pollIntervalMillis), if any code path ever mutates this shared builder after table construction, it would affect both modes. While currently safe because the builder is not further modified, sharing a mutable builder across multiple ModeDefinition entries is fragile.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 3afd09a. Configure here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also by design. I think the alternative is to create two separate FDv1PollingSycnhronizer builders (and i believe only their polling durations would be different).

@tanderson-ld tanderson-ld self-requested a review April 7, 2026 21:50
resultQueue.put(FDv2SourceResult.status(
FDv2SourceResult.Status.interrupted(
new LDFailure("Failed to parse SSE event", e,
LDFailure.FailureType.INVALID_RESPONSE_BODY))));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Streaming ping handler loses fallback signal on poll failure

Low Severity

In handleMessage, the early return for ping events bypasses the new isFdv1Fallback(event.getHeaders()) check. The ping handler delegates to FDv2PollingBase.doPoll(), which reads the fallback header from the poll response. However, if the poll request fails with a transport-level exception (not an HTTP error), doPoll() hardcodes fdv1Fallback = false in its InterruptedException/ExecutionException catch blocks, losing the fallback signal that was present in the SSE connection's response headers carried by event.getHeaders(). Eventually self-corrects via subsequent events or connection errors, but creates a detection gap.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit c6beb5c. Configure here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if the actual "ping" response will contain the fallback header. But when the SDK receives a "ping" message, it immediately polls to get data (per the spec), and the response from the poll is inspected for a fallback header. I think that's good enough.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 5 total unresolved issues (including 3 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 16b9f8f. Configure here.

LDUtil.makeHttpProperties(inputs.getHttp()),
inputs.getCacheDir(),
inputs.getBaseLogger()
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FDv1 fallback polling uses wrong endpoint path

High Severity

The FDv1PollingSynchronizerBuilderImpl creates an HttpFeatureFlagFetcher using inputs.getServiceEndpoints().getPollingBaseUri() as the polling URI. However, getPollingBaseUri() returns the FDv2 polling base URI. The FDv1 fallback synchronizer needs to reach the legacy FDv1 /sdk/evalx endpoints, but HttpFeatureFlagFetcher appends FDv1 paths (StandardEndpoints.POLLING_REQUEST_GET_BASE_PATH) to whatever base URI it receives. If the FDv2 polling base URI differs from the FDv1 polling base URI (which is the case when custom service endpoints are configured separately for FDv2), the requests will be sent to the wrong host or path.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 16b9f8f. Configure here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bugbot is confused between the base URI and the full path. The pollingBaseUri is really just the base url, and it's the same between FDv1 and FDv2. Then the correct path is added to the baseUri when the fetch() function is called.

pollIntervalMillis,
TimeUnit.MILLISECONDS);
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FDv1 synchronizer starts polling immediately in constructor

Medium Severity

FDv1PollingSynchronizer starts its scheduled polling task in the constructor with initialDelayMillis=0. The factory in FDv1PollingSynchronizerBuilderImpl.build() passes 0 for initialDelayMillis. Since SourceManager.getNextAvailableSynchronizerAndSetActive() calls factory.build() which constructs this synchronizer, polling begins immediately upon construction — potentially before the caller has called next() to consume results. If the first poll completes very quickly, the result goes into resultQueue which is fine, but if a terminal error occurs before next() is ever called, shutdownFuture is set and the scheduled task cancelled. Subsequent next() calls will return the terminal result, which is correct. However, this eager-start pattern means the synchronizer fires off network requests before the FDv2DataSource run loop is ready to consume them, which could cause unexpected behavior if the executor thread pool is constrained.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 16b9f8f. Configure here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was our design choice. The FDv2 polling synchronizer behaves the same way.

@aaron-zeisler
Copy link
Copy Markdown
Contributor Author

@tanderson-ld The rest of the open Bugbot issues are either not actual issues, or they're comments about things that we did by design. This looks good to me 👍

@aaron-zeisler aaron-zeisler merged commit b05e3db into main Apr 8, 2026
7 checks passed
@aaron-zeisler aaron-zeisler deleted the aaronz/SDK-1829/fdv2-to-fdv1-fallback-handling branch April 8, 2026 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants