HTTP/2 connection lifecycle: max lifetime with per-connection jitter + PING health probing #48420

jeet1995 wants to merge 36 commits into Azure:main
Conversation
/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).
…r, design spec
- Switch from per-evaluation to per-connection jitter via CONNECTION_EXPIRY_NANOS channel attribute
- Make pingContent a static final constant (PING_CONTENT)
- Derive sweep interval from min(thresholds)/2 clamped to [1s, 5s]
- Add eviction rate limiter: max 1 eviction per sweep cycle (dead channels exempt)
- Add HTTP_CONNECTION_LIFECYCLE_SPEC.md design spec for review

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
9.1: Split installOnParentIfAbsent into stampConnectionExpiry + installOnParentIfAbsent.
Max lifetime works independently of PING — disabling PING no longer silently
disables max lifetime.
9.2: Two-phase eviction for Phase 3 (lifetime) via PENDING_EVICTION_NANOS attribute.
First sweep marks connection as pending. Subsequent sweeps evict when idle or
after 10s drain grace period. Prevents RST_STREAM on active H2 streams during
routine lifetime rotation. Phase 2 (PING-stale) stays immediate — degraded
connections should be evicted fast.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
> HTTP/2 connections can become silently degraded — packet black-holes, half-open TCP,
> NAT/firewall timeout — without the SDK knowing. In sparse workloads, two problems arise:
>
> 1. **Silent degradation detection**: The next request discovers the dead connection via response
Silent degradation detection affects both sparse and non-sparse workloads.
> eviction predicate is invoked."* Since we need a custom predicate for PING health, the
> built-in `maxLifeTime` and `maxIdleTime` handling is replaced entirely.

> reactor-netty 1.3.4 introduces `maxLifeTimeVariance(double)` for per-connection jitter — exactly
Add an item to track maxLifeTimeVariance integration too.
> ┌──────────────────────────────────────────────────────────────────────┐
> │ ConnectionProvider (reactor-netty 1.2.13)                            │
> │                                                                      │
> │ evictInBackground(5s) sweeps all connections through:                │
Ensure the overview is up to date w.r.t rest of spec (section 9.1 and 9.2 changes should be reflected here). Design choices should precede the overview.
> ---

> ## 3. Eviction Predicate Design
Update section 3 with section 9.1 and 9.2 changes.
- Goal 2: Silent degradation affects all workloads, not just sparse
- Add maxLifeTimeVariance tracking item in motivation section
- New §2 Key Design Choices precedes architectural overview
- §3 Overview updated to reflect decoupled install paths and two-phase eviction
- §4 Phase 2/3 updated with immediate vs two-phase eviction details
- Section numbers renumbered (§2-§10)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Change max lifetime default from 300s (5 min) to 1800s (30 min) — defensive. Effective range with jitter: [30:01, 30:30]
- Add COSMOS.HTTP_CONNECTION_MAX_LIFETIME_ENABLED (default: true)
- Add COSMOS.HTTP2_PING_HEALTH_ENABLED (default: true)
- Both features now have explicit boolean toggles alongside numeric configs
- Update SPEC config table with new defaults and toggle flags

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove Phase 2 (PING ACK stale → evict) from eviction predicate
- PING handler remains for keepalive (prevents NAT/firewall idle reaping)
- Degraded connections handled by response timeout retry path
- Rewrite SPEC: decision-focused, ~150 lines, no code duplication
- Add TCP keepalive vs HTTP/2 PING distinction

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Move stampConnectionExpiry + PING install to shared doOnConnected (all connections)
- H2-specific doOnConnected now only handles header cleaner
- Wire AddressResolverGroup injection via HttpClientConfig for e2e tests
- SPEC updated: both goals apply to all connections, architecture diagram updated

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
HTTP/1.1 has no PING equivalent — L7 middleboxes can't see TCP keepalive. ChangeFeed (100% of H1.1 traffic) is long-polling so rarely idle. Low risk today but worth addressing if future H1.1 workloads emerge. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PING is an HTTP/2 protocol frame — it cannot be sent on HTTP/1.1 connections. Code already correct (isH2Enabled guard). SPEC now consistent:
- Goal 2: Connection keepalive (HTTP/2)
- Design Choice 3: PING keepalive is HTTP/2 only
- Architecture: PING install gated on H2 enabled
- Design choices renumbered (1-9, no duplicates)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
> │ │
> │ └─ If PING keepalive enabled AND H2 enabled:
> │      installOnParentIfAbsent(channel, interval)
> │      → installs Http2PingHealthHandler (H2 only — PING is an HTTP/2 frame)
Why not use Netty's native PING support mechanism?
```java
.http2Settings(settings -> settings
    .pingAckTimeout(Duration.ofSeconds(10))
    .pingAckDropThreshold(3))
```
I think my initial design was: if pings are not responded to, then also evict the channel (which requires a custom ChannelDuplexHandler). Since pings are purely for extending idleness, I could use this.
.pingAckDropThreshold(3) -> this will cause the connection to drop?
Also, if the service side supports HTTP/2 PING, we should probably enable it by default; it helps with the timeout-detection part as well.
> (all connections expire together) and the non-determinism of re-rolling jitter each sweep.
> Matches reactor-netty 1.3.4's `maxLifeTimeVariance` semantics for easy migration.

> 6. **Two-phase eviction for lifetime** — Instead of immediately closing a connection past
Just thinking out loud: with the jitter in place, do we still need the rate limiting? Jitter should already help ensure the connections are not all closed at the same time.
This is me being defensive, but it's a valid point. I feel we can make a test-driven decision.
> Always faster than the smallest eviction threshold.

> 9. **30-minute default (defensive)** — .NET uses 5 minutes. We start at 30 minutes with
> `[30:01, 30:30]` effective range. Can be tuned down after production validation.
Since the config here will be maxLifeTime, maybe the range should be [29:30, 30:00], etc.
…r, Java 8 compat
- Switch PING keepalive from custom ChannelHandler to reactor-netty native pingAckTimeout/pingAckDropThreshold (available since 1.2.12). Simplifies code and enables dead-connection detection for half-open TCP.
- Fix jitter direction: subtract from base lifetime (effective [29:30, 30:00]) to match reactor-netty 1.3.4 maxLifeTimeVariance semantics. maxLifeTime is now the upper bound, never exceeded.
- Replace Http2PingHealthHandler with HttpConnectionLifecycleUtil (utility class for channel attributes and connection expiry stamping).
- Fix Set.copyOf() -> Collections.unmodifiableSet() for Java 8 compatibility.
- Update spec: rate limiter rationale, native PING design, .NET parity table.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wire AddressResolverGroup through ConnectionPolicy → RxDocumentClientImpl → HttpClientConfig so tests can inject a custom DNS resolver via the CosmosClientBuilderAccessor bridge pattern.

New test validates the full chain: max lifetime expiry → eviction → pool creates new connection → FilterableDnsResolverGroup re-resolves to a different backend IP (IP1 blocked) → traffic moves to IP2.

Production changes:
- ConnectionPolicy: add addressResolverGroup field + getter/setter
- RxDocumentClientImpl.httpClient(): propagate resolver to HttpClientConfig
- CosmosClientBuilder: add field, wire in buildConnectionPolicy()
- ImplementationBridgeHelpers: add setAddressResolverGroup to accessor

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add Features Added entries under 4.80.0-beta.1 (Unreleased) for:
- HTTP connection max lifetime with per-connection jitter for DNS re-resolution
- HTTP/2 PING keepalive via native reactor-netty pingAckTimeout/pingAckDropThreshold

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace native reactor-netty pingAckTimeout (incompatible with custom evictionPredicate) with a manual Http2PingHandler ChannelDuplexHandler installed on the parent H2 channel. The handler:
- Tracks last read/write activity on the parent channel
- Schedules PING frames when idle > configured interval (default 10s)
- Counts PINGs sent and ACKs received (for observability/testing)
- Does NOT close the connection on missed ACKs (keepalive only)
- Detected via Http2MultiplexHandler in pipeline (not channel.parent())

Key finding: reactor-netty's first doOnConnected fires for the parent TCP channel (parent() == null), not stream channels. H2 parent detection uses Http2MultiplexHandler presence in the pipeline.

Removed degradedConnectionEvictedByPingHealthCheck test — PING is keepalive-only, not eviction. Degraded connections are handled by the response timeout retry path (6s/6s/10s escalation → cross-region failover).

Test: pingFramesSentAndAcknowledgedOnIdleConnection
- Installs Http2PingHandler via doOnConnectedCallback on H2 parent
- Configures 3s PING interval, waits 20s idle
- Asserts pingsSent > 0 (proven: pingsSent=5, pingAcksReceived=10)
- Asserts connection survived (same parentChannelId)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
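The idle gating this commit describes — send a PING only when the parent channel has been quiet for longer than the configured interval — reduces to a single time comparison. A minimal sketch; the class and method names are illustrative, not the actual `Http2PingHandler` API:

```java
import java.time.Duration;

// Hypothetical sketch of the idle-gated PING decision: a PING is due only
// when no read/write activity has been observed for at least `interval`.
public final class PingScheduling {

    // nowNanos / lastActivityNanos come from a monotonic clock (System.nanoTime()).
    public static boolean shouldSendPing(long nowNanos, long lastActivityNanos, Duration interval) {
        return nowNanos - lastActivityNanos >= interval.toNanos();
    }
}
```

The real handler would re-schedule itself on the channel's event loop and reset `lastActivityNanos` on every read/write, which is why the commit notes the field no longer needs to be `volatile`.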
The prio qdisc's default priomap routes packets by TOS bits to bands BEFORE tc filters are consulted. Without an explicit priomap, non-SYN data packets could be routed to the delayed bands (1:1 or 1:2) instead of the no-delay band (1:3), causing metadata fetch 503 failures.

Fix: set priomap to '2 2 2 ... 2' (all 16 entries point to band 3) so ALL traffic defaults to no-delay. Only explicitly marked SYN packets (via iptables mangle MARK) are routed to delay bands by the tc filters.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…into AzCosmos_HttpConnectionMaxLife
- Fix 3 source files with incorrect 'native pingAckTimeout' comments (HttpClient.java, HttpConnectionLifecycleUtil.java, ReactorNettyClient.java) to reflect the actual custom Http2PingHandler implementation
- Replace 13+ inline fully qualified class names with imports (ReactorNettyClient.java, Http2ConnectionLifecycleTests.java)
- Hardcode TestNG group string in both test files, remove TEST_GROUP static var
- Add clearAllCosmosSystemProperties() helper for wide cleanup in @AfterMethod/@AfterClass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace raw AddressResolverGroup/doOnConnectedCallback fields threaded through CosmosClientBuilder -> ConnectionPolicy -> HttpClientConfig with a single IHttpClientInterceptor interface following the pattern from PR Azure#47231.

Production (azure-cosmos):
- IHttpClientInterceptor: minimal interface with getAddressResolverGroup() and getDoOnConnectedCallback(), null-safe in production
- ConnectionPolicy: no longer exposes Netty types (AddressResolverGroup removed)
- CosmosClientBuilder: holds IHttpClientInterceptor instead of raw Netty fields

Test (azure-cosmos-test):
- CosmosHttpClientInterceptor: concrete implementation
- CosmosInterceptorHelper.registerHttpClientInterceptor(): convenience API consistent with existing registerTransportClientInterceptor()

Tests updated to use CosmosInterceptorHelper instead of bridge helpers directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract duplicated tc netem and iptables helpers into a reusable NetworkFaultInjector utility class. Consolidates:
- sudo/root detection
- network interface discovery
- addNetworkDelay(delayMs), removeNetworkDelay()
- addPacketDrop(port), removePacketDrop(port)
- removeAll() for wide cleanup

Http2ConnectionLifecycleTests refactored to use NetworkFaultInjector. Http2ConnectTimeoutBifurcationTests can follow in a subsequent commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tom PING
- Http2ConnectTimeoutBifurcationTests: use NetworkFaultInjector for sudo detection, iptables helpers, and cleanup. Remove duplicated methods. Per-port delay methods (addPerPortDelay, addPerPortSynDelay) kept locally as they are bifurcation-test-specific.
- SPEC: Fix Design Choices #3 and #4 to reflect custom Http2PingHandler (not native pingAckTimeout). Fix Architecture diagram, Config table, Non-Goals, and .NET Parity sections.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Root cause: the 8s tc netem delay was less than the e2e timeout (15s/25s), so requests completed slowly but successfully instead of timing out.

Fixes:
- Increase tc netem delay from 8s to 20s (exceeds e2e timeout)
- Add 1s settling delay in NetworkFaultInjector.addNetworkDelay() to ensure the qdisc is active before the first packet enters the queue
- Accept both 408/10002 (ReadTimeout) and 408/20008 (e2e cancel) in assertContainsGatewayTimeout — both prove the delay caused failure
- Relax retryUsesConsistentParentChannelId to accept >= 1 attempt (the 20s delay leaves only 5s of the 25s e2e budget — insufficient for retry)

Remaining: multiParentChannelConnectionReuse gets a transient 500 from the thin-client proxy under a 100-concurrent-request burst — server-side.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
HttpClient.java:
- Always install eviction predicate + evictInBackground (no longer gated by maxLifetimeSeconds > 0). The predicate dynamically checks Configs.isHttpConnectionMaxLifetimeEnabled() for Phase 3 (lifetime). Toggling the flag at runtime disables lifetime eviction without restart; dead + idle eviction continue to work.

Http2PingHandler.java:
- Add dynamic Configs.isHttp2PingHealthEnabled() check in maybeSendPing(). Toggling the flag at runtime stops PINGs on existing connections.
- Make HANDLER_NAME private (only used internally)
- Remove unnecessary volatile from lastActivityNanos (event-loop-bound)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Move FilterableDnsResolverGroup to azure-cosmos-test module (com.azure.cosmos.test.faultinjection package) for reuse
- Add azure-cosmos-test dependency to azure-cosmos-benchmark pom
- Add dnsBlockingEnabled + dnsBlockingCycleMinutes config to TenantWorkloadConfig (JSON-driven, tenantDefaults supported)
- Wire into AsyncBenchmark: inject FilterableDnsResolverGroup via CosmosInterceptorHelper, start a background scheduler that cycles NORMAL -> BLOCKED -> NORMAL on a configurable interval
- Add IpRotationHarness test for standalone DNS rotation validation
- Update test imports for the new FilterableDnsResolverGroup package

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…CycleMinutes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix CHANGELOG: describe custom Http2PingHandler instead of native pingAckTimeout
- Add jitter > lifetime guard in HttpConnectionLifecycleUtil to prevent connection storms
- Remove stale HTTP2_PING_ACK_TIMEOUT_IN_SECONDS test property (dead code from earlier design)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
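The jitter-versus-lifetime guard mentioned in this commit could look like the following sketch. This is illustrative, not the actual `HttpConnectionLifecycleUtil` code, and it assumes a degenerate configuration falls back to zero jitter:

```java
// Hypothetical sketch of a "jitter > lifetime" guard: if the configured jitter
// window is as large as the base lifetime itself, the effective expiry
// [base - jitter, base] could be instant (or negative), causing connections to
// be evicted immediately and en masse. Fall back to no jitter instead.
public final class JitterGuard {

    public static long effectiveJitterNanos(long baseLifetimeNanos, long configuredJitterNanos) {
        if (configuredJitterNanos >= baseLifetimeNanos) {
            // Degenerate config (e.g. lifetime 10s, jitter 30s): disable jitter
            // rather than producing an immediate expiry storm.
            return 0L;
        }
        return configuredJitterNanos;
    }
}
```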
HTTP connection lifecycle management: max lifetime + PING keepalive for Gateway V2
Summary
This PR adds two independent HTTP connection lifecycle features for Cosmos DB Gateway V2 (thin client) endpoints: max connection lifetime (forces periodic DNS re-resolution by evicting long-lived connections) and HTTP/2 PING keepalive (prevents L7 middleboxes from silently reaping idle connections). Both HTTP/1.1 and HTTP/2 coexist on the same account — Kusto telemetry shows 43.8M HTTP/2 vs 1.1M HTTP/1.1 requests in a 6-hour window — so max lifetime applies to both protocols while PING keepalive targets HTTP/2 only. All settings are internal system properties (no public API changes) with conservative defaults chosen for safe production rollout.
1. Purpose / Motivation
Solution: Two orthogonal features address these independently:
Both HTTP/1.1 and HTTP/2 coexist on the same Cosmos DB account (confirmed by Kusto evidence: 43.8M HTTP/2 and 1.1M HTTP/1.1 requests in a 6-hour window), so the implementation must handle both protocols.
2. Implementation Approach

Max Lifetime Eviction (HTTP/1.1 + HTTP/2)

A custom `evictionPredicate` in `ConnectionProvider.Builder` implements a 3-phase eviction order: dead channels → idle channels → lifetime-expired channels.

For HTTP/2, a two-phase lifetime eviction avoids sending RST_STREAM on active streams: the first sweep stamps a `PENDING_EVICTION_NANOS` attribute on the channel; subsequent sweeps evict once the connection is idle or the drain grace period has elapsed.

Per-connection subtractive jitter in `[base - 30s, base]` prevents thundering-herd reconnection storms. Each connection independently computes its expiry time at creation, so connections opened at the same time expire at different times.

PING Keepalive (HTTP/2 only)
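The per-connection subtractive jitter can be sketched in a few lines. The names here are illustrative (not the SDK's actual `HttpConnectionLifecycleUtil` API); the key property is that the jitter is rolled once, at connection creation, and the configured max lifetime is an upper bound that is never exceeded:

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of per-connection subtractive jitter: each connection computes a
// fixed expiry timestamp when it is created, so the jitter is never re-rolled
// on later sweeps. Effective lifetime lands in [base - maxJitter, base].
public final class ConnectionExpiry {

    public static long stampExpiryNanos(long creationNanos, long baseLifetimeNanos, long maxJitterNanos) {
        long jitter = maxJitterNanos > 0
            ? ThreadLocalRandom.current().nextLong(maxJitterNanos + 1) // inclusive of maxJitter
            : 0L;
        return creationNanos + baseLifetimeNanos - jitter;
    }
}
```

Stamping the result into a channel attribute (as the `CONNECTION_EXPIRY_NANOS` attribute described above) is what makes the eviction predicate deterministic per connection.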
Uses a custom `Http2PingHandler` (a `ChannelDuplexHandler`) installed on HTTP/2 parent channels via `doOnConnected`. The handler checks `Configs.isHttp2PingHealthEnabled()` dynamically, allowing a runtime toggle without client restart; any `ChannelPipelineException` (e.g., a duplicate handler from a Netty regression) is swallowed.

Eviction Rate Limiter

At most 1 connection is evicted per sweep cycle (dead channels are exempt from this limit). This is a defense-in-depth measure alongside jitter to prevent mass eviction.
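The 3-phase ordering plus the one-eviction-per-sweep rate limiter could be sketched as follows. The types and names are illustrative, not the actual reactor-netty `evictionPredicate` wiring:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of one background sweep: dead channels are always evicted (no limit);
// otherwise at most one idle- or lifetime-expired connection goes per sweep.
public final class SweepSketch {

    public static final class Conn {
        public final boolean dead, idleExpired, lifetimeExpired;
        public Conn(boolean dead, boolean idleExpired, boolean lifetimeExpired) {
            this.dead = dead;
            this.idleExpired = idleExpired;
            this.lifetimeExpired = lifetimeExpired;
        }
    }

    // Returns the connections selected for eviction in this sweep.
    public static List<Conn> sweep(List<Conn> pool) {
        List<Conn> evicted = new ArrayList<>();
        boolean budgetUsed = false; // rate limiter: 1 non-dead eviction per sweep
        for (Conn c : pool) {
            if (c.dead) {
                evicted.add(c);               // phase 1: dead channels, exempt from limit
            } else if (!budgetUsed && (c.idleExpired || c.lifetimeExpired)) {
                evicted.add(c);               // phases 2/3: idle, then lifetime
                budgetUsed = true;
            }
        }
        return evicted;
    }
}
```

With jitter already staggering expiries, the rate limiter is the second, independent layer: even if many connections expire in the same sweep window, at most one healthy connection is closed per cycle.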
Sweep Interval

Dynamically derived: `clamp(min(idleTimeout, baseMaxLifetime) / 2, 1s, 5s)`

Dynamic Runtime Toggle

Both features check their enable flag at runtime (not at client construction time):

- `Configs.isHttpConnectionMaxLifetimeEnabled()` on every sweep — returns `false` for lifetime-expired connections when disabled
- `Configs.isHttp2PingHealthEnabled()` in `maybeSendPing()` — skips PING when disabled

This allows safe production rollout: flip system properties to disable either feature without restarting the application.
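The sweep-interval formula above — `clamp(min(idleTimeout, baseMaxLifetime) / 2, 1s, 5s)` — translates directly into code. A small illustrative helper (not SDK code):

```java
import java.time.Duration;

// Sketch of the sweep-interval derivation: half the smallest eviction
// threshold, clamped to [1s, 5s], so the background sweep always runs at
// least twice as often as the fastest way a connection can expire.
public final class SweepInterval {

    private static final Duration MIN = Duration.ofSeconds(1);
    private static final Duration MAX = Duration.ofSeconds(5);

    public static Duration derive(Duration idleTimeout, Duration baseMaxLifetime) {
        Duration smallest = idleTimeout.compareTo(baseMaxLifetime) <= 0 ? idleTimeout : baseMaxLifetime;
        Duration half = smallest.dividedBy(2);
        if (half.compareTo(MIN) < 0) return MIN;
        if (half.compareTo(MAX) > 0) return MAX;
        return half;
    }
}
```

With the documented defaults (idle timeout well below the 1800s lifetime), the clamp means the sweep effectively runs every 5 seconds.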
IHttpClientInterceptor Pattern
Test-time injection of `AddressResolverGroup` and `doOnConnected` callbacks uses the `IHttpClientInterceptor` interface (following the pattern from PR #47231). Netty-specific types stay off the public `ConnectionPolicy` class — the interceptor is wired through `ImplementationBridgeHelpers` and is `null` in production (zero overhead).

Configuration
All settings are internal system properties (not public API):

| Property | Default |
|---|---|
| `COSMOS.HTTP_CONNECTION_MAX_LIFETIME_ENABLED` | `true` |
| `COSMOS.HTTP_CONNECTION_MAX_LIFETIME_IN_SECONDS` | `1800` (30 min) |
| `COSMOS.HTTP2_PING_HEALTH_ENABLED` | `true` |
| `COSMOS.HTTP2_PING_INTERVAL_IN_SECONDS` | `30` |

The default 30 min max lifetime is deliberately conservative compared to .NET's 5 min — we will tune after production validation.
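Because these are ordinary JVM system properties, no API calls are involved in a rollout toggle — `-D` flags at launch or `System.setProperty` suffice (and, per the runtime-toggle design above, the `ENABLED` flags are re-read while the client is running). A small example using the documented names and defaults:

```java
// Example only: sets the documented lifecycle system properties to their
// defaults. In production these would typically be -D flags, e.g.
//   -DCOSMOS.HTTP_CONNECTION_MAX_LIFETIME_IN_SECONDS=1800
public final class LifecycleConfigExample {
    public static void main(String[] args) {
        System.setProperty("COSMOS.HTTP_CONNECTION_MAX_LIFETIME_ENABLED", "true");
        System.setProperty("COSMOS.HTTP_CONNECTION_MAX_LIFETIME_IN_SECONDS", "1800");
        System.setProperty("COSMOS.HTTP2_PING_HEALTH_ENABLED", "true");
        System.setProperty("COSMOS.HTTP2_PING_INTERVAL_IN_SECONDS", "30");
        System.out.println(System.getProperty("COSMOS.HTTP_CONNECTION_MAX_LIFETIME_IN_SECONDS")); // prints 1800
    }
}
```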
3. Key Files Changed

- `HttpClient.java`
- `HttpConnectionLifecycleUtil.java` — channel attributes (`CONNECTION_EXPIRY_NANOS`, `PENDING_EVICTION_NANOS`)
- `Http2PingHandler.java`
- `ReactorNettyClient.java` — `doOnConnected` expiry stamping + PING handler install, resolver group support
- `Configs.java`
- `IHttpClientInterceptor.java`
- `CosmosInterceptorHelper.java`
- `CosmosClientBuilder.java`
- `ConnectionPolicy.java`
- `RxDocumentClientImpl.java` — `ConnectionPolicy` → `HttpClientConfig`
- `NetworkFaultInjector.java` — `tc netem`/`iptables` fault injection in tests
- `FilterableDnsResolverGroup.java`
- `Http2ConnectionLifecycleTests.java`
- `Http2ConnectTimeoutBifurcationTests.java`
- `tests.yml`
- `HTTP_CONNECTION_LIFECYCLE_SPEC.md`

4. Benchmark Results
Test matrix: `{c10, c2} × {ReadThroughput, WriteThroughput}` + `{c1 sparse} × {ReadThroughput, WriteThroughput}`, GATEWAY mode, 2h per scenario (721 × 10s-interval samples), 30 min per sparse scenario. All endpoints route through Azure Traffic Manager (Central US region, 4 backend IPs). Both `main` and the dev branch were tested on the same VM sequentially.

Infrastructure: Standard_D2s_v3 (2 vCPU, 8 GB, Central US), JDK 21, Maven 3.8.7. Account `abhm-cfp-region-test` (3 regions: East US, West US, Central US), container at autoscale 100K RU/s. Dev branch config: `maxLifetime=300s` (5 min), `pingInterval=30s`.

Throughput & Latency
Federation Distribution (ComputeRequest5M Kusto Validation)
Kusto confirms the key finding: main pins all traffic to 1 federation; dev with maxLifetime distributes across 4 federations.
IP Rotation Validation
`IpRotationHarness` on a VM (Central US, maxLife=60s, 15 min runtime, 3 phases with `FilterableDnsResolverGroup` IP blocking): 59 connection rotations in Phase 1 alone (maxLife=60s + jitter). Each rotation forced a DNS re-resolve, discovering different backend IPs behind Azure Traffic Manager.
DNS Behavior Validation
`DefaultAddressResolverGroup` delegates to `InetAddress.getByName()` with no additional cache — only the JVM DNS cache (30s TTL) sits between the SDK and Azure Traffic Manager. When ATM removes a dead federation, new connections get healthy IPs within ~30 seconds.

Conclusion
5. Testing Methodology
Tests use real network fault injection (not SDK synthetic faults) via
`tc netem` and `iptables` on Linux VMs. A shared `NetworkFaultInjector` utility handles sudo detection, tc netem delay, iptables drop, and cleanup.
- `connectionReuseAfterRealNettyTimeout` — `ReadTimeoutException`
- `multiParentChannelConnectionReuse`
- `retryUsesConsistentParentChannelId`
- `connectionSurvivesE2ETimeoutWithRealDelay`
- `parentChannelSurvivesE2ECancelWithoutReadTimeout` — `ReadTimeout` doesn't kill parent

Max Lifetime Eviction (3 tests)
- `connectionRotatedAfterMaxLifetimeExpiry`
- `perConnectionJitterStaggersEviction`
- `connectionEvictedAfterMaxLifetimeEvenWithHealthyPings`

PING Health (1 test)
- `degradedConnectionEvictedByPingHealthCheck` — `iptables` blackhole → PING ACK timeout → connection evicted

DNS Rotation (1 test)
- `dnsRotationAfterMaxLifetimeExpiry` — `FilterableDnsResolverGroup` blocks IP1; max lifetime eviction forces DNS re-resolution to IP2

CI Integration
New `Cosmos_Live_Test_HttpNetworkFault` stage in `tests.yml`: `tc`/`iptables` prerequisites, `MaxParallel=1` (network faults are host-global).

6. .NET Parity
`[0s, 30s)` / `[0s, 30s]` (subtractive) / custom `Http2PingHandler`

7. Configuration Quick Reference
Enable/disable max lifetime: `COSMOS.HTTP_CONNECTION_MAX_LIFETIME_ENABLED`

Enable/disable PING keepalive: `COSMOS.HTTP2_PING_HEALTH_ENABLED`

To fully disable both features, set both `ENABLED` properties to `false`.

8. Future Work
- `maxLifeTime()` + `maxLifeTimeVariance()` once we upgrade

All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines