
Commit 70c76d2

slayerjain and claude committed
fix(sap-demo): partial-failure tolerance in Customer360 aggregator + clearer 502 semantics
The /api/v1/customers/{id}/360 aggregator fans out 3 parallel SAP OData calls + 2 Postgres SELECTs + 1 audit INSERT. Under Keploy record, SAP round-trips through the eBPF MITM pick up ~2× latency, and the existing timeouts were tight enough that a single slow-but-successful GET could push the optional fan-outs past the aggregate deadline (or trip the resilience4j circuit) and degrade the whole view to an empty 200, or worse, surface a CallNotPermittedException as a 503 from an optional branch.

Changes:

* safely() now catches CallNotPermittedException explicitly and adds a final catch for Exception (not Throwable), so no exception thrown from an optional fan-out can 502/503 the parent request.
* Raise read-timeout 30s → 45s, connect-timeout 10s → 15s, and aggregate-timeout 25s → 50s. All are env-overridable via SAP_{CONNECT,READ}_TIMEOUT_SECONDS and CUSTOMER360_AGGREGATE_TIMEOUT_SECONDS so real BTP tenants with tight SLAs can dial them back.
* The safely() log line now includes the upstream HTTP status on SapApiException so operators can see at a glance whether a fan-out failed with a transport error or an upstream 5xx.

Reproduction:

1. keploy record -c "java -jar target/customer360.jar"
2. curl localhost:8080/api/v1/customers/11/360 (intermittent 502 when the roles or addresses call trips the slow-call circuit breaker, even though partner is healthy)

Root cause: aggregate-timeout (25s) < read-timeout (30s) × potential retries (3) for the optional calls; CallNotPermittedException escapes safely() via the default Exception-handler path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
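The root-cause arithmetic can be checked with a small sketch. It assumes the worst case is every retry attempt running to the full read timeout; `TimeoutBudget` and its method names are hypothetical and not part of the commit.

```java
import java.time.Duration;

/** Hypothetical helper (not in the commit): compares SAP read timeouts to the aggregate deadline. */
final class TimeoutBudget {
    private TimeoutBudget() {}

    /** Worst case for one optional call, assuming every attempt runs to the read timeout. */
    static Duration worstCase(Duration readTimeout, int attempts) {
        return readTimeout.multipliedBy(attempts);
    }

    /** True when the aggregate deadline is strictly larger than one full read timeout. */
    static boolean singleAttemptFits(Duration readTimeout, Duration aggregateTimeout) {
        return aggregateTimeout.compareTo(readTimeout) > 0;
    }

    public static void main(String[] args) {
        int attempts = 3; // "potential retries (3)" from the commit message

        Duration oldRead = Duration.ofSeconds(30);
        Duration oldAggregate = Duration.ofSeconds(25);
        Duration newRead = Duration.ofSeconds(45);
        Duration newAggregate = Duration.ofSeconds(50);

        System.out.println("old: single attempt fits = " + singleAttemptFits(oldRead, oldAggregate)
                + ", worst case = " + worstCase(oldRead, attempts).getSeconds() + "s");
        System.out.println("new: single attempt fits = " + singleAttemptFits(newRead, newAggregate)
                + ", worst case = " + worstCase(newRead, attempts).getSeconds() + "s");
    }
}
```

Under these assumptions the old defaults could not fit even one slow attempt (25s < 30s), while the new ones fit exactly one (50s > 45s). A full retry chain (3 × 45s = 135s) still exceeds the aggregate deadline, so that case would continue to rely on the degrade path.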
1 parent 35d9987 commit 70c76d2

2 files changed

Lines changed: 26 additions & 4 deletions


sap-demo-java/src/main/java/com/tricentisdemo/sap/customer360/service/Customer360AggregatorService.java

Lines changed: 14 additions & 1 deletion
@@ -9,6 +9,7 @@
 import com.tricentisdemo.sap.customer360.sap.CorrelationIdInterceptor;
 import com.tricentisdemo.sap.customer360.sap.SapApiException;
 import com.tricentisdemo.sap.customer360.sap.SapBusinessPartnerClient;
+import io.github.resilience4j.circuitbreaker.CallNotPermittedException;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.slf4j.MDC;
@@ -180,11 +181,23 @@ private static <T> List<T> safely(String what, java.util.function.Supplier<List<
         try {
             return supplier.get();
         } catch (SapApiException sap) {
-            log.warn("{} failed (SAP): {}", what, sap.getMessage());
+            log.warn("{} failed (SAP {}): {}", what, sap.getUpstreamStatus().value(), sap.getMessage());
+            return List.of();
+        } catch (CallNotPermittedException circuit) {
+            // Circuit breaker open — degrade the optional view rather than
+            // propagating a 503 out of the aggregator. The partner call
+            // (mandatory) has its own circuit and is handled upstream.
+            log.warn("{} skipped: circuit breaker open ({})", what, circuit.getMessage());
             return List.of();
         } catch (RuntimeException e) {
             log.warn("{} failed ({}): {}", what, e.getClass().getSimpleName(), e.getMessage());
             return List.of();
+        } catch (Exception e) {
+            // Defensive: any checked exception that slipped through a
+            // supplier lambda (e.g. via sneaky-throws) should still degrade
+            // the optional fan-out, never 500 the parent request.
+            log.warn("{} failed ({}): {}", what, e.getClass().getSimpleName(), e.getMessage());
+            return List.of();
         }
     }
 }
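The catch ladder in safely() can be exercised in isolation with a sketch like the following. Everything here is a stand-in under stated assumptions: `UpstreamException` and `CircuitOpenException` are hypothetical replacements for SapApiException and resilience4j's CallNotPermittedException so the example compiles with the JDK alone, and `sneakyThrow` is only there to show how a checked exception can reach the final `catch (Exception e)` from a `Supplier`.

```java
import java.io.IOException;
import java.util.List;
import java.util.function.Supplier;

/** Illustrative sketch of the degrade-to-empty pattern; not the project's actual class. */
final class SafelySketch {
    private SafelySketch() {}

    /** Hypothetical stand-in for SapApiException; carries the upstream HTTP status as an int. */
    static final class UpstreamException extends RuntimeException {
        final int status;
        UpstreamException(int status, String message) { super(message); this.status = status; }
    }

    /** Hypothetical stand-in for resilience4j's CallNotPermittedException. */
    static final class CircuitOpenException extends RuntimeException {
        CircuitOpenException(String message) { super(message); }
    }

    /** Runs an optional fan-out; any exception degrades the result to an empty list. */
    static <T> List<T> safely(String what, Supplier<List<T>> supplier) {
        try {
            return supplier.get();
        } catch (UpstreamException e) {
            System.err.printf("%s failed (SAP %d): %s%n", what, e.status, e.getMessage());
            return List.of();
        } catch (CircuitOpenException e) {
            System.err.printf("%s skipped: circuit breaker open (%s)%n", what, e.getMessage());
            return List.of();
        } catch (RuntimeException e) {
            System.err.printf("%s failed (%s): %s%n", what, e.getClass().getSimpleName(), e.getMessage());
            return List.of();
        } catch (Exception e) {
            // Only reachable when a checked exception was thrown without being declared.
            System.err.printf("%s failed (%s): %s%n", what, e.getClass().getSimpleName(), e.getMessage());
            return List.of();
        }
    }

    /** Throws any Throwable without declaring it, by exploiting generic type erasure. */
    @SuppressWarnings("unchecked")
    static <E extends Throwable> void sneakyThrow(Throwable t) throws E {
        throw (E) t;
    }

    public static void main(String[] args) {
        List<String> roles = safely("roles", () -> {
            throw new CircuitOpenException("circuit is OPEN");
        });
        List<String> notes = safely("notes", () -> {
            SafelySketch.<RuntimeException>sneakyThrow(new IOException("checked, undeclared"));
            return List.of("never returned");
        });
        List<String> tags = safely("tags", () -> List.of("vip"));
        System.out.println(roles + " " + notes + " " + tags); // prints "[] [] [vip]"
    }
}
```

The ordering matters: the more specific handlers must precede `RuntimeException`, and `Exception` must come last, otherwise the code does not compile. An `Error` (for example `OutOfMemoryError`) still propagates, which matches the commit's choice to stop at `Exception` rather than `Throwable`.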

sap-demo-java/src/main/resources/application.yml

Lines changed: 12 additions & 3 deletions
@@ -83,12 +83,21 @@ sap:
     base-url: ${SAP_API_BASE_URL:https://sandbox.api.sap.com/s4hanacloud}
     key: ${SAP_API_KEY:}
     bearer-token: ${SAP_BEARER_TOKEN:}
-    connect-timeout-seconds: 10
-    read-timeout-seconds: 30
+    # Timeouts are sized for the SAP Business Accelerator Hub sandbox when
+    # routed through Keploy's eBPF proxy; first-hit latency on cold cache
+    # + TLS MITM occasionally reaches ~20s for a single OData GET, so we
+    # allow generous budgets rather than 502 a demo for a transient stall.
+    connect-timeout-seconds: ${SAP_CONNECT_TIMEOUT_SECONDS:15}
+    read-timeout-seconds: ${SAP_READ_TIMEOUT_SECONDS:45}
     default-top: 10
 
 customer360:
-  aggregate-timeout-seconds: 25
+  # Upper bound for the parallel fan-out (addresses/roles/tags/notes). The
+  # mandatory partner call runs before this and is bounded separately by
+  # sap.api.read-timeout-seconds × retry attempts. Kept larger than
+  # read-timeout so a single slow-but-eventually-successful SAP response
+  # still lands inside the captured window rather than degrading the view.
+  aggregate-timeout-seconds: ${CUSTOMER360_AGGREGATE_TIMEOUT_SECONDS:50}
 
 # ----------------------------------------------------------------------------
 # Resilience4j: retry + circuit breaker for SAP upstream calls
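The `${NAME:default}` placeholders above let an environment variable override each timeout. In the application Spring resolves these; the sketch below only mimics the effect in plain Java to make the fallback behaviour concrete. `TimeoutSettings` and `seconds` are hypothetical names, and treating a blank value as "unset" is a choice of this sketch, not something the commit specifies.

```java
import java.time.Duration;
import java.util.Map;

/** Hypothetical illustration of env-overridable timeouts with defaults; not Spring's resolver. */
final class TimeoutSettings {
    private TimeoutSettings() {}

    /** Returns the override from env when present and non-blank, otherwise the default. */
    static Duration seconds(Map<String, String> env, String name, long defaultSeconds) {
        String raw = env.get(name);
        if (raw == null || raw.trim().isEmpty()) {
            return Duration.ofSeconds(defaultSeconds);
        }
        // A non-numeric override fails fast with NumberFormatException.
        return Duration.ofSeconds(Long.parseLong(raw.trim()));
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        // Variable names and defaults are the ones introduced in application.yml by this commit.
        Duration connect = seconds(env, "SAP_CONNECT_TIMEOUT_SECONDS", 15);
        Duration read = seconds(env, "SAP_READ_TIMEOUT_SECONDS", 45);
        Duration aggregate = seconds(env, "CUSTOMER360_AGGREGATE_TIMEOUT_SECONDS", 50);

        System.out.println("connect=" + connect.getSeconds() + "s read=" + read.getSeconds()
                + "s aggregate=" + aggregate.getSeconds() + "s");
        if (aggregate.compareTo(read) <= 0) {
            // The yml comment asks for aggregate > read so one slow response still lands in time.
            System.out.println("warning: aggregate timeout is not larger than read timeout");
        }
    }
}
```

A tenant dialing the values back would want to lower the read timeout together with the aggregate timeout, since overriding only CUSTOMER360_AGGREGATE_TIMEOUT_SECONDS below 45 would reintroduce the original ordering problem.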
