
WIP: feat: client reports#1549

Draft
jpnurmi wants to merge 95 commits into jpnurmi/feat/http-retry from jpnurmi/feat/client-reports

Conversation


@jpnurmi jpnurmi commented Mar 3, 2026

This PR implements client reports for tracking discarded events.

  • Add sentry_client_report module
  • Add send_client_reports option (enabled by default)
  • Instrument discard points: sample_rate, before_send, rate limiting, network errors, queue overflow
  • Piggyback client reports onto outgoing envelopes
  • Record network error discards per envelope item with correct data category, only when retry is not available

Closes: #1216

#skip-changelog (wip)

jpnurmi and others added 30 commits February 23, 2026 14:06
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The deferred startup retry scan (100ms delay) could pick up files
written by the current session. Filter by startup_time so only
previous-session files are processed. Also ensure the cache directory
exists when cache_keep is enabled, since sentry__process_old_runs
only creates it conditionally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Monotonic time is process-relative and doesn't work across restarts.
Retry envelope timestamps need to persist across sessions, so use
time() (seconds since epoch) for file timestamps, startup_time, and
backoff comparison.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
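A minimal sketch of the epoch-based comparison this change enables; the helper name and signature are illustrative, not SDK API:

```c
#include <stdbool.h>
#include <time.h>

// Epoch timestamps (time(), seconds since 1970) are stable across
// process restarts, unlike monotonic time, which starts over near zero
// in every new process. retry_backoff_elapsed is a hypothetical helper.
static bool
retry_backoff_elapsed(time_t file_ts, time_t now, long backoff_s)
{
    // Both values are epoch-based, so the comparison is meaningful even
    // when they were taken in different sessions.
    return now - file_ts >= backoff_s;
}
```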
Rename SENTRY_RETRY_BACKOFF_BASE_MS to SENTRY_RETRY_BACKOFF_BASE_S
and sentry__retry_backoff_ms to sentry__retry_backoff, since file
timestamps are now in seconds. The bgworker delay sites multiply
by 1000 to convert to the milliseconds it expects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move startup_time initialization into sentry__retry_new and remove the
unnecessary sentry__retry_set_startup_time indirection. Tests now use
write_retry_file with timestamps well in the past to match production
behavior where retry files are from previous sessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When files exist but aren't eligible yet (backoff not elapsed),
foreach was returning 0, causing the retry polling task to stop.
Return the total number of valid retry files found instead of just the
eligible count so the caller keeps rescheduling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
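The fix can be sketched as follows; the struct and function names are hypothetical stand-ins for the SDK's retry scan:

```c
#include <stddef.h>

// Hypothetical retry file entry; ts is the earliest time (epoch
// seconds) at which it may be retried.
typedef struct {
    long ts;
} retry_file_t;

// Returns the TOTAL number of valid retry files, not just the eligible
// count, so the caller keeps rescheduling the poll while files whose
// backoff has not elapsed remain on disk. Returning only the eligible
// count (possibly 0) is what stopped the polling task.
static size_t
retry_scan(const retry_file_t *files, size_t n, long now, size_t *eligible)
{
    *eligible = 0;
    for (size_t i = 0; i < n; i++) {
        if (now >= files[i].ts) {
            (*eligible)++; // would be sent in this poll cycle
        }
    }
    return n; // total valid files keeps the poll alive
}
```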
Make handle_result return bool (true = file rescheduled for retry,
false = file consumed) and use it in foreach to decrement the total
count. This avoids one extra no-op poll cycle after the last retry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Y_THROTTLE

Replace SENTRY_RETRY_BACKOFF_BASE_S and SENTRY_RETRY_STARTUP_DELAY_MS
with ms-based constants so the transport uses them directly without
leaking unit conversion details.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Give the retry module a bgworker ref and send callback so it owns all
scheduling. Transport just calls _start and _enqueue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sentry__retry_new only returns NULL on failure, not based on options.
sentry__retry_start and _enqueue require non-NULL retry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deduplicate prepare/send/free sequence shared by retry_send_cb and
http_send_task.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Already covered by retry_throttle and retry_result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ashpad

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pass startup_time directly to _foreach as a `before` filter instead of
a bool. Clear it after the first run so subsequent polls use backoff.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deduplicate filename construction across write_envelope, handle_result,
and tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the transport supports retry and http_retries > 0,
sentry__process_old_runs now skips caching .envelope files from old
runs. The retry system handles persistence, so duplicating into
cache/ is unnecessary.

Also simplifies sentry__retry_handle_result: only cache on max
retries exhausted, not on successful send.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the retry-aware check before cache_dir creation so we avoid
mkdir when the retry system handles persistence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The retry callback now receives a sentry_envelope_t and returns a
status code. The retry system handles deserialization and file
lifecycle internally, keeping path concerns out of the transport.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add test case for successful send at max retry count with cache_keep
enabled, confirming envelopes are cached regardless of send outcome.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lopes

The startup poll used `ts >= startup_time` to skip envelopes written
after startup. With second-precision timestamps, this also skipped
cross-session envelopes written in the same second as a fast restart.

Reset `startup_time` in `sentry__retry_enqueue` so the startup poll
falls through to the backoff path for same-session envelopes. The
bgworker processes the send task (immediate) before the startup poll
(delayed), so by the time the poll fires, `startup_time` is already 0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Submit a one-shot retry send task before bgworker shutdown to ensure
pre-existing retry files are sent even if the startup poll hasn't
fired yet. The flush checks startup_time on the worker thread to
avoid re-sending files already handled by enqueue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace `time(NULL)` (1-second granularity) with `sentry__usec_time() / 1000`
(millisecond granularity) to avoid timestamp collisions that caused flaky
`>=` vs `>` comparison behavior in CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make sentry__retry_flush block until the flush task completes by adding
a bgworker_flush call, and subtract the elapsed time from the shutdown
timeout. This ensures retries are actually sent before the worker stops.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jpnurmi and others added 29 commits February 24, 2026 15:23
- Change sentry__retry_write_envelope to return bool indicating success
- Return false for nil event IDs (session envelopes) and write failures
- sentry__retry_enqueue now returns early if write fails, preserving
  startup_time for session envelopes so retry_flush_task can flush them
Add null check after cloning cache_path to prevent dereferencing null
later in sentry__retry_send when iterating directory or joining paths.
When the system clock moves backward (now < ts), the condition now >= ts
was false, causing the backoff check to be skipped entirely. This made
items immediately eligible for retry regardless of their retry count.

Now checks if now < ts (clock skew) OR if backoff hasn't elapsed.
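A hedged sketch of the fixed eligibility check (names and the exponential-backoff formula are assumptions for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

// A file is retried only when the clock has not moved backward
// (now >= ts) AND the backoff for its retry count has elapsed. The old
// code fell through the backoff check entirely when now < ts, making
// items immediately eligible.
static bool
retry_eligible(uint64_t now_ms, uint64_t ts_ms, int count,
    uint64_t backoff_base_ms)
{
    if (now_ms < ts_ms) {
        return false; // clock skew: wait rather than retry immediately
    }
    uint64_t backoff_ms = backoff_base_ms << count; // doubles per retry
    return now_ms - ts_ms >= backoff_ms;
}
```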
Crashpad's kRetryAttempts=5 with `upload_attempts > kRetryAttempts`
(checked before post-increment) allows upload_attempts 0-5, giving
6 retries with backoffs: 15m, 30m, 1h, 2h, 4h, 8h.

sentry-native's `count + 1 < SENTRY_RETRY_ATTEMPTS` with the old
value of 5 only allowed counts 0-3 to be re-enqueued, so the max
backoff reached was 4h. Bumping to 6 gives the same 6 retries and
the full backoff sequence up to 8h.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
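The backoff arithmetic above can be sketched directly; the helper is illustrative, only the constant name comes from the commit:

```c
#include <stdint.h>

// With a 15-minute base doubling per attempt, counts 0..5 give
// 15m, 30m, 1h, 2h, 4h, 8h. SENTRY_RETRY_ATTEMPTS = 6 lets counts 0..4
// re-enqueue (count + 1 < 6), so the 6th and final attempt waits the
// full 8 hours, matching Crashpad's schedule.
#define SENTRY_RETRY_ATTEMPTS 6

static uint64_t
retry_backoff_minutes(int count)
{
    return 15ull << count;
}
```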
Rename sentry__retry_flush to sentry__retry_shutdown and remove the
bgworker_flush call so bgworker_shutdown gets the full timeout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Check the sentry__path_rename return value and log a warning on failure
instead of silently ignoring it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move eligible++ after all item fields are populated so a NULL path from
allocation failure does not leave a half-initialized item in the array.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace mutable startup_time + sealed with a state enum so that
the startup flag is managed via atomic operations instead of
clearing a uint64_t that may tear on 32-bit systems.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
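A minimal sketch of the state-enum approach using C11 atomics; the state names are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>

// A plain uint64_t startup_time cleared from another thread can tear on
// 32-bit targets; a small atomic state enum cannot.
typedef enum {
    RETRY_STATE_STARTUP = 0,
    RETRY_STATE_RUNNING = 1,
    RETRY_STATE_SEALED = 2,
} retry_state_t;

static atomic_int g_retry_state = RETRY_STATE_STARTUP;

// Transition out of the startup phase exactly once, atomically.
static bool
retry_leave_startup(void)
{
    int expected = RETRY_STATE_STARTUP;
    return atomic_compare_exchange_strong(
        &g_retry_state, &expected, RETRY_STATE_RUNNING);
}
```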
If sentry__retry_new() fails, retry_func was still set on the transport,
causing can_retry to return true and envelopes to be dropped instead of
cached.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Envelopes without an event_id (e.g. sessions) were silently dropped
by sentry__retry_write_envelope. This was a workaround (cd57ff4) for
the old retry_write_envelope approach that regenerated a random UUID on
each attempt, orphaning files on disk. The rewrite to rename-based
counter bumps in handle_result (0f37177) made this safe: the UUID is
parsed from the filename and preserved across renames, so a random UUID
assigned at initial write stays stable through all retry cycles.

Generate a random UUIDv4 for nil event_id envelopes instead of skipping
them. Extract unreachable_dsn to module level. Add UUID consistency
assertions to existing retry tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Clear the scheduled flag before scanning so that a concurrent
retry_enqueue always sees 0 and successfully arms a new poll via CAS.
Previously, the flag was cleared after the scan returned, creating a
window where enqueue could see 1, skip scheduling, and then the poll
would clear the flag — stranding the newly enqueued item.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
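The ordering fix can be sketched with a clear-then-scan poll task and a CAS-armed enqueue; names are illustrative:

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int g_scheduled = 1; // 1 = a poll is currently scheduled

// The poll task clears `scheduled` BEFORE scanning, so a concurrent
// enqueue always observes 0 and can arm a new poll. Clearing after the
// scan left a window where an enqueue saw 1, skipped scheduling, and
// its item was stranded.
static void
retry_poll_task(void)
{
    atomic_store(&g_scheduled, 0); // clear first, then scan
    /* ... scan the retry directory here ... */
}

static bool
retry_enqueue_arm_poll(void)
{
    int expected = 0;
    // Succeeds only when no poll is scheduled, so exactly one is armed.
    return atomic_compare_exchange_strong(&g_scheduled, &expected, 1);
}
```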
…n_write_cache

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…_result

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The retry_count >= 0 branch passed the full source filename to %.36s,
which would grab the timestamp prefix instead of the UUID for retry-
format filenames. Extract the cache name (last 45 chars) before either
branch so both use the correct UUID.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
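A hedged sketch of the extraction step; the retry filename layout shown (timestamp prefix, then UUID, then extension) is assumed for illustration:

```c
#include <string.h>

// With a timestamp prefix in front of the UUID, "%.36s" on the full
// source filename captures the prefix instead of the UUID. Taking the
// last 45 characters first ("<36-char uuid>" + ".envelope") yields the
// correct cache name for both branches.
static const char *
retry_cache_name(const char *filename)
{
    size_t len = strlen(filename);
    return len > 45 ? filename + (len - 45) : filename;
}
```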
Use a mutex (sealed_lock) to serialize the SEALED check in
retry_enqueue with the SEALED set in retry_dump_queue. Store the
envelope address as uintptr_t so retry_dump_cb can skip envelopes
already written by retry_enqueue without risking accidental
dereferencing. The address is safe to compare because the task holds
a ref that keeps the envelope alive during foreach_matching.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…move_cache

Move sentry__retry_parse_filename and sentry__retry_make_path into
sentry_database as sentry__parse_cache_filename and
sentry__run_make_cache_path. This consolidates cache filename format
knowledge in one module and replaces the fragile `src_len > 45`
heuristic in sentry__run_move_cache with proper parsing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The SENTRY_RETRY_ATTEMPTS constant was bumped from 5 to 6 in 81d0f68
but the public API documentation was not updated to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use a SENTRY_POLL_SHUTDOWN sentinel so that a concurrent
retry_poll_task cannot resubmit the delayed poll that shutdown
just dropped. The CAS(SCHEDULED→IDLE) in retry_poll_task is a
no-op when scheduled is SHUTDOWN, and the subsequent
CAS(IDLE→SCHEDULED) also fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
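The sentinel's effect on both CAS sites can be sketched like this (state names mirror the commit; the functions are illustrative):

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { POLL_IDLE = 0, POLL_SCHEDULED = 1, POLL_SHUTDOWN = 2 };

static atomic_int g_poll = POLL_SCHEDULED;

// Once shutdown stores POLL_SHUTDOWN, neither CAS below can succeed,
// so a concurrent poll task cannot resubmit the delayed poll that
// shutdown just dropped.
static bool
poll_task_begin(void) // CAS(SCHEDULED -> IDLE); no-op during shutdown
{
    int expected = POLL_SCHEDULED;
    return atomic_compare_exchange_strong(&g_poll, &expected, POLL_IDLE);
}

static bool
poll_schedule(void) // CAS(IDLE -> SCHEDULED); fails during shutdown
{
    int expected = POLL_IDLE;
    return atomic_compare_exchange_strong(&g_poll, &expected, POLL_SCHEDULED);
}
```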
On Windows, WinHTTP TCP connect to an unreachable host takes ~2s,
which can exceed the shutdown timeout. Add a cancel_client callback
that closes just the WinHTTP request handle, unblocking the worker
thread so it can process the failure and shut down cleanly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… and worker

Use InterlockedExchangePointer to atomically swap client->request to
NULL in cancel, shutdown, and worker exit cleanup. Whichever thread wins
the swap closes the handle; the loser gets NULL and skips.

The worker also snapshots client->request into a local variable right
after WinHttpOpenRequest and uses the local for all subsequent API calls,
so it never reads NULL from the struct if cancel fires mid-function.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
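A portable sketch of the "whoever wins the swap closes the handle" pattern; the real code uses `InterlockedExchangePointer` on a WinHTTP `HINTERNET`, replaced here with C11 `atomic_exchange` for illustration:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static _Atomic(void *) g_request = NULL; // stands in for client->request

// Called from cancel, shutdown, and worker exit cleanup. Exactly one
// caller receives the non-NULL handle and closes it; the losers get
// NULL and skip, preventing a double-close.
static bool
close_request_once(void)
{
    void *handle = atomic_exchange(&g_request, NULL);
    if (!handle) {
        return false; // another thread already closed it
    }
    /* WinHttpCloseHandle(handle); — in the real code */
    return true;
}
```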
…ncel

Replace the unconditional cancel_client call before bgworker_shutdown
with an on_timeout callback that fires only when the shutdown timeout
expires. This avoids aborting in-flight requests that would have
completed within the timeout, while still unblocking the worker when
it's stuck (e.g. WinHTTP connect to unreachable host).

The callback closes session/connect handles to cancel pending WinHTTP
operations, then the shutdown loop falls through to the !running check
which joins the worker thread. This ensures handle_result runs and the
retry counter is properly bumped on disk.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nd_task

The local `HINTERNET request = client->request` snapshot was only needed
for the cancel_client approach. Since shutdown_client only fires at the
timeout point, mid-function reads of client->request are safe. Keep only
the InterlockedExchangePointer in the exit block to prevent double-close.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move sentry__atomic_store(&bgw->running, 0) from before the on_timeout
callback to the else branch (detach path). This lets the worker's
shutdown_task set running=0 naturally after finishing in-flight work,
making the dump_queue safety-net reachable if the callback fails to
unblock the worker within another 250ms cycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…iness

Tests that directly call sentry__retry_send were racing with the
transport's background retry worker polling the same cache directory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e data loss

After retry_enqueue writes an envelope and stores its address in
sealed_envelope, the envelope is freed when the bgworker task
completes. If a subsequent envelope is allocated at the same address
and is still pending during shutdown, retry_dump_cb would incorrectly
skip it, losing the envelope. Clear sealed_envelope after the first
match so later iterations cannot false-match a reused address.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
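The address-reuse guard can be sketched as follows; the global and function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

// Holds the ADDRESS of the envelope already written by retry_enqueue,
// stored as uintptr_t so it is never dereferenced.
static uintptr_t g_sealed_envelope;

// retry_dump_cb-style check: skip the envelope that was already written,
// then clear the stored address so a later envelope allocated at the
// same, reused address cannot false-match and be lost.
static bool
retry_dump_should_skip(const void *envelope)
{
    if (g_sealed_envelope && (uintptr_t)envelope == g_sealed_envelope) {
        g_sealed_envelope = 0; // first match only
        return true;
    }
    return false;
}
```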
Implement SDK telemetry for tracking discarded events per
https://develop.sentry.dev/sdk/telemetry/client-reports/

- Add sentry_client_report module
- Add send_client_reports option (enabled by default)
- Instrument discard points: sample_rate, before_send, rate limiting,
  network errors, queue overflow
- Piggyback client reports onto outgoing envelopes
- Record network error discards per envelope item with correct data
  category, only when retry is not available
- Add unit and integration tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
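Per the client-reports spec linked above, the report travels as an envelope item whose payload aggregates discard counts per reason and data category; a rough example (values illustrative):

```json
{
  "timestamp": 1772712000,
  "discarded_events": [
    { "reason": "before_send", "category": "error", "quantity": 2 },
    { "reason": "ratelimit_backoff", "category": "transaction", "quantity": 1 },
    { "reason": "network_error", "category": "session", "quantity": 3 }
  ]
}
```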
@jpnurmi jpnurmi force-pushed the jpnurmi/feat/http-retry branch from b133fd6 to 93db900 Compare March 17, 2026 16:15