poc(native): Add Android support for out-of-process crash handling#1725
poc(native): Add Android support for out-of-process crash handling#1725jpnurmi wants to merge 23 commits into
Conversation
Support the native crash daemon on Android by using a file-backed shared mapping and passing its fd through daemon startup. Avoid Android-only link issues and include libunwindstack headers for the daemon target. Add Android integration coverage for the native backend that verifies minidump creation on an emulator. Co-Authored-By: OpenAI Codex <noreply@openai.com>
Assign inherited notification fds only after daemon shared memory mapping and validation succeed so failed Android daemon initialization does not retain them. Co-Authored-By: OpenAI Codex <noreply@openai.com>
app_pid and app_tid are used to reconstruct the Android crash IPC shared memory path, so do not mark them as unused. Co-Authored-By: OpenAI Codex <noreply@openai.com>
Use the configured database path for Android crash IPC shared memory files instead of falling back to shell temp locations. Co-Authored-By: OpenAI Codex <noreply@openai.com>
Treat an empty ANDROID_API value the same as an unset one so importing test conditions does not fail before collection. Co-Authored-By: OpenAI Codex <noreply@openai.com>
| project.fileTree("build/intermediates/") { | ||
| include("**/abi.json") | ||
| include("**/prefab_publication.json/debug") | ||
| include("**/prefab_publication.json/release") |
There was a problem hiding this comment.
Malformed glob patterns will never match files
Medium Severity
The include patterns **/prefab_publication.json/debug and **/prefab_publication.json/release treat prefab_publication.json as a directory containing files named debug/release. Since prefab_publication.json is a regular file (not a directory), these Ant-style glob patterns will never match anything. The c++_static → none replacement intended for those files silently won't happen, which can cause STL-related linking issues for AAR consumers. The patterns likely need to be **/prefab_publication/debug/** and **/prefab_publication/release/**, or simply **/prefab_publication.json.
Reviewed by Cursor Bugbot for commit 03a811f. Configure here.
7ea304b to
11f94c3
Compare
| android.suppressUnsupportedCompileSdk=34 | ||
|
|
||
| # TODO: clean up | ||
| sentryBackend=native |
There was a problem hiding this comment.
POC debug configuration committed in gradle properties
Medium Severity
The sentryBackend=native property in gradle.properties with a # TODO: clean up comment overrides the default "inproc" backend for all NDK library builds. This POC-only configuration changes the default behavior for every consumer of the NDK module. Similarly, the unconditional implementation("io.github.vvb2060.ndk:curl:8.18.0") dependency (with its own # TODO) bloats the AAR for non-native-backend builds. Both TODO comments indicate these are temporary and not meant to be permanent.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 3e9a856. Configure here.
Use libunwindstack for Android native-backend crash events so daemon-captured crashes get the same stack frames as in-process events. Filter /proc maps to real ELF images and keep separate mappings by load base to avoid bogus debug images from named non-ELF mappings. Co-Authored-By: OpenAI Codex <noreply@openai.com>
Forward handled crash signals to the previously installed signal handler after the native daemon has captured the crash. This lets Android record the process exit as a native app crash and produce a tombstone. Co-Authored-By: OpenAI Codex <noreply@openai.com>
Enable the native daemon to write crash envelopes into the cache directory so the Java SDK can merge tombstones before sending. This keeps native backend crashes on the Android tombstone path instead of sending immediately from the daemon. Co-Authored-By: OpenAI Codex <noreply@openai.com>
Keep the Android native backend enabled while disabling minidump capture for the sample app. This exercises tombstone merging for native backend crashes without sending Breakpad minidumps. Co-Authored-By: OpenAI Codex <noreply@openai.com>
Install native signal handlers during Android preload without activating crash IPC until sentry_init. This keeps CoreCLR ahead in the signal chain for managed exceptions while native crashes still chain back to sentry-native. Resolve the Android crash daemon path in the native backend so direct C API clients do not need NDK wrapper-specific handler-path setup. Co-Authored-By: OpenAI Codex <noreply@openai.com>
| if (sentry__atomic_fetch(&g_preloaded)) { | ||
| g_crash_ipc = NULL; | ||
| SENTRY_DEBUG("crash handler deactivated, keeping preloaded handlers"); | ||
| return; | ||
| } |
There was a problem hiding this comment.
Bug: The signal stack (g_signal_stack.ss_sp) allocated during preload is never freed on shutdown, causing a memory leak.
Severity: MEDIUM
Suggested Fix
The shutdown logic for the preloaded case should be updated to free the resources it allocated. Specifically, the memory for g_signal_stack.ss_sp should be freed and the signal stack should be disabled before the function returns, even when g_preloaded is true.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/backends/native/sentry_crash_handler.c#L920-L924
Potential issue: When the crash handler is preloaded by calling
`sentry__crash_handler_preload()`, memory is allocated for a signal stack and assigned
to `g_signal_stack.ss_sp`. However, during shutdown
(`sentry__crash_handler_shutdown()`), an early return is taken if the `g_preloaded` flag
is set. This bypasses the cleanup logic that frees the allocated signal stack. As a
result, the memory for `g_signal_stack.ss_sp` is never deallocated, leading to a memory
leak in applications that use the preload feature and then shut down the handler.
Pass the Android tombstone setting through NdkOptions to choose the native crash reporting mode. Use native-only mode when tombstone merging is enabled so the native envelope can be merged, and include a minidump when tombstone merging is disabled. Co-Authored-By: OpenAI Codex <noreply@openai.com>
Remove the native test database after the Android minidump assertion so later tests do not observe stale crash state on the device. Co-Authored-By: OpenAI Codex <noreply@openai.com>
Add a built-in stdout transport so integration tests can exercise the default transport path while still writing envelopes to stdout. The native crash daemon resolves the default transport from the SDK build; it cannot depend on the example installing a process-local transport callback. Using SENTRY_TRANSPORT=custom for this test would also consume the downstream extension point, where embedding SDKs must keep providing their own sentry__transport_new_default implementation. Co-Authored-By: OpenAI Codex <noreply@openai.com>
| char *serialized = sentry_envelope_serialize(envelope, &size_out); | ||
| printf("%s", serialized); |
There was a problem hiding this comment.
Bug: The return value of sentry_envelope_serialize is not checked for NULL before being passed to printf, which can cause a crash in out-of-memory situations.
Severity: LOW
Suggested Fix
Add a null-check for the serialized variable after calling sentry_envelope_serialize. If the variable is NULL, handle the error gracefully, for example by skipping the printf call, to prevent a crash.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/transports/sentry_transport_stdout.c#L11-L12
Potential issue: Under extreme out-of-memory conditions, the function
`sentry_envelope_serialize` can return a `NULL` pointer. This occurs if an internal
1-byte memory allocation for an empty string fails. The code passes this `NULL` return
value directly to `printf` without a check. This results in undefined behavior and is
known to cause a segmentation fault and crash on platforms like Android using the Bionic
libc.
| char *serialized = sentry_envelope_serialize(envelope, &size_out); | ||
| printf("%s", serialized); | ||
| fflush(stdout); | ||
| sentry_free(serialized); |
There was a problem hiding this comment.
Stdout transport null dereference
Medium Severity
When sentry_envelope_serialize fails and returns NULL, print_envelope still passes the result to printf and sentry_free, which is undefined behavior and can crash the process.
Reviewed by Cursor Bugbot for commit 57004b7. Configure here.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 5 total unresolved issues (including 3 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 6460208. Configure here.
| } else { | ||
| sentry_envelope_free(envelope); | ||
| envelope = NULL; | ||
| } |
There was a problem hiding this comment.
Android drops unsent crash envelopes
High Severity
On Android, after writing the native crash envelope to the run folder, the non-deferred path only uploads if a .merged tombstone file appears; on timeout it frees the envelope without calling sentry__capture_envelope. With default tombstoneEnabled=false (native+minidump), no merge file is expected, so crashes are often not sent over the transport.
Reviewed by Cursor Bugbot for commit 6460208. Configure here.
| } | ||
| #elif defined(SENTRY_PLATFORM_LINUX) || defined(SENTRY_PLATFORM_ANDROID) | ||
| # if defined(SENTRY_PLATFORM_ANDROID) \ | ||
| && defined(SENTRY_WITH_UNWINDER_LIBUNWINDSTACK) |
There was a problem hiding this comment.
Libunwindstack macro name mismatch
Medium Severity
Android libunwindstack unwinding is gated on SENTRY_WITH_UNWINDER_LIBUNWINDSTACK, but CMake only enables the sources and library via SENTRY_WITH_LIBUNWINDSTACK and never defines the unwinder macro on the target. The new remote unwind path in sentry__process_crash is therefore compiled out even when libunwindstack is linked.
Reviewed by Cursor Bugbot for commit 6460208. Configure here.
| fail: | ||
| if (ipc->init_mutex) { | ||
| sentry__mutex_unlock(ipc->init_mutex); | ||
| } | ||
| if (ipc->shmem) { | ||
| munmap(ipc->shmem, SENTRY_CRASH_SHM_SIZE); | ||
| } | ||
| if (ipc->notify_fd >= 0) { |
There was a problem hiding this comment.
Bug: The native initialization function closes caller-provided file descriptors on failure without notifying the caller, leading to stale file descriptors on the Java side.
Severity: HIGH
Suggested Fix
The callee (sentry__crash_ipc_init_app_with_fds) should not close file descriptors that it does not own. The function should instead return an error code to the caller, sentry_android_crash_daemon_init, which can then propagate the failure status back to the Java layer. The caller, which owns the file descriptors, should be responsible for closing them upon learning of the initialization failure.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/backends/native/sentry_crash_ipc.c#L117-L124
Potential issue: During native crash daemon initialization via `nativeInitCrashDaemon`,
file descriptors (`notify_fd`, `ready_fd`) are passed from Java to the native layer. If
an error occurs during the subsequent initialization in
`sentry__crash_ipc_init_app_with_fds` (e.g., an `mmap()` failure), the error handling
logic closes these file descriptors. However, the original caller on the Java side is
not notified of this failure or that the descriptors have been closed. This leaves the
caller with stale file descriptors, which can lead to data corruption or crashes if they
are used later, as they might have been recycled by the OS for other resources.


Warning
WIP 🚧🔨⏳⛔
This enables the out-of-process native crash daemon on Android, including file-backed IPC, daemon packaging as
libsentry-crash.so, backend-owned daemon path resolution, and preload support for managed runtimes.The NDK integration now selects native crash reporting mode from the Android SDK tombstone merging setting: tombstone-enabled builds avoid minidumps because minidumps take precedence over native envelopes, while tombstone-disabled builds keep native-with-minidump for richer crash data.
This also adds Android daemon support for signal-handler chaining, crash envelope queueing,
libunwindstackunwinding, and focused Android integration coverage.A hidden
SENTRY_TRANSPORT=stdouttransport is added for tests so they can exercise the SDK default transport path without usingSENTRY_TRANSPORT=custom, which remains available for downstream SDKs.