Why this project exists

This repository is a reboot of a Flutter decompiler research effort. The core goal is simple:

  • take a real Flutter AOT binary
  • recover control flow and data flow
  • emit readable pseudo Dart, not just assembly

The project is focused on static analysis first. It is designed for reverse engineering research, security work, and interoperability study on binaries you are legally allowed to analyze.

What the research says

This repository keeps the research conclusions directly in this document. The important conclusions that drive this implementation are:

  • parsing Dart snapshots is hard and changes across versions
  • existing tooling already solves parts of parsing well
  • the novel part is the decompiler pipeline from machine code to readable pseudo Dart
  • adapter-based parsing is the safest way to survive Dart and Flutter version churn
  • runtime and dynamic instrumentation are useful as an optional fallback, not as the default path
  • strict quality gates are necessary to stop unreadable pseudocode from looking "done"

That is why this codebase separates snapshot extraction from decompilation logic. We can swap parsing adapters without rewriting the decompiler core.
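
The separation described above can be illustrated with a small trait boundary. This is a minimal sketch, not the repository's actual API: `SnapshotAdapter`, `UnconfiguredBridge`, `InternalParser`, `load_model`, and the reduced `ProgramModel` fields are all hypothetical names chosen for illustration. It shows how a caller might try backends in order and hand a single model type to the decompiler core.

```rust
/// Hypothetical, heavily reduced stand-in for the normalized program model.
#[derive(Debug, Clone, PartialEq)]
pub struct ProgramModel {
    pub backend: &'static str,
    pub function_count: usize,
}

/// Hypothetical adapter boundary: every parser backend implements this trait,
/// and the decompiler core only ever sees `ProgramModel`.
pub trait SnapshotAdapter {
    fn name(&self) -> &'static str;
    fn parse(&self, snapshot: &[u8]) -> Result<ProgramModel, String>;
}

/// Stand-in for an external bridge backend that is not configured.
pub struct UnconfiguredBridge;

impl SnapshotAdapter for UnconfiguredBridge {
    fn name(&self) -> &'static str {
        "bridge"
    }
    fn parse(&self, _snapshot: &[u8]) -> Result<ProgramModel, String> {
        Err("bridge command not configured".to_string())
    }
}

/// Stand-in for a built-in parser; in this sketch it only rejects empty input.
pub struct InternalParser;

impl SnapshotAdapter for InternalParser {
    fn name(&self) -> &'static str {
        "internal"
    }
    fn parse(&self, snapshot: &[u8]) -> Result<ProgramModel, String> {
        if snapshot.is_empty() {
            return Err("empty snapshot".to_string());
        }
        // A real adapter would decode the snapshot; the count is a placeholder.
        Ok(ProgramModel { backend: self.name(), function_count: 0 })
    }
}

/// Try each adapter in order and return the first model that parses,
/// together with the name of the backend that produced it.
pub fn load_model(
    adapters: &[&dyn SnapshotAdapter],
    snapshot: &[u8],
) -> Result<(&'static str, ProgramModel), String> {
    let mut errors = Vec::new();
    for adapter in adapters {
        match adapter.parse(snapshot) {
            Ok(model) => return Ok((adapter.name(), model)),
            Err(err) => errors.push(format!("{}: {}", adapter.name(), err)),
        }
    }
    Err(errors.join("; "))
}
```

Calling `load_model(&[&UnconfiguredBridge, &InternalParser], bytes)` falls through to the internal parser when the bridge fails. Code that consumes `ProgramModel` does not need to know which backend ran.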

High level architecture

The pipeline is:

  1. load input (APK or libapp.so) and locate snapshot blobs and instruction regions
  2. run an adapter to produce a normalized program model
  3. disassemble ARM64 instructions with Dart ABI aware annotations
  4. lift to low-level IR and build the CFG
  5. emit structured pseudo Dart with readability passes
  6. write reports and enforce quality gates

Current module layout:

  • crates/flutterdec-loader: APK and ELF loading, snapshot bundle extraction, and shared APK session caching
  • crates/flutterdec-adapter: adapter execution and model contract handling
  • crates/flutterdec-disasm-arm64: ARM64 disassembly and call or branch tagging
  • crates/flutterdec-ir: LLIR plus basic block and CFG construction
  • crates/flutterdec-decompiler: pseudo Dart emission and readability transforms
    • internal split:
      • top-level orchestration in src/lib.rs
      • CFG flow entry in src/control_flow.rs
      • instruction lifting in src/control_flow/expression_lift.rs
      • CFG edge logic in src/control_flow/graph.rs
      • block and branch emission in src/control_flow/emit.rs
      • readability pass pipeline entry in src/passes.rs
      • pass internals in src/passes/compaction.rs, src/passes/structural_helpers.rs, src/passes/naming.rs, and src/passes/expr_cleanup.rs
      • structural helper details in src/passes/structural_helpers/block_and_conditions.rs, src/passes/structural_helpers/guard_and_flow.rs, and src/passes/structural_helpers/naming_support.rs
      • helper-flow entry in src/helper_flow.rs
      • helper parsing in src/helper_flow/parse.rs
      • helper inlining and collapse in src/helper_flow/inlining.rs
      • helper summary and visit-limit logic in src/helper_flow/summary.rs
      • helper utility entry in src/helpers.rs
      • register parsing in src/helpers/registers.rs
      • expression simplification in src/helpers/expr.rs
      • instruction parsing in src/helpers/instruction_parse.rs
      • naming helpers in src/helpers/naming.rs
      • selector catalog in src/helpers/selector_table.rs
      • selector catalog categories in src/helpers/selector_table/categories.rs
      • selector candidate normalization in src/helpers/selector_table/candidates.rs
      • selector catalog matching in src/helpers/selector_table/matching.rs
      • call-intent entry in src/helpers/call_intent.rs
      • call-intent mapping in src/helpers/call_intent/intent.rs
      • call-intent library-context mapping in src/helpers/call_intent/library.rs
      • call-intent selector resolution in src/helpers/call_intent/selector_resolution.rs
      • call-intent text extraction helpers in src/helpers/call_intent/extract.rs
      • lift-state and branch-condition helpers in src/helpers/state_and_flow.rs
      • regression test entry in src/tests.rs
      • test groups in src/tests/shared.rs, src/tests/emit_and_helpers.rs, src/tests/cfg_and_stack.rs, src/tests/compaction_and_aliasing.rs, and src/tests/golden_and_parser.rs
      • emit/helper test details in src/tests/emit_and_helpers/helper_inlining.rs and src/tests/emit_and_helpers/readability_and_naming.rs
      • CFG/stack test details in src/tests/cfg_and_stack/call_and_loops.rs and src/tests/cfg_and_stack/omitted_path_and_stack.rs
      • compaction test details in src/tests/compaction_and_aliasing/control_flow_compaction.rs and src/tests/compaction_and_aliasing/alias_and_expr_cleanup.rs
  • crates/flutterdec-core: orchestration, artifact writing, and quality report logic
    • top-level entry in src/lib.rs
    • pipeline utilities in src/pipeline/helpers.rs
    • adapter-model loading in src/pipeline/model.rs
    • quality gate computation in src/pipeline/quality.rs
    • command-runner orchestration entry in src/pipeline/runners.rs
    • runner reporting helpers in src/pipeline/runners/reporting.rs
    • runner symbol and pool naming helpers in src/pipeline/runners/symbols.rs
    • runner-focused tests in src/pipeline/runners/tests.rs
    • symbol-map entry in src/pipeline/symbol_map.rs
    • symbol-map types in src/pipeline/symbol_map/types.rs
    • symbol-map run/load path in src/pipeline/symbol_map/run.rs
    • symbol-map ELF section and symbol helpers in src/pipeline/symbol_map/elf.rs
    • symbol-map call-scan and target-resolution helpers in src/pipeline/symbol_map/analysis.rs
    • symbol-map tests in src/pipeline/symbol_map/tests.rs
    • ELF fingerprint extraction in src/pipeline/engine_fingerprint.rs
  • crates/flutterdec-cli: user-facing commands

Data contracts

The decompiler expects a normalized model from the adapter layer. That model includes:

  • functions and entry addresses
  • classes and library metadata when available
  • object pool entries
  • architecture and snapshot metadata

This keeps the rest of the system independent from any single parser implementation.
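
As a rough picture of what such a contract could look like, the sketch below models the listed fields with optional metadata and a small validation step. The type and field names (`NormalizedModel`, `FunctionEntry`, `PoolEntry`, `validate`) are assumptions made for illustration and are not taken from the crate. Class metadata is omitted for brevity, and the duplicate-address check is an illustrative invariant rather than a documented rule. The accepted schema versions (2 and 3) follow the scope notes later in this document.

```rust
use std::collections::HashSet;

/// Hypothetical function record: name plus entry address, with library
/// ownership present only when the adapter recovered it.
#[derive(Debug, Clone, PartialEq)]
pub struct FunctionEntry {
    pub name: String,
    pub entry_va: u64,
    pub library_uri: Option<String>,
}

/// Hypothetical object pool record with optional hints.
#[derive(Debug, Clone, PartialEq)]
pub struct PoolEntry {
    pub index: u32,
    pub string_hint: Option<String>,
    pub target_va: Option<u64>,
}

/// Hypothetical normalized model handed from the adapter to the decompiler.
#[derive(Debug, Clone, PartialEq)]
pub struct NormalizedModel {
    pub schema_version: u32,
    pub arch: String,
    pub snapshot_hash: String,
    pub functions: Vec<FunctionEntry>,
    pub object_pool: Vec<PoolEntry>,
}

/// Reject models the rest of the pipeline could not consume safely.
pub fn validate(model: &NormalizedModel) -> Result<(), String> {
    // Schema versions 2 and 3 are the ones this document says are accepted.
    if !(2..=3).contains(&model.schema_version) {
        return Err(format!("unsupported schema version {}", model.schema_version));
    }
    // Assumed invariant for this sketch: one function per entry address.
    let mut seen = HashSet::new();
    for function in &model.functions {
        if !seen.insert(function.entry_va) {
            return Err(format!("duplicate entry address 0x{:x}", function.entry_va));
        }
    }
    Ok(())
}
```

Keeping "when available" metadata as `Option` fields lets later stages distinguish "not recovered" from "empty", which matches the goal of surfacing unknowns explicitly.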

Output philosophy

The target output is pseudo Dart that helps humans understand behavior quickly. It is not intended to compile back into the original program.

Readability wins over low level fidelity when there is a tradeoff. For example:

  • preserve branch semantics but hide register noise when possible
  • normalize raw tokens into stable placeholders
  • simplify noisy arithmetic forms into cleaner constants and offsets when safe
  • inline helper fragments where practical and collapse remaining helper scaffolding
  • represent very complex unresolved paths as a single summary comment per function plus safe fallbacks
  • avoid synthetic "alternative path" branches that duplicate control flow noise
  • label indirect call targets with semantic placeholders instead of raw register names
  • render stack accesses as indexed slots instead of synthetic field names
  • alias key registers to semantic names (for example return address and frame pointer)
  • collapse empty if { } else { ... } forms into negated if blocks
  • hoist else bodies when the if branch terminates, to reduce nested indentation noise
  • collapse redundant guarded returns (if (cond) return x; return x;) into a single return x;
  • remove redundant repeated null-guard checks when the first guard already terminates and the checked variable was not reassigned
  • fold simple nested guard if blocks into combined conditions when the outer block contains only the inner guard
  • merge consecutive same-scope if (...) { continue; } guards into combined || guard conditions
  • rewrite adjacent if (x > K) return ...; if (x >= L) continue; pairs into explicit bounded continue ranges
  • rewrite multi-continue while (true) loops into explicit retry-flag loops, then collapse one-shot retry wrappers back to straight-line flow
  • collapse nested or trailing guard stacks that always return the same value (for example repeated return null guards before a final return null)
  • extract repeated (<value> - 1) expressions into a named alias (codePoint) when stable across the function
  • normalize negated comparison forms like !((a) != b) into direct equality checks
  • remove redundant condition wrapping parentheses in emitted if statements when the outer wrappers carry no meaning
  • surface unknowns explicitly instead of inventing fake certainty
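
One of the transforms listed above, normalizing negated comparisons such as !((a) != b), can be sketched on a tiny expression tree. This is an illustrative assumption about how such a pass might work, not the decompiler's real representation: the `Expr` enum and the `normalize` and `render` functions are hypothetical.

```rust
/// Hypothetical miniature condition tree, just large enough for this pass.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(String),
    Paren(Box<Expr>),
    Not(Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    Ne(Box<Expr>, Box<Expr>),
}

/// Drop wrapping parentheses; the tree shape already encodes grouping.
fn strip_parens(expr: Expr) -> Expr {
    match expr {
        Expr::Paren(inner) => strip_parens(*inner),
        other => other,
    }
}

/// Rewrite `!(a != b)` to `a == b`, `!(a == b)` to `a != b`, and `!!x` to `x`.
pub fn normalize(expr: Expr) -> Expr {
    match strip_parens(expr) {
        Expr::Not(inner) => match normalize(*inner) {
            Expr::Ne(a, b) => Expr::Eq(a, b),
            Expr::Eq(a, b) => Expr::Ne(a, b),
            Expr::Not(x) => *x,
            other => Expr::Not(Box::new(other)),
        },
        Expr::Eq(a, b) => Expr::Eq(Box::new(normalize(*a)), Box::new(normalize(*b))),
        Expr::Ne(a, b) => Expr::Ne(Box::new(normalize(*a)), Box::new(normalize(*b))),
        other => other,
    }
}

/// Render a binary operand, adding parentheses only around compound operands.
fn operand(expr: &Expr) -> String {
    match expr {
        Expr::Var(_) | Expr::Paren(_) => render(expr),
        other => format!("({})", render(other)),
    }
}

/// Render the tree back to pseudo Dart condition text.
pub fn render(expr: &Expr) -> String {
    match expr {
        Expr::Var(name) => name.clone(),
        Expr::Paren(inner) => format!("({})", render(inner)),
        Expr::Not(inner) => format!("!{}", operand(inner)),
        Expr::Eq(a, b) => format!("{} == {}", operand(a), operand(b)),
        Expr::Ne(a, b) => format!("{} != {}", operand(a), operand(b)),
    }
}
```

Working on a tree rather than on emitted text keeps the rewrite semantics-preserving: parentheses can be dropped safely because grouping is carried by the tree, and the renderer reintroduces them only where an operand is compound.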

Quality gates and metrics

The CLI writes quality.json and fails the run when strict thresholds are violated. The report tracks:

  • disassembly coverage ratio
  • unresolved control flow count
  • placeholder condition count
  • indirect call ratio
  • semantic direct-call rewrite count
  • semantic indirect-call rewrite count
  • dispatch-selector fallback count
  • target-va symbol rewrite count
  • report-level semantic intent counts (framework, stdlib, runtime, native, selector-tagged, constructor calls)
  • metadata coverage counters in report.json (pool_value_hints, pool_semantic_hints, pool_target_symbols)
  • selector fallback diagnostics in report.json (total, unique, top unresolved selector: names, and sample call lines)
  • call fallback diagnostics in report.json (dynamicCall, dispatch.invoke, dispatchTarget non-dispatch fallback calls, and generic indirectTargetN(...) fallback counts)
  • prioritization diagnostics in report.json (prioritization.enabled, selected_count, and per-function component score breakdown for selected capped functions)
  • readability regressions such as helper block leakage and raw token leakage
  • omitted path marker count for complex regions that are currently summarized
  • residual loop back-edge summary marker count for loops that are not yet structured

This makes progress measurable and keeps regressions visible in automation.
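
A strict gate of this kind can be reduced to comparing a handful of metrics against thresholds and failing when any comparison is violated. The sketch below assumes hypothetical type names, field names, and threshold semantics (`QualityMetrics`, `Thresholds`, `gate_violations`). It covers only a few of the tracked metrics, and it does not reflect the actual layout of quality.json.

```rust
/// Hypothetical subset of the metrics tracked in the quality report.
#[derive(Debug, Clone, PartialEq)]
pub struct QualityMetrics {
    pub disassembly_coverage: f64, // ratio in 0.0..=1.0
    pub unresolved_control_flow: u32,
    pub placeholder_conditions: u32,
    pub indirect_call_ratio: f64, // ratio in 0.0..=1.0
    pub raw_token_leaks: u32,
}

/// Hypothetical strict thresholds; the values are chosen by the caller.
#[derive(Debug, Clone, PartialEq)]
pub struct Thresholds {
    pub min_disassembly_coverage: f64,
    pub max_unresolved_control_flow: u32,
    pub max_placeholder_conditions: u32,
    pub max_indirect_call_ratio: f64,
    pub max_raw_token_leaks: u32,
}

/// Return one message per violated gate; an empty list means the run passes.
pub fn gate_violations(m: &QualityMetrics, t: &Thresholds) -> Vec<String> {
    let mut out = Vec::new();
    if m.disassembly_coverage < t.min_disassembly_coverage {
        out.push(format!(
            "disassembly_coverage {:.2} is below {:.2}",
            m.disassembly_coverage, t.min_disassembly_coverage
        ));
    }
    if m.unresolved_control_flow > t.max_unresolved_control_flow {
        out.push(format!(
            "unresolved_control_flow {} exceeds {}",
            m.unresolved_control_flow, t.max_unresolved_control_flow
        ));
    }
    if m.placeholder_conditions > t.max_placeholder_conditions {
        out.push(format!(
            "placeholder_conditions {} exceeds {}",
            m.placeholder_conditions, t.max_placeholder_conditions
        ));
    }
    if m.indirect_call_ratio > t.max_indirect_call_ratio {
        out.push(format!(
            "indirect_call_ratio {:.2} exceeds {:.2}",
            m.indirect_call_ratio, t.max_indirect_call_ratio
        ));
    }
    if m.raw_token_leaks > t.max_raw_token_leaks {
        out.push(format!(
            "raw_token_leaks {} exceeds {}",
            m.raw_token_leaks, t.max_raw_token_leaks
        ));
    }
    out
}
```

A caller would typically exit with a non-zero status when the returned list is non-empty. Collecting every violation, rather than stopping at the first, keeps a single automation run informative.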

Current scope and limits

Current scope:

  • Android ARM64 static pipeline
  • adapter-backed model ingestion
  • IR and pseudo Dart generation with iterative readability passes
  • readability passes now prune dead statements after terminal control flow and unwrap non-retry while (true) wrappers when the body already terminates
  • optional stripped vs unstripped ELF symbol mapping to recover readable direct-call targets
  • decompile can now ingest map-symbols target JSON directly to inject mapped call names into pseudocode
  • external symbol names are normalized (including C++ demangle and runtime/native prefixes) before pseudocode emission
  • pseudocode call sites now include semantic intent comments for recognized stdlib/runtime/native targets
  • when intent is deterministic, callsites are rewritten to semantic paths and keep traceability via was: <original_name>
  • deterministic selector evidence can also rewrite indirect callsites and records indirect via: <target_alias> in comments
  • when a call argument is exactly pool[<idx>] and a string hint exists, it is rendered as "value" /* pool[<idx>] */
  • non-exact pool expressions now keep structure and add inline pool mapping comments (for example pool[40 /* "_offsetInBytes" */])
  • selector coverage now includes more Flutter and Dart standard methods (for example Stream.listen, Future.catchError, SchedulerBinding.addPostFrameCallback, and ChangeNotifier listener APIs)
  • when selector evidence exists but no standard mapping matches, indirect callsites now use readable selector fallback forms: dispatch.<selector>(...) for general selectors and <Selector>.new(...) for constructor-like selectors (annotated with heuristic: constructor-like selector)
  • indirect target expressions are now scanned for selector hints too (not only call args), enabling more deterministic rewrites away from dynamicCall(...)
  • unresolved dispatchTarget calls now resolve in order of preference: semantic library invoke names when URI evidence exists (for example flutter.widgets.invoke(...) or spotube.models.connect.load.invoke(...)), then the callable target form <resolvedTarget>(...) when target expressions are known, and only then dispatch.invoke(...). Unresolved generic aliases render as callable <target>(...), so raw dynamicCall(...) remains only for truly unknown target forms
  • unresolved generic direct call targets (sub_*/fn_0x*) can now also rewrite to semantic owner invoke paths when call arguments carry both a library URI marker and an owner-class marker (for example framework:flutter.widgets.RenderErrorBox.invoke from package:flutter/src/widgets/heroes.dart + RenderErrorBox.), still preserving was: <original> traceability
  • noncanonical indirect targets (for example xzr) now also prefer callable fallback form (xzr(...)) with traceability comments, further reducing raw dynamicCall(...) output noise
  • low-level dispatch slot expressions such as reg21.f0 are now surfaced through a readable alias (dispatchTargetFn) before unresolved callable callsites
  • selector extraction now ignores likely file/URI/path-like strings (*.dart, paths, URLs) to reduce false-positive standard-call labeling
  • declaration typing now uses deterministic context: semantic call ownership (flutter.*/dart.*/owner-qualified package paths), constructor semantics (*.new), and literal assignments can upgrade dynamic declarations into concrete types (for example flutter.widgets.State, dart.async.Future, dart.async.StreamIterator, String, bool)
  • declaration typing now also infers local return types from deterministic semantic call paths (for example String, bool, int, double, Type, dart.async.Future, dart.async.StreamSubscription) to reduce dynamic noise on non-constructor standard calls
  • declaration typing now also recognizes constructor-like fallback call paths with PascalCase roots (for example AndroidPermission.new(...)) so inferred local types stay concrete even when standard library ownership metadata is missing
  • declaration typing now also treats pool-mapped literal assignments ("value" /* pool[...] */) as concrete String locals instead of leaving them as dynamic
  • declaration typing now also infers bool from condition context (if (x), x && y, x == true) so argument/local declarations keep less dynamic noise in control-flow-heavy functions
  • repeated pool-mapped selector literals now hoist into local String aliases (for example poolStr42) so repeated callsites stay compact and readable
  • adapter object-pool metadata fields (decoded_kind, selector, target_va, owner_class, library_uri) are now consumed by decompile for deterministic owner-qualified selector rewrites
  • adapter model contract now accepts schema versions 2 and 3; v3 adds optional per-function name_kind and optional object-pool provenance fields (confidence, source) while preserving v2 compatibility defaults
  • adapter execution now supports backend selection (auto, internal, blutter) so deterministic parser backends can be introduced without changing decompiler core contracts
  • default adapter backend mode is auto: it attempts Blutter bridge parsing when configured (FLUTTERDEC_BLUTTER_CMD or FLUTTERDEC_BLUTTER_PY) and falls back to internal parsing for resilience
  • Blutter bridge parsing currently normalizes asm/*.dart and pp.txt output into ProgramModel (libraries, classes, functions, and best-effort object_pool target metadata), synthesizes deterministic EntryPointCandidate pool entries for main/runApp-like functions when present, and serializes blutter invocations with a cache lock to avoid concurrent runner races
  • owner-only metadata (selector + owner_class without library URI) can still rewrite indirect selector calls to deterministic owner-qualified call paths
  • if pool entries miss selector/owner/library metadata, core now backfills semantic hints from function ownership metadata keyed by target_va
  • when metadata includes target_va and that address resolves to a non-generic symbol, indirect calls can be rewritten to the resolved symbol path (with target_va traceability in comments)
  • model-backed canonical naming now deterministically tags Dart stdlib (dart:*), Flutter framework (package:flutter/*), and package-owned calls (package:*) when adapter metadata includes class/library ownership
  • pool target symbol synthesis now also emits deterministic package_<pkg>_<Owner>_<method> names for package:* library targets, improving generic direct-call replacement in app/dependency code paths
  • symbol merge precedence now upgrades heuristic canonical names (dart_*, flutter_*, package_*) to stronger external symbols when both map to the same VA, reducing synthetic call names when symbol maps/ELFs are provided
  • symbol merge now uses an explicit quality lattice (placeholder < heuristic < external < exact) and reports final name-quality mix plus merge replacement diagnostics under name_resolution in report.json
  • adapter schema reporting now includes function_name_kind_breakdown (exact, external, heuristic, placeholder, unknown, unspecified) so model naming confidence can be tracked across versions/backends
  • decompile reports now also include adapter_selection tracing (requested backend, resolved backend, adapter executable and manifest mapping, snapshot hash agreement) plus best-effort engine_fingerprint_context from nearby or APK-bundled libflutter.so
  • decompile report.json now includes a dedicated compatibility section with schema support status, manifest-entry presence, snapshot hash alignment, and warning diagnostics
  • flutterdec info now surfaces lightweight compatibility signals too (adapter_kind, manifest-entry presence, snapshot-hash match, warnings) so researchers can triage adapter health without full decompile
  • decompile/diff now support --require-snapshot-hash-match for strict adapter-vs-loader hash enforcement; diff_report.json now also reports per-side snapshot hash match booleans
  • CLI now includes flutterdec diff --old ... --new ... to compare two builds at the recovered-function descriptor level (added/removed/common counts plus top changed signatures), with the same scope/package filters used by decompile; diff output now also normalizes unstable file://.../.dart_tool/flutter_build/... URIs and reports package-level churn summaries (added_packages_top, removed_packages_top)
  • generic symbol detection now also covers common tool-generated placeholders (FUN_<hex>, nullsub_*, loc_*, off_*) so deterministic semantic/external names can replace them
  • decompiler target-va rewrite now shares the same generic placeholder guard (sub_*, fn_0x*, FUN_<hex>, nullsub_*, loc_*, off_*, fun_<hex>) so indirect calls do not regress into tool placeholder callnames
  • decompiler call intent now rewrites canonical package-machine symbols (package_<pkg>_<Owner>_<method>) into readable pkg.Owner.method(...) call paths and emits matching package:<...> semantic comments
  • package-machine intent parsing now preserves underscore-heavy owner/method splits (for example package_spotube_Foo_Bar_internal_init -> spotube.Foo_Bar.internal_init)
  • framework-machine intent parsing now preserves underscore-heavy class/method splits (for example flutter_widgets_Render_Flex_perform_layout -> flutter.widgets.Render_Flex.perform_layout)
  • Dart patch-library semantic naming now includes patch module stems when available (for example dart:core-patch/bool_patch.dart -> dart.core_patch.bool_patch.*) to reduce ambiguous dart.core_patch.* callsites
  • direct-call intent parsing now preserves full Dart library token paths and owner class segments from canonical names (for example dart_core_patch_bool_patch_fromEnvironment and dart_typed_data_TypedData_offsetInBytes) instead of collapsing to dart.core.*
  • selector coverage now includes additional standard families such as Navigator.pushNamed and List.removeAt, improving deterministic semantic rewrites on real samples
  • selector coverage also includes internal/std selector forms such as match_end_index -> dart.core.Match.end
  • constructor-like standard selectors are now recognized too (for example KeyedSubtree, StreamIterator, Float32x4List, Int64List) and rewritten to semantic .new paths
  • stack-pointer-derived base expressions now collapse into indexed stack slots (for example sp[-0x30]) instead of synthetic field forms
  • selector resolution now also handles dart:io and typed-data style selectors such as supportsAnsiEscapes, offsetInBytes, and nativeSetFloat32
  • selector mapping now also recognizes internal stdlib constructors such as _NativeSocket and _CompileTimeError
  • internal selector-only stdlib forms now include _current -> dart.core.Iterator.current and _equivalentYear -> dart.core.DateTime.equivalentYear
  • internal selector-only mappings now also include framework/runtime helpers such as _listEquals -> flutter.foundation.listEquals and _prependTypeArguments -> dart_vm.prependTypeArguments
  • internal selector-only stdlib constructor mappings now also include _StreamController -> dart.async.StreamController.new and _RawDatagramSocket -> dart.io.RawDatagramSocket.new
  • internal typed-data selector mappings now also include _nativeSetFloat32x4 -> dart.typed_data.ByteData.setFloat32x4, _UnmodifiableUint8ArrayView -> dart.typed_data._UnmodifiableUint8ArrayView.new, and _Int32ArrayView -> dart.typed_data._Int32ArrayView.new
  • selector coverage now also includes additional deterministic Flutter observer/scheduler/navigation APIs (for example didPushRouteInformation, handleCommitBackGesture, scheduleWarmUpFrame, restorablePushNamed) plus broader async/core/typed-data standard selectors (for example scheduleMicrotask, runtimeType, setInt64, getFloat64x2)
  • runtime helper selectors such as yieldStarIterable are now tagged and rewritten to readable runtime semantic paths
  • VM-internal selector constructors such as _Closure and _TypeParameter now rewrite to runtime semantic constructor paths
  • standalone stack-pointer offset arguments now normalize to slot notation (sp[-0x10]) instead of raw arithmetic ((sp - 0x10))
  • repeated read-only stack slots now hoist into named locals (for example stackSlotNeg0x10) to reduce repeated low-level stack syntax
  • noisy wrapped field-access chains are now simplified (((((obj.f7)).f23)).f7 -> obj.f7.f23.f7)
  • optional ELF engine fingerprinting to estimate build identity from build-id and marker strings
  • decompile now exposes engine-level analysis profiles (balanced and light) plus per-feature --with-*/--no-* toggles for canonical model symbols, pool hints, semantic reporting, and bootflow category seeding to trade throughput vs readability
  • decompile now defaults to app-focused function scoping (app-unknown) so reverse-engineering output prioritizes app/user-defined code; you can switch to --function-scope app or --function-scope all when needed
  • decompile now supports --target to isolate a single function by id:<N> or va:0x<ADDR> (plus shorthand 0x<ADDR> / <N>), so developer workflows can focus decompile/disassembly on one function at a time
  • target mode records deterministic selection diagnostics in report.json.target_selection (kind, value, matched_count, scope_overridden) and bypasses capped prioritization
  • decompile can optionally emit ghidra_apply_symbols.py (--emit-ghidra-script) to apply recovered names as labels/functions inside Ghidra analysis sessions
  • decompile can optionally emit ida_apply_symbols.py (--emit-ida-script) to apply recovered names and pool-load comments inside IDA sessions
  • emitted Ghidra scripts also include pool-load comments derived from pool[...] annotations and recovered string hints, improving patching context directly in Ghidra
  • core pipeline implementation now keeps script-generation helpers in runners_scripts.rs (instead of inlining everything in runners.rs), reducing core runner file size and isolating RE-tool emit logic
  • diff/descriptor aggregation logic now lives in runners_diff.rs so run_diff and related descriptor/package helpers are isolated from decompile pipeline internals
  • decompile now also supports package-level scoping via repeatable --app-package <name>, so researchers can isolate pseudocode to selected app Dart packages and exclude unknown/dependency/framework noise more aggressively
  • report output now includes detected app package frequency (function_scope.app_package_counts_top) to guide package scoping without guesswork
  • report output now includes function_scope.priority_package_hints, the effective package hints applied to capped prioritization
  • info output now surfaces top detected app packages too, so package scoping can be selected before a full decompile run
  • capped-function disassembly ordering now prioritizes likely high-value targets: entrypoint-like names, lifecycle/router selectors such as createState/build/onGenerateRoute, and deeplink/activity signals including object-pool target_va hints. It applies app-package frequency boosts plus shallow entrypoint-callee frontier boosts, adds structural tie-break signals (function size and internal call out-degree), penalizes repeated non-generic names, and downranks no isolate markers plus dart:isolate* library paths. A first-pass owner/name diversity cap (with deferred backfill) biases capped runs toward broader app-logic coverage instead of duplicated mapper/bootstrap/runtime helper families
  • capped selection now deterministically seeds one function per discovered bootflow category (main, runapp, deeplink, activity, bootstrap) before normal diversity fill, so low --max-functions runs preserve key entry/deeplink coverage
  • blutter adapter ingestion now synthesizes deterministic bootflow pool metadata from recovered function names (BootMainCandidate, BootRunAppCandidate, DeepLinkHandlerCandidate, ActivityHandlerCandidate, BootstrapInitCandidate) so main/runApp/deeplink/activity/init targets carry explicit target_va hints even when broader symbol data is sparse; activity and bootstrap candidates are now gated by owner/library context to reduce false positives from generic app methods
  • disassembly prioritization now dampens framework/stdlib bootflow boosts for deeplink/activity/bootstrap candidate kinds so app-owned handlers dominate capped reverse-engineering output
  • decompile reports include a bootflow_discovery section in report.json with categorized deterministic targets (main, runapp, deeplink, activity, bootstrap) and metadata (decoded_kind, selector, target_va, owner and library context); overlapping discoveries for the same category/target/selector are deduplicated
  • decompile now inspects AndroidManifest.xml directly from APK inputs and exposes android_manifest diagnostics in report.json (parse_mode, per-signal confidence, main_launcher, view_browsable, activity names, deeplink entries, parse errors, and synthetic manifest-hint counts); parsing is binary-AXML first with deterministic string-pool decoding and heuristic fallback, and manifest-derived candidate hints are injected into model metadata as Manifest*Candidate entries to reinforce deterministic entrypoint/deeplink/activity prioritization when adapter symbols are sparse
  • APK-oriented stages now share a loader-level ApkSession that opens the ZIP once per info or decompile run, indexes entry names, and caches entry bytes on demand; loader snapshot extraction, manifest inspection, APK startup scanning, and engine fingerprint lookup now reuse that session instead of reopening and rescanning the APK independently
  • info and decompile now also inspect APK classes*.dex entries for Android startup evidence and expose android_startup diagnostics (presence/confidence, scanned dex files, parse errors, Flutter embedding callsites, JNI/bootstrap stages, and recovered DartEntrypoint callsites when present); this is implemented in core as a report-focused APK bytecode pass and is controllable through the engine toggle apk_startup_analysis
  • APK startup evidence is now also translated into synthetic bootflow hints (Startup*Candidate) and merged back into the in-memory model with source = "apk_startup"; when a Dart target_va can be matched, those hints feed the existing prioritization path, and when it cannot, they still surface in report.json.bootflow_discovery with target_va = null so startup context is visible without pretending the mapping is solved
  • APK startup scanning now also performs a narrow same-method register trace for literal strings (const-string + move-object propagation) around DartEntrypoint.<init>, DartExecutor.executeDartEntrypoint, and FlutterJNI.runBundleAndSnapshotFromLibrary, so android_startup.dart_entrypoints can capture function_name, library_uri, and app_bundle_path when those values are statically visible in the APK bytecode
  • APK startup reporting now also derives android_startup.bootstrap_chain, a per-source-method ordered view of observed Android embedder startup stages (activity_on_create, delegate attach, engine ctor, loader init, JNI attach, Dart entrypoint execute) with app-vs-framework ownership, completeness, and missing-step diagnostics; it now also emits correlated paths when DEX bytecode shows app-defined method edges between startup entry methods and framework stage calls, and those paths carry manifest-aware anchor metadata (manifest_launcher_activity, manifest_deeplink_activity, manifest_application, manifest_activity, flutter_activity_subclass, or fallback heuristic/stage-terminal anchors) so the report shows when a startup path really ties back to a declared Android component
  • map-symbols can now register generated target summaries into a repo-local symbols/ cache (symbols/manifest.json, symbols/by-build-id/..., symbols/by-version/...), and APK decompile runs automatically ingest exact local cache matches by libflutter.so build id into report.json.engine_symbol_ingestion
  • when no explicit --app-package is provided, capped prioritization derives package hints from the parsed manifest package (for example oss.krtirtho.spotube -> spotube, org.localsend.localsend_app -> localsend_app + localsend) and boosts matching package:<name>/... functions so selected output stays focused on app-owned logic
  • when priority package hints exist, capped prioritization also applies a moderate penalty to non-preferred third-party package:<dep>/... functions (excluding package:app/...) to reduce dependency noise in top-N selection
  • prioritization report entries now include library_uri, so package ownership of selected functions is directly inspectable in report.json
  • prioritization reporting now also includes selected package-distribution aggregates (prioritization.selected_package_counts_top, selected_package_count_total, selected_unknown_library_count) so app-vs-dependency coverage can be evaluated without post-processing scripts
  • prioritization reporting now also includes selected scope mix and ratio (prioritization.selected_scope_mix, selected_app_like_ratio) to quickly assess how app-heavy capped selections are
  • prioritization reporting now also includes preferred-vs-other app package precision metrics (selected_preferred_app_count, selected_other_app_count, selected_preferred_app_ratio) based on effective preferred package hints
  • prioritization reporting now includes component-level aggregate totals (selected_component_totals_top) so heuristic dominance can be tuned directly from report output
  • startup-frontier scoring now adds explicit app/context bonuses and framework/stdlib penalties for startup-adjacent and bootstrap-like functions, so capped selections keep one useful bootstrap anchor without letting framework initialization dominate the top-N
  • prioritization reporting now includes selected bootflow coverage and hit summaries (selected_bootflow_coverage, selected_bootflow_hits_top) so capped output quality can be measured against discovered main/runApp/deeplink/activity/bootstrap targets
  • text rewrite and quality helper passes now avoid byte-index string slicing on UTF-8 content so non-ASCII pool strings do not panic decompile runs
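
The symbol-merge quality lattice mentioned in the list above (placeholder < heuristic < external < exact) maps naturally onto an ordered enum. The following is a minimal sketch under stated assumptions: `NameQuality`, `SymbolName`, and `merge_symbol` are hypothetical names, and keeping the existing name on a quality tie is an assumption of this sketch rather than documented behavior.

```rust
use std::collections::HashMap;

/// Quality lattice; derived ordering follows declaration order, so
/// Placeholder < Heuristic < External < Exact.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum NameQuality {
    Placeholder,
    Heuristic,
    External,
    Exact,
}

/// Hypothetical symbol record keyed by virtual address in the table below.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SymbolName {
    pub name: String,
    pub quality: NameQuality,
}

/// Insert `candidate` for `va` only if it is strictly stronger than the
/// current entry. Returns true when the table was changed.
pub fn merge_symbol(
    table: &mut HashMap<u64, SymbolName>,
    va: u64,
    candidate: SymbolName,
) -> bool {
    // Assumption: on equal quality the first name seen is kept.
    let keep_existing = table
        .get(&va)
        .map_or(false, |existing| existing.quality >= candidate.quality);
    if keep_existing {
        return false;
    }
    table.insert(va, candidate);
    true
}
```

Because the comparison is a plain `>=` on the enum, adding a new quality level only requires inserting a variant at the right position. The boolean return value gives a caller a simple hook for counting merge replacements in diagnostics.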

Known limits:

  • no full Dart syntax reconstruction yet
  • some difficult control flow still remains as retry-flag loops instead of fully intent-aware Dart loop forms
  • very complex control-flow regions can be summarized as omitted-path comments
  • many symbols remain synthetic when metadata is obfuscated
  • direct source-level naming is still heuristic

Language and maintainability choices

Rust is used for the core pipeline because it gives:

  • stronger guarantees around low-level data handling
  • better long-term maintainability for performance-critical transforms
  • easier test isolation across modules

Python remains useful at the adapter boundary for faster version-specific parser updates.

How to work on this repo

  • use nix develop for a reproducible toolchain
  • run cargo test before and after changes
  • use README.md for user-facing quick usage and command flow, and docs/development.md for contributor/development workflows
  • CI now runs formatting, clippy, full workspace tests, and a release CLI build on both Linux and Darwin runners for PRs and on main pushes (.github/workflows/ci.yml)
  • CI also lint-checks repository shell scripts via scripts/lint-shell.sh to keep automation scripts maintainable
  • CI validates Nix project configuration with nix flake check before Rust checks
  • tag pushes (v*) trigger cross-platform release artifact builds and GitHub release publishing in .github/workflows/release.yml
  • GitHub contribution hygiene is bootstrapped with issue templates, PR template, CODEOWNERS routing, and weekly Dependabot update PRs under .github/
  • local CI-parity validation is available via scripts/ci-check.sh (also exposed as nix run .#ci-check)
  • refresh decompiler golden snapshots with FLUTTERDEC_UPDATE_GOLDEN=1 cargo test -p flutterdec-decompiler golden_ when output changes intentionally
  • for end-to-end real binary regression checks, use scripts/real-golden.sh record|check for single profiles, or scripts/real-golden-matrix.sh check for multi-profile runs; those baselines now include report_metrics.json so startup, bootflow, entrypoint, and engine-symbol-ingestion deltas are diffed directly
  • keep profile configs in testdata/real-golden/profiles/*/profile.env
  • for naming improvements on direct call targets, use map-symbols on stripped/unstripped ELF pairs, then pass decompile --extra-symbol-map-targets /path/to/symbol_target_summary.json
  • decompile prefers external descriptive names over generic internal names (sub_*, fn_0x*) when addresses match
  • test against real Flutter binaries, not only synthetic fixtures
  • prioritize output readability improvements that are backed by concrete sample evidence
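
The preference for external descriptive names over generic internal names can be pictured as a small predicate plus a selection rule. This sketch is an assumption about how such a guard might look, not the crate's code: `is_generic_placeholder` and `choose_call_name` are hypothetical names. The prefixes are the placeholder families named in this document (sub_*, fn_0x*, FUN_<hex>, nullsub_*, loc_*, off_*, fun_<hex>).

```rust
/// True when `s` is non-empty and consists only of ASCII hex digits.
fn is_hex(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Detect tool-generated placeholder names that carry no semantic meaning.
pub fn is_generic_placeholder(name: &str) -> bool {
    // Families where any non-empty suffix counts as generic.
    const WILDCARD_PREFIXES: [&str; 5] = ["sub_", "fn_0x", "nullsub_", "loc_", "off_"];
    // Families that must be followed by a pure hex suffix.
    const HEX_PREFIXES: [&str; 2] = ["FUN_", "fun_"];

    let wildcard = WILDCARD_PREFIXES
        .iter()
        .any(|prefix| name.len() > prefix.len() && name.starts_with(*prefix));
    let hex = HEX_PREFIXES
        .iter()
        .any(|prefix| name.strip_prefix(*prefix).map_or(false, is_hex));
    wildcard || hex
}

/// Pick the name to emit at a call site whose address matched an external
/// symbol: the external name wins only when the internal one is generic and
/// the external one is not.
pub fn choose_call_name<'a>(internal: &'a str, external: Option<&'a str>) -> &'a str {
    match external {
        Some(ext) if is_generic_placeholder(internal) && !is_generic_placeholder(ext) => ext,
        _ => internal,
    }
}
```

For example, `choose_call_name("sub_1a2b", Some("dart::Object::ToString"))` selects the external name, while an already descriptive internal name is left untouched.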

Near term roadmap

  • improve retry-loop structuring so remaining retry patterns become clearer intent-level flow
  • replace omitted-path comments with richer structured reconstructions
  • lift more Dart VM idioms into higher-level expressions
  • improve naming and type inference from object pool and call patterns
  • expand validation corpus across more Flutter and Dart versions