HashMap-based domain lookup in TdsClient #8496

Open
anikiki wants to merge 9 commits into develop from feature/ana/tds-tracker-hashmap-lookup


anikiki commented May 8, 2026

Task/Issue URL: https://app.asana.com/1/137249556945/project/72649045549333/task/1213796972971719?focus=true

Description

Replaces the per-request linear scan over the full TDS tracker list in TdsClient.matches() with a HashMap-based label-walk lookup, gated by a new optimizeTrackerEvaluationV3 remote feature flag.

For non-tracker URLs (the majority of subresource requests), the linear scan visits every tracker entry before returning false. The new path walks the URL host's labels upward (api.sub.tracker.com → sub.tracker.com → tracker.com) and does a hash lookup at each step, so the cost is bounded by the host's label count (typically 2–4 lookups). A local microbenchmark over the real tds.json shows a ~90x speedup on a mixed URL set. Production validation will come from the page-load wide event once V3 is rolled out.
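The following minimal Kotlin sketch illustrates the label walk described above. It assumes a simplified, hypothetical `TdsTracker` with only `domain` and `owner` fields and a hypothetical `TrackerIndex` class. The real `TdsClient` types and signatures are not shown in this PR description.

```kotlin
// Hypothetical, simplified stand-in for the real TdsTracker type.
data class TdsTracker(val domain: String, val owner: String)

// Hypothetical index: built once from the tracker list, then queried per request.
class TrackerIndex(trackers: List<TdsTracker>) {
    // Keyed by tracker domain, e.g. "tracker.com".
    private val byDomain: Map<String, TdsTracker> = trackers.associateBy { it.domain }

    // Walks the host's labels upward (api.sub.tracker.com -> sub.tracker.com -> tracker.com -> com),
    // doing one hash lookup per step. The walk starts from the full host,
    // so the first hit is the longest matching suffix.
    fun lookup(host: String?): TdsTracker? {
        if (host.isNullOrEmpty()) return null
        var candidate: String? = host
        while (candidate != null) {
            byDomain[candidate]?.let { return it }
            val dot = candidate.indexOf('.')
            // Strip the leftmost label, or stop when no labels are left.
            candidate = if (dot < 0) null else candidate.substring(dot + 1)
        }
        return null
    }
}

fun main() {
    val index = TrackerIndex(
        listOf(
            TdsTracker("tracker.com", "Tracker Inc"),
            TdsTracker("sub.tracker.com", "Sub Tracker Inc"),
        )
    )
    println(index.lookup("api.sub.tracker.com")?.owner) // prints "Sub Tracker Inc"
    println(index.lookup("cdn.tracker.com")?.owner)     // prints "Tracker Inc"
    println(index.lookup("nottracker.com")?.owner)      // prints "null"
}
```

Because only whole labels are stripped, a host like nottracker.com never matches tracker.com. A plain `endsWith("tracker.com")` check would get this case wrong.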

Changes:

  • New optimizeTrackerEvaluationV3 toggle on AndroidBrowserConfigFeature (default OFF, always enabled on internal builds). The old optimizeTrackerEvaluationV2 toggle is removed.
  • V2 (parse host once, linear scan with cached sameOrSubdomain) is now the unconditional fallback when V3 is OFF. The legacy V1 path (re-parsing host per iteration) is removed entirely.
  • OptimizeTrackerEvaluationRCWrapper repurposed to source from the V3 toggle; consumer call sites (PageLoadWideEvent, PageLoadedHandler, TrackerDataLoader) are unchanged.
  • Telemetry key value renamed tracker_optimization_enabled_v2 → tracker_optimization_enabled_v3 in both PageLoadWideEvent and PageLoadedOfflinePixelSender, so the wide-event and offline-pixel pipelines stay in sync. This is a deliberate metric discontinuity: downstream dashboards keyed on the old name will need updating.
  • TdsClientTest re-parametrised across V2/V3 with new V3-specific tests for exact-host match, subdomain match, longest-suffix-wins, no match, null/empty host, single-label host, and non-label-aligned suffix.
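The V2/V3 split and the re-parametrised test cases can be sketched together in Kotlin. `SketchTdsClient`, `findTracker`, and the simplified `TdsTracker` are hypothetical names, not the PR's actual API. The sketch also assumes the V2 scan picks the longest matching tracker domain, so the two paths can be compared case by case.

```kotlin
// Hypothetical, simplified stand-in for the real TdsTracker type.
data class TdsTracker(val domain: String, val owner: String)

// Label-aligned match: "a.tracker.com" matches "tracker.com", "nottracker.com" does not.
fun sameOrSubdomain(host: String, domain: String): Boolean =
    host == domain || host.endsWith(".$domain")

// Hypothetical client; the flag is fixed at construction time.
class SketchTdsClient(
    private val trackers: List<TdsTracker>,
    private val optimizeTrackerEvaluationV3: Boolean,
) {
    // Built lazily, so it is only materialised when the V3 path is used.
    private val byDomain: Map<String, TdsTracker> by lazy { trackers.associateBy { it.domain } }

    fun findTracker(host: String?): TdsTracker? {
        if (host.isNullOrEmpty()) return null
        return if (optimizeTrackerEvaluationV3) labelWalk(host) else linearScan(host)
    }

    // V2 fallback: every tracker is visited.
    // Assumption: the longest matching domain wins, mirroring V3.
    private fun linearScan(host: String): TdsTracker? =
        trackers.filter { sameOrSubdomain(host, it.domain) }.maxByOrNull { it.domain.length }

    // V3: one hash lookup per host label, longest suffix first.
    private fun labelWalk(host: String): TdsTracker? {
        var candidate: String? = host
        while (candidate != null) {
            byDomain[candidate]?.let { return it }
            val dot = candidate.indexOf('.')
            candidate = if (dot < 0) null else candidate.substring(dot + 1)
        }
        return null
    }
}

fun main() {
    val trackers = listOf(TdsTracker("tracker.com", "A"), TdsTracker("sub.tracker.com", "B"))
    val hosts = listOf(
        "tracker.com",         // exact-host match
        "cdn.tracker.com",     // subdomain match
        "api.sub.tracker.com", // longest-suffix-wins
        "example.com",         // no match
        null, "",              // null/empty host
        "localhost",           // single-label host
        "nottracker.com",      // non-label-aligned suffix
    )
    val v2 = SketchTdsClient(trackers, optimizeTrackerEvaluationV3 = false)
    val v3 = SketchTdsClient(trackers, optimizeTrackerEvaluationV3 = true)
    for (host in hosts) {
        // Both paths must agree on every case.
        check(v2.findTracker(host) == v3.findTracker(host)) { "V2/V3 disagree on $host" }
        println("$host -> ${v3.findTracker(host)?.owner}")
    }
}
```

Running the same case list against both flag values mirrors the idea behind parametrising the tests across V2/V3: the lookup strategy changes, but the match result should not.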

Rollback: disable the V3 toggle remotely → users fall back to V2, the path the majority of the user base has been on. The deprecated optimizeTrackerEvaluationV2 remote-config entry is now unreferenced by client code.
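The rollback path relies on a single boolean flowing from the remote toggle to both the matcher and the telemetry. The Kotlin sketch below shows that wiring. `RemoteToggle`, `TrackerOptimizationFlag`, and `telemetryParams` are hypothetical stand-ins: the real code uses `AndroidBrowserConfigFeature`, `OptimizeTrackerEvaluationRCWrapper`, and the app's own toggle API, whose signatures are not shown here. Only the key string tracker_optimization_enabled_v3 comes from this PR.

```kotlin
// Hypothetical stand-in for a remotely controlled feature toggle.
fun interface RemoteToggle {
    fun isEnabled(): Boolean
}

// Shared key, so the wide event and the offline pixel report the flag under the same name.
const val TRACKER_OPTIMIZATION_KEY = "tracker_optimization_enabled_v3"

// Hypothetical wrapper: reads the V3 toggle and exposes one boolean to all consumers.
class TrackerOptimizationFlag(toggle: RemoteToggle) {
    val enabled: Boolean = toggle.isEnabled()
}

// Both telemetry pipelines build their field from the same flag and key.
fun telemetryParams(flag: TrackerOptimizationFlag): Map<String, String> =
    mapOf(TRACKER_OPTIMIZATION_KEY to flag.enabled.toString())

fun main() {
    // Rollback: the remote toggle is off, so the matcher takes the V2 path
    // and telemetry reports false under the V3 key.
    val flag = TrackerOptimizationFlag(RemoteToggle { false })
    val path = if (flag.enabled) "V3 label walk" else "V2 linear scan"
    println("$path, ${telemetryParams(flag)}") // prints "V2 linear scan, {tracker_optimization_enabled_v3=false}"
}
```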

Steps to test this PR

V3 enabled (default for internal builds)

  • Install the internal build and browse normally; confirm pages load and trackers are blocked.
  • Visit a tracker-heavy site (any major news site); confirm the tracker shield count is non-zero and the listed entities match what you'd see on develop.
  • Confirm telemetry: the tracker_optimization_enabled_v3 field reads true in the page-load wide event / offline pixel.

V3 disabled (V2 fallback)

  • Disable optimizeTrackerEvaluationV3 via remote config; confirm browsing still works and the same tracker entities are blocked on a tracker-heavy site.
  • Confirm tracker_optimization_enabled_v3 reads false in telemetry.

Reference test suite

  • CI runs RequestBlocklistReferenceTest, SurrogatesReferenceTest, and DomainsReferenceTest. These provide the load-bearing behavioural validation against the upstream TDS reference set and must pass.

No UI changes


Note

Medium Risk
Changes core tracker-detection matching behavior and lookup strategy, which could impact blocking decisions or performance if edge cases regress. Also renames telemetry keys, causing metric discontinuity and requiring downstream dashboard updates.

Overview
Adds a new remote-config flag optimizeTrackerEvaluationV3 (internal-by-default) and routes OptimizeTrackerEvaluationRCWrapper to it.

Updates TdsClient to use a HashMap-backed, label-walk domain lookup when V3 is enabled (falling back to the existing subdomain scan when disabled), and wires the flag through TrackerDataLoader.

Renames page-load telemetry from tracker_optimization_enabled_v2 to tracker_optimization_enabled_v3 across wide-event definitions, PageLoadWideEvent, and offline pixel sending, and expands tests (plus an ignored microbenchmark) to validate V3 lookup semantics.

Reviewed by Cursor Bugbot for commit 3155561.

anikiki and others added 6 commits May 7, 2026 17:13
Removes optimizeTrackerEvaluationV2() and adds optimizeTrackerEvaluationV3()
to AndroidBrowserConfigFeature. The shared OptimizeTrackerEvaluationRCWrapper
now sources from the V3 toggle; consumer call sites are unchanged.

The OptimizeTrackerEvaluationRCWrapper now reflects V3 enablement, so
the wide-event field that carries its value should match. Constant name
unchanged; only the emitted string value moves from
tracker_optimization_enabled_v2 to tracker_optimization_enabled_v3.

Update test assertions that hardcoded the old literal string.

PageLoadedOfflinePixelSender ships the same trackerOptimizationEnabled
boolean as PageLoadWideEvent, just through the offline pixel pipeline.
The wide-event side was already moved to _v3 in the previous commit;
this aligns the offline pipeline so both emit under the same key.

Builds a Map<String, TdsTracker> keyed by tracker domain and walks the
URL host's labels upward, doing a hash lookup at each step. Bounded by
host label count (typically 2-4 lookups) instead of scanning all
~3000 trackers. Gated by the new optimizeTrackerEvaluationV3 flag.

V2 (parse host once + cached sameOrSubdomain linear scan) is now the
unconditional fallback. The legacy V1 path (parse host per iteration)
is removed.

Constructor param renamed optimizeTrackerEvaluation -> optimizeTrackerEvaluationV3
for self-documenting clarity. TrackerDataLoader call site uses named
arguments. TdsClientTest re-parametrised across V2/V3 with new V3-specific
cases for label-walk semantics including longest-suffix-wins.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread app/src/main/java/com/duckduckgo/app/trackerdetection/TdsClient.kt

cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 5f3409b.
