fix(telemetry): hash file paths in payloads by typpo · Pull Request #717 · promptfoo/modelaudit

typpo · 2026-03-16T21:21:32Z

Use hashed identifiers for telemetry file and URL fields instead of raw values.
Update the telemetry regression tests and changelog wording.

Summary by CodeRabbit

Bug Fixes
- Telemetry payloads no longer include raw file paths, URLs, or other direct location identifiers to better protect sensitive file and location information.
- Anonymous metadata (e.g., file/type summaries, counts) is preserved without leaking raw identifiers.
Tests
- Test suite updated to assert absence of path/URL identifiers and raw location values.
Documentation
- Changelog updated to note the privacy-preserving telemetry change.

coderabbitai · 2026-03-16T21:21:54Z

Walkthrough

Telemetry payloads were simplified to omit raw file paths and URL identifiers. Two helper builders, _build_path_properties and _build_url_properties, were added and multiple telemetry recorders were refactored to use them. Tests and the changelog were updated to reflect the privacy-preserving payload shape.

Changes

Cohort / File(s)	Summary
Telemetry core `modelaudit/telemetry.py`	Added `_build_path_properties(path)` and `_build_url_properties(url)`; refactored recorders (`record_scan_started`, `record_scan_completed`, `record_file_type_detected`, `record_issue_found`, `record_download_started`, `record_download_completed`) to use these builders and removed raw `file_path`, `path_identifiers`, and URL identifier fields from payloads.
Tests `tests/test_telemetry.py`	Updated tests to assert privacy-preserving payloads: removed expectations for raw paths/URLs and identifiers, verified sensitive values are absent in serialized telemetry, and adjusted canonical issue references.
Changelog `CHANGELOG.md`	Added a Fixed entry noting telemetry payload simplification to omit file path identifiers.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hop through code with nimble feet,
Paths hidden where the quiet hashes meet,
Raw URLs and files I softly hide,
Telemetry whispers, tidy and wide,
A rabbit dances — privacy's delight.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(telemetry): hash file paths in payloads' accurately describes the main objective of the PR: replacing raw file paths and URLs with hashed identifiers in telemetry payloads.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/telemetry-file-hashes

📝 Coding Plan

Generate coding plan for human review comments

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/telemetry.py`:
- Around line 544-551: The code coerces issue.get("location", "") to str,
turning None into the string "None" and causing _hash_identifier to produce
synthetic IDs; change issue_location to keep the original value (e.g.,
issue_location = issue.get("location")) and only convert to string or call
_hash_identifier / _extract_model_name / _classify_path when the value is
non-null/non-empty (e.g., if issue_location: location_str = str(issue_location);
use location_str for classification but call _hash_identifier(location_str) only
when issue_location is truthy, otherwise set location_identifier/model_name to
None). Apply the same fix to the other block around the references to
_hash_identifier/_classify_path/_extract_model_name noted at 572-576.

In `@tests/test_telemetry.py`:
- Around line 370-372: Add a return type annotation "-> None" to the test method
signature for test_path_and_url_fields_use_hashed_identifiers_only so the method
reads "def test_path_and_url_fields_use_hashed_identifiers_only(self) -> None:";
update only the function signature to include the annotation to satisfy the
repository typing rules for tests.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6352dee5-7c86-42e4-86fb-b473b01282c1

📥 Commits

Reviewing files that changed from the base of the PR and between d9fe283 and 26588e7.

📒 Files selected for processing (3)

CHANGELOG.md
modelaudit/telemetry.py
tests/test_telemetry.py

coderabbitai · 2026-03-16T21:25:20Z

modelaudit/telemetry.py

                issue_location = str(issue.get("location", ""))
                issue_details.append(
                    {
                        "type": issue_type,
                        "severity": severity,
                        "location_type": self._classify_path(issue_location),
-                        "file_path": issue_location,
+                        "location_identifier": self._hash_identifier(issue_location) if issue_location else None,
                        "model_name": self._extract_model_name(issue_location) if issue_location else None,


⚠️ Potential issue | 🟠 Major

Avoid hashing "None" for missing locations/paths.

Current str(...) coercion turns None into "None" (truthy), so missing location/path values get a synthetic identifier instead of being omitted.

🔧 Proposed fix

- issue_location = str(issue.get("location", "")) + raw_issue_location = issue.get("location") + issue_location = raw_issue_location if isinstance(raw_issue_location, str) else "" issue_details.append( { "type": issue_type, "severity": severity, "location_type": self._classify_path(issue_location), "location_identifier": self._hash_identifier(issue_location) if issue_location else None, "model_name": self._extract_model_name(issue_location) if issue_location else None, } ) @@ - "file_identifiers": [ - self._hash_identifier(asset_path) - for asset in assets - if (asset_path := str(asset.get("path", ""))) - ], + "file_identifiers": [ + self._hash_identifier(asset_path) + for asset in assets + if isinstance((asset_path := asset.get("path")), str) and asset_path + ],

Also applies to: 572-576

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelaudit/telemetry.py` around lines 544 - 551, The code coerces issue.get("location", "") to str, turning None into the string "None" and causing _hash_identifier to produce synthetic IDs; change issue_location to keep the original value (e.g., issue_location = issue.get("location")) and only convert to string or call _hash_identifier / _extract_model_name / _classify_path when the value is non-null/non-empty (e.g., if issue_location: location_str = str(issue_location); use location_str for classification but call _hash_identifier(location_str) only when issue_location is truthy, otherwise set location_identifier/model_name to None). Apply the same fix to the other block around the references to _hash_identifier/_classify_path/_extract_model_name noted at 572-576.

coderabbitai · 2026-03-16T21:25:20Z

tests/test_telemetry.py

+    def test_path_and_url_fields_use_hashed_identifiers_only(self):
+        """Path and URL fields should not include raw sensitive values."""
        mock_posthog = MagicMock()


🛠️ Refactor suggestion | 🟠 Major

Add -> None return annotation on this test method.

This keeps test typing consistent with repository rules.

✏️ Proposed fix

- def test_path_and_url_fields_use_hashed_identifiers_only(self): + def test_path_and_url_fields_use_hashed_identifiers_only(self) -> None:

As per coding guidelines: "tests/**/*.py: Use type hints -> None on all test methods and tmp_path: Path / monkeypatch: pytest.MonkeyPatch on test parameters".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/test_telemetry.py` around lines 370 - 372, Add a return type annotation "-> None" to the test method signature for test_path_and_url_fields_use_hashed_identifiers_only so the method reads "def test_path_and_url_fields_use_hashed_identifiers_only(self) -> None:"; update only the function signature to include the annotation to satisfy the repository typing rules for tests.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_telemetry.py`:
- Around line 273-274: The test method
test_scan_completed_uses_top_level_results_schema is missing a return type
annotation; update its definition to include "-> None" (i.e., change the
function signature for test_scan_completed_uses_top_level_results_schema to
declare a None return type) to comply with the tests/**/*.py typing guideline.
- Around line 335-336: The test method definition
test_scan_started_includes_per_path_fields is missing a return type annotation;
update its signature to include the explicit return annotation "-> None" (i.e.,
change "def test_scan_started_includes_per_path_fields(self):" to "def
test_scan_started_includes_per_path_fields(self) -> None:") so the test follows
the project's typing guideline for tests.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c05982f2-472b-459c-82de-24148c75fb41

📥 Commits

Reviewing files that changed from the base of the PR and between 26588e7 and 1d911a5.

📒 Files selected for processing (3)

CHANGELOG.md
modelaudit/telemetry.py
tests/test_telemetry.py

coderabbitai · 2026-03-16T21:29:36Z

tests/test_telemetry.py

    def test_scan_completed_uses_top_level_results_schema(self):
-        """Test that scan completion telemetry aggregates from top-level issues/assets."""
+        """Test that scan completion telemetry aggregates without file identifiers."""


🧹 Nitpick | 🔵 Trivial

Add -> None return annotation on this test method.

- def test_scan_completed_uses_top_level_results_schema(self): + def test_scan_completed_uses_top_level_results_schema(self) -> None:

As per coding guidelines: "tests/**/*.py: Use type hints -> None on all test methods".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def test_scan_completed_uses_top_level_results_schema(self):

"""Test that scan completion telemetry aggregates from top-level issues/assets."""

"""Test that scan completion telemetry aggregates without file identifiers."""

def test_scan_completed_uses_top_level_results_schema(self) -> None:

"""Test that scan completion telemetry aggregates without file identifiers."""

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/test_telemetry.py` around lines 273 - 274, The test method test_scan_completed_uses_top_level_results_schema is missing a return type annotation; update its definition to include "-> None" (i.e., change the function signature for test_scan_completed_uses_top_level_results_schema to declare a None return type) to comply with the tests/**/*.py typing guideline.

coderabbitai · 2026-03-16T21:29:36Z

tests/test_telemetry.py

    def test_scan_started_includes_per_path_fields(self):
-        """Scan started events include per-path arrays and model names."""
+        """Scan started events include coarse path metadata but no identifiers."""


🧹 Nitpick | 🔵 Trivial

Add -> None return annotation on this test method.

- def test_scan_started_includes_per_path_fields(self): + def test_scan_started_includes_per_path_fields(self) -> None:

As per coding guidelines: "tests/**/*.py: Use type hints -> None on all test methods".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def test_scan_started_includes_per_path_fields(self):

"""Scan started events include per-path arrays and model names."""

"""Scan started events include coarse path metadata but no identifiers."""

def test_scan_started_includes_per_path_fields(self) -> None:

"""Scan started events include coarse path metadata but no identifiers."""

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/test_telemetry.py` around lines 335 - 336, The test method definition test_scan_started_includes_per_path_fields is missing a return type annotation; update its signature to include the explicit return annotation "-> None" (i.e., change "def test_scan_started_includes_per_path_fields(self):" to "def test_scan_started_includes_per_path_fields(self) -> None:") so the test follows the project's typing guideline for tests.

# Conflicts: # CHANGELOG.md

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Line 84: Update the CHANGELOG entry that currently reads "- **telemetry:**
simplify telemetry payloads to omit file path identifiers" to accurately reflect
the implementation: if the code replaces paths with hashed values, change the
wording to "hash file path identifiers in telemetry payloads" or "replace raw
file path identifiers with hashed values"; if the code actually removes them,
keep "omit file path identifiers"; ensure the changelog text matches the
implementation and mentions "telemetry" and "file path identifiers" so it's easy
to locate and verify.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9743f77e-460c-423f-944e-a75319921bd0

📥 Commits

Reviewing files that changed from the base of the PR and between 1d911a5 and 8f68169.

📒 Files selected for processing (1)

CHANGELOG.md

coderabbitai · 2026-03-18T07:07:22Z

CHANGELOG.md


 ### Fixed

+- **telemetry:** simplify telemetry payloads to omit file path identifiers


⚠️ Potential issue | 🟡 Minor

Clarify whether file paths are hashed or omitted.

The changelog entry states that file path identifiers are "omitted," but the PR title says "hash file paths in payloads." These describe different approaches:

Hashing means raw values are replaced with hashed identifiers (data is transformed but still present)

Omitting means the values are completely removed (data is absent)

For clarity and accuracy, align the changelog wording with what the implementation actually does. If paths are being hashed, consider: "replace raw file path identifiers with hashed values" or "hash file path identifiers in telemetry payloads."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` at line 84, Update the CHANGELOG entry that currently reads "- **telemetry:** simplify telemetry payloads to omit file path identifiers" to accurately reflect the implementation: if the code replaces paths with hashed values, change the wording to "hash file path identifiers in telemetry payloads" or "replace raw file path identifiers with hashed values"; if the code actually removes them, keep "omit file path identifiers"; ensure the changelog text matches the implementation and mentions "telemetry" and "file path identifiers" so it's easy to locate and verify.

fix: hash telemetry file paths

26588e7

coderabbitai bot reviewed Mar 16, 2026

View reviewed changes

fix: remove telemetry path identifiers

1d911a5

coderabbitai bot reviewed Mar 16, 2026

View reviewed changes

Merge remote-tracking branch 'origin/main' into audit/pr717

8f68169

# Conflicts: # CHANGELOG.md

coderabbitai bot reviewed Mar 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(telemetry): hash file paths in payloads#717

fix(telemetry): hash file paths in payloads#717
typpo wants to merge 3 commits intomainfrom
codex/telemetry-file-hashes

typpo commented Mar 16, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 16, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 16, 2026

Uh oh!

coderabbitai bot Mar 16, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 16, 2026

Uh oh!

coderabbitai bot Mar 16, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		### Fixed

		- telemetry: simplify telemetry payloads to omit file path identifiers

Conversation

typpo commented Mar 16, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

typpo commented Mar 16, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 16, 2026 •

edited

Loading