Skip to content

feat: local telemetry storage#8400

Merged
fatih-acar merged 2 commits intorelease-1.9from
fac-001-local-telemetry-storage
Apr 14, 2026
Merged

feat: local telemetry storage#8400
fatih-acar merged 2 commits intorelease-1.9from
fac-001-local-telemetry-storage

Conversation

@fatih-acar
Copy link
Copy Markdown
Contributor

  • I have reviewed AI generated content

As per the quickstart.md file:

What This Feature Does

Infrahub now stores a daily telemetry snapshot locally in the database, regardless of whether remote telemetry reporting is enabled. This ensures all deployments — including air-gapped and opted-out — retain usage data for support, auditing, and license compliance.

Key Changes

  1. Automatic daily storage: The daily telemetry workflow now always stores a snapshot locally before optionally sending to the remote endpoint.
  2. CLI export: infrahubctl telemetry export exports snapshots to a JSON file.
  3. CLI listing: infrahubctl telemetry list shows a summary table of stored snapshots.
  4. REST API: GET /api/telemetry/snapshots provides programmatic access.
  5. Permission: Access requires READ_TELEMETRY global permission.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 17, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (4)
  • main
  • stable
  • develop
  • release-.*

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

- `StandardNode.to_db()` automatically serializes Pydantic models to JSON via `model_dump_json()` and deserializes them in `from_db()` via `ujson.loads()`. This is proven by existing test fixtures (`PadanticStdNode` in test_node_standard.py).
- The telemetry payload is ~3-5 KB of JSON. Storing as a single JSON string avoids fragmenting it across multiple Neo4j nodes/relationships.
- The `TelemetryData` model is already well-typed with nested Pydantic models (`TelemetryWorkerData`, `TelemetryBranchData`, `TelemetrySchemaData`, `TelemetryDatabaseData`, `TelemetryPrefectData`).
- Using the existing model type in the StandardNode field enables automatic Pydantic validation on read.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to be careful with that, overtime the format of this data will change and we'll have old data in the database, we need to make sure that we can still read and export the old data, not just the latest one

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point... We should probably create a TelemetryData Pydantic class for each format version, so that it gets correctly (de)serialized. I will add to the plan.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the spec: instead of versioned Pydantic classes, the data field is now stored as plain dict[str, Any] (JSON). The TelemetryData Pydantic model is still used at collection time for validation, then model.model_dump() converts it to a plain dict before storage. On read, data comes back as a raw dict — no deserialization needed. The payload_format field identifies which version of the schema produced the data, so consumers can interpret differences if needed. This keeps old snapshots always readable regardless of how the model evolves.


---

### R-006: REST API Design — Programmatic Interface
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not clearly mentioned but I think we should have a sane default limit for this API to avoid returning everything each time and potentially having a pagination system would be helpful to avoid returning 100s of responses at once

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pagination system will be implemented as described in contracts/rest-api.md‎

@fatih-acar fatih-acar force-pushed the fac-speckit-branch-naming branch from bfd456b to bcf3264 Compare February 17, 2026 11:42
Base automatically changed from fac-speckit-branch-naming to develop February 18, 2026 09:34
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 5, 2026

This PR has been inactive for 2 weeks and has been marked as stale.

If you're still working on this, please:

  • Push new commits, or
  • Leave a comment to indicate this PR is still active

The stale label will be removed automatically when there's new activity.

@github-actions github-actions Bot added the stale Marks stale issues and pull requests label Mar 5, 2026
@fatih-acar fatih-acar force-pushed the fac-001-local-telemetry-storage branch from dfc2fc4 to bbbfdd4 Compare April 1, 2026 14:21
@fatih-acar fatih-acar requested review from a team as code owners April 1, 2026 14:21
@fatih-acar fatih-acar changed the base branch from develop to stable April 1, 2026 14:22
Copy link
Copy Markdown
Contributor

@ajtmccarty ajtmccarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty good to me

payload_format: str # Format version
deployment_id: str # Deployment identifier
infrahub_version: str # Product version
data: dict[str, Any] # Full telemetry payload
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be nice for this to be structured data too, but not sure if it really matters in this case
I see the research section explains why the data is unstructured: to be backwards compatible with old payload formats when updating the format going forward

Comment thread dev/specs/fac-001-local-telemetry-storage/contracts/rest-api.md Outdated
Comment thread dev/specs/fac-001-local-telemetry-storage/data-model.md Outdated
@fatih-acar fatih-acar force-pushed the fac-001-local-telemetry-storage branch from bbbfdd4 to 79d9576 Compare April 2, 2026 08:44
@fatih-acar fatih-acar requested a review from a team as a code owner April 2, 2026 08:44
@fatih-acar fatih-acar changed the base branch from stable to develop April 2, 2026 08:44
@fatih-acar fatih-acar marked this pull request as draft April 2, 2026 08:45
@fatih-acar fatih-acar changed the title feat: local telemetry storage (specs and plan) feat: local telemetry storage Apr 2, 2026
@fatih-acar
Copy link
Copy Markdown
Contributor Author

Thanks for the review @ajtmccarty I've addressed your comments and pushed the actual implementation
I'll do some manual testing

@github-actions github-actions Bot added type/documentation Improvements or additions to documentation group/backend Issue related to the backend (API Server, Git Agent) type/spec A specification for an upcoming change to the project labels Apr 2, 2026
@fatih-acar fatih-acar force-pushed the fac-001-local-telemetry-storage branch from f1bed15 to bbc2784 Compare April 2, 2026 08:54
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Apr 2, 2026

Merging this PR will not alter performance

✅ 12 untouched benchmarks


Comparing fac-001-local-telemetry-storage (6151156) with release-1.9 (d34952a)

Open in CodSpeed

@github-actions github-actions Bot removed the stale Marks stale issues and pull requests label Apr 3, 2026
@fatih-acar fatih-acar force-pushed the fac-001-local-telemetry-storage branch from bbc2784 to 2a2c697 Compare April 10, 2026 08:52
@fatih-acar fatih-acar changed the base branch from develop to release-1.9 April 10, 2026 08:53
@fatih-acar fatih-acar force-pushed the fac-001-local-telemetry-storage branch 2 times, most recently from 6adb50c to 5a964a9 Compare April 10, 2026 09:19
@github-actions github-actions Bot added the group/frontend Issue related to the frontend (React) label Apr 10, 2026
@fatih-acar fatih-acar marked this pull request as ready for review April 10, 2026 09:51
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 23 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="dev/specs/fac-001-local-telemetry-storage/contracts/rest-api.md">

<violation number="1" location="dev/specs/fac-001-local-telemetry-storage/contracts/rest-api.md:75">
P2: Type `remote_send_status` as `Literal["sent", "skipped", "failed", "pending"]` (or a `StrEnum`) instead of plain `str`. This gives Pydantic validation and self-documents the allowed values in the contract.

(Based on your team's feedback about preferring enums and typed values over literals.) [FEEDBACK_USED]</violation>

<violation number="2" location="dev/specs/fac-001-local-telemetry-storage/contracts/rest-api.md:121">
P1: The `export` command lacks `--limit`/`--offset` options and there's no mention of automatic pagination. Since the REST API defaults to `limit=1000`, an export from a long-running deployment would silently truncate results. Either add explicit pagination options (matching `telemetry list`) or specify that the command auto-paginates through all results.</violation>
</file>

<file name="backend/infrahub/telemetry/tasks.py">

<violation number="1" location="backend/infrahub/telemetry/tasks.py:126">
P2: `database` is only assigned inside the first `try` block. If `get_database()` fails, subsequent blocks reference an unbound `database` variable. Move `database = await get_database()` before the first `try`, or re-obtain it in each block that needs it, so the variable is always defined when used.</violation>
</file>

<file name="backend/infrahub/api/telemetry.py">

<violation number="1" location="backend/infrahub/api/telemetry.py:83">
P2: `count` returns the current page size, not the total number of matching snapshots. For a paginated endpoint (with `limit`/`offset`), this makes it impossible for clients to determine whether more pages exist. Either rename this to clarify it's the page size (e.g., `returned`), or add a total-count query and return the true total here.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread dev/specs/infp-471-local-telemetry-storage/contracts/rest-api.md
Comment thread dev/specs/fac-001-local-telemetry-storage/contracts/rest-api.md Outdated
Comment thread backend/infrahub/telemetry/tasks.py Outdated
Comment thread backend/infrahub/api/telemetry.py Outdated
Copy link
Copy Markdown
Contributor

@polmichel polmichel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Just a comment worth really looking into regarding the end_date bound in the query when querying "today's data

Comment thread backend/tests/unit/telemetry/test_workflow.py Outdated
Comment thread backend/infrahub/telemetry/tasks.py Outdated
Comment thread backend/infrahub/telemetry/snapshot.py Outdated
Comment thread backend/infrahub/telemetry/constants.py Outdated
Comment thread backend/infrahub/telemetry/snapshot.py Outdated
Comment thread backend/infrahub/telemetry/snapshot.py Outdated
Comment thread backend/infrahub/telemetry/snapshot.py Outdated
Comment thread backend/infrahub/api/telemetry.py Outdated
Comment thread backend/infrahub/telemetry/tasks.py Outdated
@@ -96,21 +104,64 @@ async def post_telemetry_data(url: str, payload: dict[str, Any]) -> None:
@flow(name="anonymous_telemetry_send", flow_run_name="Send anonymous telemetry")
async def send_telemetry_push() -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just on my dependency injection soapbox yesterday for a different PR making some changes to a flow and I'm going to do it again

to be more SOLID (particularly the D for dependency injection), this flow should just instantiate a new TelemetrySnapshotService that encapsulates all of the below logic. something like this

async def send_telemetry_push() -> None:
    telemetry_repository = TelemetryRepository(db=db)
    telemetry_service = TelemetrySnapshotService(
        db=db, telemetry_repository=telemetry_repository,
    )

    data = await gather_anonymous_telemetry_data()
    await telemetry_service.save_and_send(data)

and then in your tests, you don't have to patch a bunch of stuff. you can create mocks using telemetry_repo = AsyncMock(spec=TelemetryRepository) or create_autospec(TelemetryRepository, spec_set=True) and you can assert they are called with telemetry_repo.assert_awaited_once_with() or telemetry_repo.assert_has_awaits() or another method on the mock like this

I think it improves tests (accurate interfaces for the mocks instead of "anything goes") and makes them easier to read (no more patch context managers)

that said, this would probably require some more refactoring to turn gather_anonymous_telemetry_data into TelemetryDataGatherer and post_telemetry_data into TelemetryDataPoster to get the full benefits, but something to consider and an LLM can probably make pretty quick work of it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks I'll try to tackle that refactor in another PR (to not block 1.9)

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 12 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/infrahub/telemetry/tasks.py">

<violation number="1" location="backend/infrahub/telemetry/tasks.py:127">
P1: Unhandled exception: `repository.save()` on the opt-out path lost its `try/except` during refactoring. The other two `repository.save()` calls in this function are protected, but a DB failure here will crash the flow. Wrap this in the same error-handling pattern used by the adjacent save calls.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread backend/infrahub/telemetry/tasks.py
@fatih-acar fatih-acar force-pushed the fac-001-local-telemetry-storage branch from 9187fae to 37c3533 Compare April 13, 2026 17:37
Comment thread backend/tests/unit/telemetry/test_workflow.py Outdated
Signed-off-by: Fatih Acar <fatih@opsmill.com>
Signed-off-by: Fatih Acar <fatih@opsmill.com>
@fatih-acar fatih-acar force-pushed the fac-001-local-telemetry-storage branch 2 times, most recently from c7116a4 to 6151156 Compare April 14, 2026 11:28
@fatih-acar fatih-acar merged commit d9a0a55 into release-1.9 Apr 14, 2026
47 checks passed
@fatih-acar fatih-acar deleted the fac-001-local-telemetry-storage branch April 14, 2026 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

group/backend Issue related to the backend (API Server, Git Agent) group/frontend Issue related to the frontend (React) type/documentation Improvements or additions to documentation type/spec A specification for an upcoming change to the project

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants