Skip to content

feat(telemetry): attribute machine-identity CLI events to identity-<id> persons#197

Open
devin-ai-integration[bot] wants to merge 2 commits intomainfrom
devin/1777325070-cli-machine-identity-telemetry
Open

feat(telemetry): attribute machine-identity CLI events to identity-<id> persons#197
devin-ai-integration[bot] wants to merge 2 commits intomainfrom
devin/1777325070-cli-machine-identity-telemetry

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot commented Apr 27, 2026

Description 📣

Follow-up to #146 and #196. CLI events captured while running with a machine-identity access token (e.g. CI runners using INFISICAL_TOKEN or INFISICAL_UNIVERSAL_AUTH_ACCESS_TOKEN) currently land under anonymous_cli_<machineId> in PostHog. Because ephemeral containers get a fresh machineId per spin-up, each CI run produces its own anonymous person — and this is the dominant source of anonymous_cli_* person inflation in PostHog.

The backend already tracks machine identities under distinctId = "identity-<identityId>" and enriches the person with name = "[Machine Identity] <name>" and actorType = "identity" on every MachineIdentityLogin and identity-scoped event — see backend/src/services/telemetry/telemetry-service.ts:identifyIdentity and the 11 identity auth routers (Universal Auth, K8s, AWS IAM, GCP, Azure, JWT, OIDC, LDAP, OCI, Alicloud, TLS Cert). The CLI just needs to emit events with the matching distinctId and they'll flow into the existing person — no Identify call from the CLI is required because the backend has already created the record.

This PR adds getMachineIdentityIdFromEnv() in packages/telemetry/telemetry.go, which:

  • Inspects the same env-var precedence as util.GetInfisicalToken: INFISICAL_UNIVERSAL_AUTH_ACCESS_TOKENINFISICAL_TOKENTOKEN (the --token flag is per-command and not accessible from the telemetry layer; env vars cover the dominant CI use case).
  • Skips service tokens (st. prefix — deprecated, no JWT payload).
  • Decodes the JWT payload (no signature verification — the value is only used to derive a distinctId, never for authorization, and the same token is signature-verified on the backend when the API call is made).
  • Extracts and returns the identityId claim.

GetDistinctId() is then updated to a three-tier resolution:

  1. Logged-in user email from config → email
  2. Machine-identity from env → identity-<identityId> (new)
  3. Anonymous fallback → anonymous_cli_<machineId>

Logged-in user wins over env-token identity (updated after review): some commands (infisical user switch, the local-config branch of infisical login) never authenticate against the backend, and the interactive infisical login flow has just enriched the email person with Identify/Alias. Attributing those events to a stale identity-<id> from an env var would split person-level analytics. The env-token branch only fires when LoggedInUserEmail == "", which is the dominant CI / container / K8s case the PR is targeting and the only state where the CLI has no other actor to attribute telemetry to.

This is upgrade-gated: existing anonymous_cli_* person records are not retroactively reassigned, but new events from upgraded CLIs route to identity-<id> and stop generating new anonymous persons. Combined with #196 (lazy IdentifyUser for users who logged in pre-v0.43.59), this closes the two largest sources of CLI person-attribution gaps in PostHog.

Type ✨

  • Bug fix
  • New feature
  • Improvement
  • Breaking change
  • Documentation

Tests 🛠️

Manual reasoning-through of the resolution table:

LoggedInUserEmail Env state Resolved distinctId
a@x.com INFISICAL_TOKEN set to UA JWT (identityId=abc) a@x.com (logged-in user wins, prevents misattribution for user switch / login)
a@x.com INFISICAL_TOKEN set to service token (st.…) a@x.com (unchanged)
a@x.com INFISICAL_TOKEN set to malformed value a@x.com (unchanged)
a@x.com nothing set a@x.com (unchanged)
"" INFISICAL_TOKEN set to UA JWT (identityId=abc) identity-abc (new behavior, the dominant CI case)
"" INFISICAL_UNIVERSAL_AUTH_ACCESS_TOKEN set, INFISICAL_TOKEN also set UA env wins (matches util.GetInfisicalToken precedence)
"" INFISICAL_TOKEN set to service token (st.…) anonymous_cli_<machineId> (service tokens skip the new branch)
"" nothing set anonymous_cli_<machineId> (unchanged)

Build verified locally:

go build ./...
# Pre-existing run.go vet warnings are unrelated and present on main.

No new dependencies are introduced — JWT payload extraction uses only encoding/base64, encoding/json, and strings from the Go standard library, and the JWT signature is intentionally not verified (this is for telemetry attribution, not auth).


Link to Devin session: https://app.devin.ai/sessions/6363c6181d1641f8a564bc8161e4270f
Requested by: @0xArshdeep


Open in Devin Review

…d> persons

When the CLI runs with a machine-identity access token (INFISICAL_TOKEN
or INFISICAL_UNIVERSAL_AUTH_ACCESS_TOKEN env var), every cli-command:*
event today is captured under `anonymous_cli_<machineId>`. For CI
runners and ephemeral containers — where the machineId is fresh per
container — this means each container shows up as a unique anonymous
person in PostHog, which is the dominant source of `anonymous_cli_*`
person inflation.

The backend already tracks machine identities under the distinctId
`identity-<identityId>` and enriches the person record with a
`[Machine Identity] <name>` display name and `actorType: identity`
on every MachineIdentityLogin and identity-scoped event (see
backend/src/services/telemetry/telemetry-service.ts:identifyIdentity
and the 11 identity auth routers). The CLI just needs to use the
matching distinctId so its events land on the same person record;
no Identify call from the CLI is needed because the backend has
already created the person.

Add getMachineIdentityIdFromEnv() to inspect the same env-var
precedence as util.GetInfisicalToken (--token flag is per-command
and not accessible from the telemetry layer) and decode the
unverified JWT payload to extract the `identityId` claim. Use
"identity-" + identityId as the distinctId in GetDistinctId(),
ahead of the user-email branch — when both are present, the CLI
authenticates as the machine identity, so telemetry should follow
the same attribution.

Service tokens (`st.` prefix) and malformed JWTs fall through to
the existing email/anonymous resolution. JWT signatures are not
verified — the value is only used to derive a distinctId, never for
authorization, and the same token is signature-verified on the
backend when the API call is made.

This change is upgrade-gated: existing `anonymous_cli_*` person
records are not retroactively reassigned, but new events from
upgraded CLIs will route to `identity-<id>` and stop generating
new anonymous persons.
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Copy link
Copy Markdown
Contributor Author

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

Open in Devin Review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca54d7af38

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/telemetry/telemetry.go Outdated
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — small, well-scoped telemetry attribution change with no security-sensitive surface; the single nit found is a minor edge case.

Extended reasoning...

Overview

The PR touches a single file (packages/telemetry/telemetry.go) to add a new getMachineIdentityIdFromEnv() helper and rewire GetDistinctId() into a three-tier resolution: machine-identity-from-env → logged-in-user email → anonymous anonymous_cli_<machineId> fallback. The motivation (CI runners producing fresh anonymous_cli_* persons per ephemeral container, inflating PostHog person counts) is documented clearly in the PR description and matches the inline comments.

Security risks

Minimal. The JWT payload is decoded with base64.RawURLEncoding / base64.URLEncoding and unmarshalled into a single string field (identityId); the signature is intentionally not verified, but the value is only used to construct a PostHog distinctId string — it is never used for authorization, and is concatenated with a fixed identity- prefix rather than interpolated into a request path or query. Failure modes are silent (return ""), which is the right behavior for a best-effort telemetry helper. No new dependencies are introduced.

Level of scrutiny

Low-to-moderate. This is a CLI telemetry attribution change, not auth, crypto, or permissions code. It uses only Go standard library, the diff is ~80 lines including comments, and the resolution table in the PR description covers the relevant edge cases (UA token, service token st. prefix, malformed token, missing token, env-var precedence). The new precedence (machine-identity beats logged-in user) is explicitly motivated in both the PR body and the inline comment in GetDistinctId().

Other factors

The single bug-hunter finding is a [Nit] about the cli-command:login event being attributed to identity-<id> when an env-token happens to be exported during an interactive infisical login — a low-impact edge case that requires an unusual setup (UA env-token exported in a developer's interactive shell during login), and only affects one event per such invocation. Easily addressed in a follow-up if desired; not a blocker for this PR.

Comment thread packages/telemetry/telemetry.go Outdated
…tinctId resolution

Addresses Codex P2 and Claude review nit on #197. The original
precedence (env-token identity > logged-in email) misattributes
telemetry for two real flows:

1. `infisical user switch` — pure local-config command that never
   authenticates against the backend. With INFISICAL_TOKEN exported
   in the shell, every cli-command:* event from a logged-in user
   would land on identity-<id> from a token the command never used.

2. `infisical login` (interactive user flow) — the CLI just persisted
   the user's email, called IdentifyUser/Alias on the email person,
   and then captures cli-command:login. With INFISICAL_TOKEN exported,
   the capture event would land on identity-<id> while the Identify+
   Alias enriched the email person, splitting the login-flow signal
   across two person records.

Flip the precedence: logged-in email wins when present, env-token
identity is consulted only when no user is logged in. This preserves
the PR's primary goal — CI / containers / K8s pods (no logged-in
user, INFISICAL_TOKEN set) still attribute to identity-<id> instead
of anonymous_cli_<machineId> — while keeping all logged-in flows
(switch, login, day-to-day commands) attributing to the user.

Resolution table after this change:

  LoggedInUserEmail  INFISICAL_TOKEN (UA JWT)  Resolved distinctId
  --------------------------------------------------------------
  set (a@x.com)      set                       a@x.com
  set (a@x.com)      unset                     a@x.com
  unset              set                       identity-<id>   (the goal)
  unset              unset                     anonymous_cli_<machineId>
Copy link
Copy Markdown
Contributor Author

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 4 additional findings in Devin Review.

Open in Devin Review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 IdentifyUser does not alias the new identity-<id> distinctId to the user's email

IdentifyUser (line 68) only creates a PostHog alias from anonymous_cli_ + machineId → email. After this PR, if a machine-identity token is present in the environment (e.g. INFISICAL_TOKEN), GetDistinctId() returns identity-<id> instead of anonymous_cli_<machineId> for pre-login events. When the user later logs in interactively, IdentifyUser aliases the now-unused anonymous_cli_<machineId> to their email but never aliases identity-<id>. This means any CLI telemetry events captured before login (under the identity-<id> distinctId) will remain orphaned in PostHog and won't be merged into the logged-in person record, which is the exact scenario IdentifyUser was designed to handle.

(Refers to lines 82-89)

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Re: the new Devin Review finding on aliasing identity-<id> → email in IdentifyUser — pushing back on this one, I think it's a false positive given the actor-vs-person semantics.

The scenario it describes is narrow. With the post-review precedence flip (logged-in email beats env-token identity), pre-login events can only land under identity-<id> when there's no logged-in user, INFISICAL_TOKEN is exported, the user runs telemetry-emitting commands, then later runs infisical login interactively on the same machine. That's a CI-runner flow inverting itself into an interactive human flow on the same hardware — uncommon outside "developer testing CI on their laptop."

Why aliasing would be wrong even if it weren't narrow. identity-<id> isn't anonymous. It's the canonical PostHog distinctId for a machine-identity actor — the backend creates and enriches that person on MachineIdentityLogin with name = "[Machine Identity] <name>" and actorType = "identity" (see backend/src/services/telemetry/telemetry-service.ts:identifyIdentity), and emits identity-scoped events under it from 11 different auth routers. Aliasing identity-<id>user@example.com would merge that machine-identity person with the human person — corrupting any dashboard segmented by actorType, breaking funnels built on identity activity, and producing a person record that's neither cleanly an identity nor cleanly a human.

The anonymous_cli_<machineId> → email alias works because the anonymous distinctId genuinely means "we don't know who this is yet" — it's a placeholder. identity-<id> is a known, named, distinct actor and shouldn't be collapsed into a human's record because they happened to log in on the same shell afterward.

The pre-login identity-<id> events were authentic machine-identity API calls (those secrets were fetched as the identity, not as the user), so they correctly belong to the identity person. Leaving them there is the right behavior. Holding off on changing the code unless there's an explicit use case for cross-actor merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants