feat(telemetry): attribute machine-identity CLI events to identity-<id> persons#197
Conversation
…d> persons When the CLI runs with a machine-identity access token (INFISICAL_TOKEN or INFISICAL_UNIVERSAL_AUTH_ACCESS_TOKEN env var), every cli-command:* event today is captured under `anonymous_cli_<machineId>`. For CI runners and ephemeral containers — where the machineId is fresh per container — this means each container shows up as a unique anonymous person in PostHog, which is the dominant source of `anonymous_cli_*` person inflation. The backend already tracks machine identities under the distinctId `identity-<identityId>` and enriches the person record with a `[Machine Identity] <name>` display name and `actorType: identity` on every MachineIdentityLogin and identity-scoped event (see backend/src/services/telemetry/telemetry-service.ts:identifyIdentity and the 11 identity auth routers). The CLI just needs to use the matching distinctId so its events land on the same person record; no Identify call from the CLI is needed because the backend has already created the person. Add getMachineIdentityIdFromEnv() to inspect the same env-var precedence as util.GetInfisicalToken (--token flag is per-command and not accessible from the telemetry layer) and decode the unverified JWT payload to extract the `identityId` claim. Use "identity-" + identityId as the distinctId in GetDistinctId(), ahead of the user-email branch — when both are present, the CLI authenticates as the machine identity, so telemetry should follow the same attribution. Service tokens (`st.` prefix) and malformed JWTs fall through to the existing email/anonymous resolution. JWT signatures are not verified — the value is only used to derive a distinctId, never for authorization, and the same token is signature-verified on the backend when the API call is made. This change is upgrade-gated: existing `anonymous_cli_*` person records are not retroactively reassigned, but new events from upgraded CLIs will route to `identity-<id>` and stop generating new anonymous persons.
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ca54d7af38
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
LGTM — small, well-scoped telemetry attribution change with no security-sensitive surface; the single nit found is a minor edge case.
Extended reasoning...
Overview
The PR touches a single file (packages/telemetry/telemetry.go) to add a new getMachineIdentityIdFromEnv() helper and rewire GetDistinctId() into a three-tier resolution: machine-identity-from-env → logged-in-user email → anonymous anonymous_cli_<machineId> fallback. The motivation (CI runners producing fresh anonymous_cli_* persons per ephemeral container, inflating PostHog person counts) is documented clearly in the PR description and matches the inline comments.
Security risks
Minimal. The JWT payload is decoded with base64.RawURLEncoding / base64.URLEncoding and unmarshalled into a single string field (identityId); the signature is intentionally not verified, but the value is only used to construct a PostHog distinctId string — it is never used for authorization, and is concatenated with a fixed identity- prefix rather than interpolated into a request path or query. Failure modes are silent (return ""), which is the right behavior for a best-effort telemetry helper. No new dependencies are introduced.
Level of scrutiny
Low-to-moderate. This is a CLI telemetry attribution change, not auth, crypto, or permissions code. It uses only Go standard library, the diff is ~80 lines including comments, and the resolution table in the PR description covers the relevant edge cases (UA token, service token st. prefix, malformed token, missing token, env-var precedence). The new precedence (machine-identity beats logged-in user) is explicitly motivated in both the PR body and the inline comment in GetDistinctId().
Other factors
The single bug-hunter finding is a [Nit] about the cli-command:login event being attributed to identity-<id> when an env-token happens to be exported during an interactive infisical login — a low-impact edge case that requires an unusual setup (UA env-token exported in a developer's interactive shell during login), and only affects one event per such invocation. Easily addressed in a follow-up if desired; not a blocker for this PR.
…tinctId resolution Addresses Codex P2 and Claude review nit on #197. The original precedence (env-token identity > logged-in email) misattributes telemetry for two real flows: 1. `infisical user switch` — pure local-config command that never authenticates against the backend. With INFISICAL_TOKEN exported in the shell, every cli-command:* event from a logged-in user would land on identity-<id> from a token the command never used. 2. `infisical login` (interactive user flow) — the CLI just persisted the user's email, called IdentifyUser/Alias on the email person, and then captures cli-command:login. With INFISICAL_TOKEN exported, the capture event would land on identity-<id> while the Identify+ Alias enriched the email person, splitting the login-flow signal across two person records. Flip the precedence: logged-in email wins when present, env-token identity is consulted only when no user is logged in. This preserves the PR's primary goal — CI / containers / K8s pods (no logged-in user, INFISICAL_TOKEN set) still attribute to identity-<id> instead of anonymous_cli_<machineId> — while keeping all logged-in flows (switch, login, day-to-day commands) attributing to the user. Resolution table after this change: LoggedInUserEmail INFISICAL_TOKEN (UA JWT) Resolved distinctId -------------------------------------------------------------- set (a@x.com) set a@x.com set (a@x.com) unset a@x.com unset set identity-<id> (the goal) unset unset anonymous_cli_<machineId>
There was a problem hiding this comment.
🟡 IdentifyUser does not alias the new identity-<id> distinctId to the user's email
IdentifyUser (line 68) only creates a PostHog alias from anonymous_cli_ + machineId → email. After this PR, if a machine-identity token is present in the environment (e.g. INFISICAL_TOKEN), GetDistinctId() returns identity-<id> instead of anonymous_cli_<machineId> for pre-login events. When the user later logs in interactively, IdentifyUser aliases the now-unused anonymous_cli_<machineId> to their email but never aliases identity-<id>. This means any CLI telemetry events captured before login (under the identity-<id> distinctId) will remain orphaned in PostHog and won't be merged into the logged-in person record, which is the exact scenario IdentifyUser was designed to handle.
(Refers to lines 82-89)
Was this helpful? React with 👍 or 👎 to provide feedback.
|
Re: the new Devin Review finding on aliasing The scenario it describes is narrow. With the post-review precedence flip (logged-in email beats env-token identity), pre-login events can only land under Why aliasing would be wrong even if it weren't narrow. The The pre-login |
Description 📣
Follow-up to #146 and #196. CLI events captured while running with a machine-identity access token (e.g. CI runners using
INFISICAL_TOKENorINFISICAL_UNIVERSAL_AUTH_ACCESS_TOKEN) currently land underanonymous_cli_<machineId>in PostHog. Because ephemeral containers get a freshmachineIdper spin-up, each CI run produces its own anonymous person — and this is the dominant source ofanonymous_cli_*person inflation in PostHog.The backend already tracks machine identities under
distinctId = "identity-<identityId>"and enriches the person withname = "[Machine Identity] <name>"andactorType = "identity"on everyMachineIdentityLoginand identity-scoped event — seebackend/src/services/telemetry/telemetry-service.ts:identifyIdentityand the 11 identity auth routers (Universal Auth, K8s, AWS IAM, GCP, Azure, JWT, OIDC, LDAP, OCI, Alicloud, TLS Cert). The CLI just needs to emit events with the matchingdistinctIdand they'll flow into the existing person — no Identify call from the CLI is required because the backend has already created the record.This PR adds
getMachineIdentityIdFromEnv()inpackages/telemetry/telemetry.go, which:util.GetInfisicalToken:INFISICAL_UNIVERSAL_AUTH_ACCESS_TOKEN→INFISICAL_TOKEN→TOKEN(the--tokenflag is per-command and not accessible from the telemetry layer; env vars cover the dominant CI use case).st.prefix — deprecated, no JWT payload).distinctId, never for authorization, and the same token is signature-verified on the backend when the API call is made).identityIdclaim.GetDistinctId()is then updated to a three-tier resolution:identity-<identityId>(new)anonymous_cli_<machineId>Logged-in user wins over env-token identity (updated after review): some commands (
infisical user switch, the local-config branch ofinfisical login) never authenticate against the backend, and the interactiveinfisical loginflow has just enriched the email person withIdentify/Alias. Attributing those events to a staleidentity-<id>from an env var would split person-level analytics. The env-token branch only fires whenLoggedInUserEmail == "", which is the dominant CI / container / K8s case the PR is targeting and the only state where the CLI has no other actor to attribute telemetry to.This is upgrade-gated: existing
anonymous_cli_*person records are not retroactively reassigned, but new events from upgraded CLIs route toidentity-<id>and stop generating new anonymous persons. Combined with #196 (lazyIdentifyUserfor users who logged in pre-v0.43.59), this closes the two largest sources of CLI person-attribution gaps in PostHog.Type ✨
Tests 🛠️
Manual reasoning-through of the resolution table:
LoggedInUserEmaila@x.comINFISICAL_TOKENset to UA JWT (identityId=abc)a@x.com(logged-in user wins, prevents misattribution foruser switch/login)a@x.comINFISICAL_TOKENset to service token (st.…)a@x.com(unchanged)a@x.comINFISICAL_TOKENset to malformed valuea@x.com(unchanged)a@x.coma@x.com(unchanged)""INFISICAL_TOKENset to UA JWT (identityId=abc)identity-abc(new behavior, the dominant CI case)""INFISICAL_UNIVERSAL_AUTH_ACCESS_TOKENset,INFISICAL_TOKENalso setutil.GetInfisicalTokenprecedence)""INFISICAL_TOKENset to service token (st.…)anonymous_cli_<machineId>(service tokens skip the new branch)""anonymous_cli_<machineId>(unchanged)Build verified locally:
go build ./... # Pre-existing run.go vet warnings are unrelated and present on main.No new dependencies are introduced — JWT payload extraction uses only
encoding/base64,encoding/json, andstringsfrom the Go standard library, and the JWT signature is intentionally not verified (this is for telemetry attribution, not auth).Link to Devin session: https://app.devin.ai/sessions/6363c6181d1641f8a564bc8161e4270f
Requested by: @0xArshdeep