feat(tracking): add auth event tracking#987
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughMigrates auth logging to internal/tracking, implements structured auth failure reasons/timestamps, instruments root and hint flows to emit auth/security telemetry, wires tracking runtime/config and user-unique ID, and adds tests plus small VFS helpers. ChangesAuth Telemetry Migration and Tracking Module
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI/Auth code
participant Local as Local Auth Log
participant Remote as Remote Telemetry
CLI->>Local: tracking.LogAuthResponse / tracking.LogAuthError
Local->>Local: append to daily-rotated file
CLI->>Remote: emitRemoteAuthEvent (if endpoint enabled)
Remote->>Remote: POST JSON payload (timeout)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
internal/cmdutil/factory_default_test.go (1)
20-20: ⚡ Quick winUse stdlib filesystem APIs for test fixture setup in
_test.go.Please switch these fixture writes/dir-creation calls from
vfs.*toos.*to keep tests decoupled from production FS abstractions.Suggested patch
import ( "context" "errors" + "os" "path/filepath" "testing" @@ - "github.com/larksuite/cli/internal/vfs" "github.com/larksuite/cli/internal/vfs/localfileio" ) @@ - if err := vfs.WriteFile(filepath.Join(dir, "config.json"), raw, 0600); err != nil { + if err := os.WriteFile(filepath.Join(dir, "config.json"), raw, 0o600); err != nil { t.Fatalf("WriteFile(config.json) error = %v", err) } @@ - if err := vfs.MkdirAll(filepath.Join(core.GetConfigDir(), "locks"), 0700); err != nil { + if err := os.MkdirAll(filepath.Join(core.GetConfigDir(), "locks"), 0o700); err != nil { t.Fatalf("MkdirAll(locks) error = %v", err) }Based on learnings: In larksuite/cli Go tests (
*_test.go), use standard library filesystem APIs (os.Mkdir,os.Create,os.CreateTemp,os.WriteFile) for fixture setup and avoidvfs.*there.Also applies to: 285-286, 329-330
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/cmdutil/factory_default_test.go` at line 20, Tests are using the production VFS helpers (imported as vfs and calls like vfs.MkdirAll / vfs.WriteFile / vfs.CreateTemp) to create fixtures; switch those to the stdlib: remove the vfs import and replace vfs.MkdirAll with os.MkdirAll, vfs.WriteFile/vfs.Create with os.WriteFile or os.Create (and write to the file), and vfs.CreateTemp with os.CreateTemp (or os.MkdirTemp as appropriate); ensure you import "os" (and "path/filepath" only if needed) in the _test.go so tests use standard filesystem APIs for fixture setup.cmd/root.go (1)
362-375: 💤 Low valueExtract shared code-string resolution to avoid duplication.
The
codeStrswitch logic (lines 364-372) is identical towriteSecurityPolicyError(lines 260-267). Consider extracting a helper likesecurityPolicyCodeString(code int) stringto keep them in sync.♻️ Proposed refactor
+// securityPolicyCodeString maps a security policy error code to its string representation. +func securityPolicyCodeString(code int) string { + switch code { + case internalauth.LarkErrBlockByPolicyTryAuth: + return "challenge_required" + case internalauth.LarkErrBlockByPolicy: + return "access_denied" + default: + return strconv.Itoa(code) + } +} + // logSecurityPolicyError logs a security policy error using tracking.LogAuthError. func logSecurityPolicyError(spErr *internalauth.SecurityPolicyError) { - var codeStr string - switch spErr.Code { - case internalauth.LarkErrBlockByPolicyTryAuth: - codeStr = "challenge_required" - case internalauth.LarkErrBlockByPolicy: - codeStr = "access_denied" - default: - codeStr = strconv.Itoa(spErr.Code) - } + codeStr := securityPolicyCodeString(spErr.Code) errMsg := fmt.Sprintf("reason=security_policy code=%s message=%q", codeStr, spErr.Message) tracking.LogAuthError("auth", "security_policy", fmt.Errorf(errMsg)) }Then update
writeSecurityPolicyErrorto use the same helper.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/root.go` around lines 362 - 375, Extract the duplicated switch that maps security policy numeric codes to strings into a single helper function (e.g., securityPolicyCodeString(code int) string) and replace the switch in both logSecurityPolicyError and writeSecurityPolicyError with calls to that helper; ensure the helper returns "challenge_required" for internalauth.LarkErrBlockByPolicyTryAuth, "access_denied" for internalauth.LarkErrBlockByPolicy and otherwise returns strconv.Itoa(code), and update both functions to use securityPolicyCodeString(spErr.Code) (or the appropriate code value) to build their messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/error_auth_hint.go`:
- Around line 82-102: The function extractNeedAuthorizationError is unused;
remove its entire definition from cmd/error_auth_hint.go (delete the
extractNeedAuthorizationError(...) block) and then clean up any now-unused
imports (e.g., errors or internalauth if they were only used by this function)
so the package compiles cleanly; ensure no other code references
extractNeedAuthorizationError before deleting.
In `@internal/auth/uat_client.go`:
- Around line 86-88: The code currently maps a nil refreshed result to
ReasonRefreshFailed; instead check whether the stored refresh token is expired
and return ReasonRefreshExpired in that case. Concretely, in the branch where
refreshed == nil inspect the stored token expiry field (e.g.,
stored.RefreshExpiry, stored.RefreshExpiresAt or stored.RefreshTokenExpiry—use
the actual field on the stored object) and if that timestamp is before
time.Now() return &NeedAuthorizationError{UserOpenId: opts.UserOpenId, Reason:
ReasonRefreshExpired, GrantedAt: stored.GrantedAt}; otherwise keep returning
ReasonRefreshFailed. Update the branch around refreshed == nil in uat_client.go
so telemetry differentiates expired refresh tokens from other refresh failures.
In `@internal/tracking/event.go`:
- Around line 194-203: The getParentProcessNameLinux function currently calls
os.Readlink and os.ReadFile directly; replace those calls with the repository
filesystem abstraction (use vfs.Readlink for "/proc/%d/exe" and vfs.ReadFile for
"/proc/%d/comm") so tests can mock FS access, preserving the same error handling
and returning filepath.Base(targetPath) and strings.TrimSpace(string(data)) as
before; update any other similar probes in this file to use vfs.* as well.
---
Nitpick comments:
In `@cmd/root.go`:
- Around line 362-375: Extract the duplicated switch that maps security policy
numeric codes to strings into a single helper function (e.g.,
securityPolicyCodeString(code int) string) and replace the switch in both
logSecurityPolicyError and writeSecurityPolicyError with calls to that helper;
ensure the helper returns "challenge_required" for
internalauth.LarkErrBlockByPolicyTryAuth, "access_denied" for
internalauth.LarkErrBlockByPolicy and otherwise returns strconv.Itoa(code), and
update both functions to use securityPolicyCodeString(spErr.Code) (or the
appropriate code value) to build their messages.
In `@internal/cmdutil/factory_default_test.go`:
- Line 20: Tests are using the production VFS helpers (imported as vfs and calls
like vfs.MkdirAll / vfs.WriteFile / vfs.CreateTemp) to create fixtures; switch
those to the stdlib: remove the vfs import and replace vfs.MkdirAll with
os.MkdirAll, vfs.WriteFile/vfs.Create with os.WriteFile or os.Create (and write
to the file), and vfs.CreateTemp with os.CreateTemp (or os.MkdirTemp as
appropriate); ensure you import "os" (and "path/filepath" only if needed) in the
_test.go so tests use standard filesystem APIs for fixture setup.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 964df8f1-abea-45b2-b5d8-d5c9f2faf263
📒 Files selected for processing (15)
cmd/error_auth_hint.gocmd/root.gointernal/auth/auth_response_log.gointernal/auth/device_flow_test.gointernal/auth/errors.gointernal/auth/uat_client.gointernal/auth/verify_test.gointernal/cmdutil/factory_default.gointernal/cmdutil/factory_default_test.gointernal/core/config.gointernal/keychain/auth_log.gointernal/keychain/auth_log_test.gointernal/keychain/keychain.gointernal/tracking/event.gointernal/tracking/event_test.go
💤 Files with no reviewable changes (2)
- internal/keychain/auth_log_test.go
- internal/keychain/auth_log.go
Codecov Report❌ Patch coverage is ❌ Your patch check has failed because the patch coverage (43.06%) is below the target coverage (60.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #987 +/- ##
==========================================
+ Coverage 66.78% 67.69% +0.91%
==========================================
Files 564 591 +27
Lines 52441 55457 +3016
==========================================
+ Hits 35024 37543 +2519
- Misses 14516 14784 +268
- Partials 2901 3130 +229 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
🚀 PR Preview Install Guide🧰 CLI updatenpm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@3dfd8767e437043048dfb13fb7e2aabe551003bc🧩 Skill updatenpx skills add larksuite/cli#feat/auth_log_report -y -g |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/tracking/event.go (1)
218-222: 💤 Low valueDead code: macOS doesn't have
/procfilesystemThe
getParentProcessNameDarwinfunction attempts to read/proc/%d/exeviavfs.Readlink, but macOS does not have a/procfilesystem. This call will always fail and fall through to thepscommand. Consider removing this unreachable branch to avoid confusion.♻️ Suggested simplification
func getParentProcessNameDarwin(ppid int) string { - exePath := fmt.Sprintf("/proc/%d/exe", ppid) - if targetPath, err := vfs.Readlink(exePath); err == nil { - return filepath.Base(targetPath) - } - cmd := exec.Command("ps", "-p", fmt.Sprintf("%d", ppid), "-o", "comm=") var out bytes.Buffer cmd.Stdout = &out🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/tracking/event.go` around lines 218 - 222, getParentProcessNameDarwin contains an unreachable branch that tries to read "/proc/%d/exe" via vfs.Readlink (which macOS lacks); remove that branch and simplify the function to directly use the existing ps-based fallback logic (keep the ps invocation code path and any trimming/return logic) so there is no attempt to read /proc using vfs.Readlink in getParentProcessNameDarwin.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/tracking/event.go`:
- Around line 218-222: getParentProcessNameDarwin contains an unreachable branch
that tries to read "/proc/%d/exe" via vfs.Readlink (which macOS lacks); remove
that branch and simplify the function to directly use the existing ps-based
fallback logic (keep the ps invocation code path and any trimming/return logic)
so there is no attempt to read /proc using vfs.Readlink in
getParentProcessNameDarwin.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2fe83d62-1c2a-407e-aded-f3aa827ad555
📒 Files selected for processing (9)
cmd/error_auth_hint.gocmd/root.gointernal/auth/uat_client.gointernal/cmdutil/factory_default.gointernal/cmdutil/factory_default_test.gointernal/tracking/event.gointernal/vfs/default.gointernal/vfs/fs.gointernal/vfs/osfs.go
💤 Files with no reviewable changes (1)
- cmd/error_auth_hint.go
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/tracking/event.go (1)
222-236: 💤 Low valueDead code path:
/procdoes not exist on macOSThe
getParentProcessNameDarwinfunction attempts to read/proc/%d/exe(Lines 223-226) before falling back tops. macOS does not have a/procfilesystem, sovfs.Readlinkwill always fail here, making this block unreachable dead code. The fallback topsworks correctly, but this adds unnecessary overhead.🧹 Remove dead code path
func getParentProcessNameDarwin(ppid int) string { - exePath := fmt.Sprintf("/proc/%d/exe", ppid) - if targetPath, err := vfs.Readlink(exePath); err == nil { - return filepath.Base(targetPath) - } - cmd := exec.Command("ps", "-p", fmt.Sprintf("%d", ppid), "-o", "comm=") var out bytes.Buffer cmd.Stdout = &out if err := cmd.Run(); err == nil { return strings.TrimSpace(out.String()) } return fmt.Sprintf("ppid=%d", ppid) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/tracking/event.go` around lines 222 - 236, Remove the dead /proc-based branch in getParentProcessNameDarwin: the call to vfs.Readlink(fmt.Sprintf("/proc/%d/exe", ppid)) is unreachable on macOS, so delete that block and directly use the existing ps fallback (exec.Command("ps", "-p", fmt.Sprintf("%d", ppid), "-o", "comm="") with its bytes.Buffer stdout handling) to obtain the parent process name; ensure the final fallback still returns fmt.Sprintf("ppid=%d", ppid) if the ps command fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/tracking/event.go`:
- Around line 222-236: Remove the dead /proc-based branch in
getParentProcessNameDarwin: the call to vfs.Readlink(fmt.Sprintf("/proc/%d/exe",
ppid)) is unreachable on macOS, so delete that block and directly use the
existing ps fallback (exec.Command("ps", "-p", fmt.Sprintf("%d", ppid), "-o",
"comm="") with its bytes.Buffer stdout handling) to obtain the parent process
name; ensure the final fallback still returns fmt.Sprintf("ppid=%d", ppid) if
the ps command fails.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 904575b3-e0ab-4b60-9bc0-447bbbd82466
📒 Files selected for processing (3)
internal/cmdutil/factory_default.gointernal/tracking/event.gointernal/tracking/event_test.go
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/tracking/event.go (1)
548-553:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFail closed for unknown/empty brand when resolving telemetry endpoint.
ResolveTelemetryEndpointcurrently sends to Feishu for any non-"lark"value, including"". That makes misconfiguration/default-state route auth telemetry externally instead of disabling it. Prefer explicit allowlist ("feishu"only) and return empty for unknown/empty brands.Proposed fix
func ResolveTelemetryEndpoint(brand string) string { - if brand == "lark" { - return "" - } - return telemetryEndpointFeishu + switch strings.ToLower(strings.TrimSpace(brand)) { + case "feishu": + return telemetryEndpointFeishu + case "lark": + return "" + default: + return "" + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/tracking/event.go` around lines 548 - 553, ResolveTelemetryEndpoint currently treats any non-"lark" brand (including empty) as Feishu; change it to an explicit allowlist so only brand == "feishu" returns telemetryEndpointFeishu and all other/unknown/empty brands return an empty string. Update the ResolveTelemetryEndpoint function to check for "feishu" (not just not-"lark") and return telemetryEndpointFeishu only in that case, otherwise return "" to ensure unknown or empty brands fail closed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/cmdutil/factory_default.go`:
- Around line 82-86: The NewDefault constructor must not call f.Config()
eagerly; remove the f.Config() invocation and the unconditional assignment to
tracking.AuthLogBrand and tracking.AuthLogAppID so construction doesn't force
credential/config resolution or swallow errors. Instead, set those tracking
globals only when a Config is successfully obtained elsewhere (e.g., after
explicit f.Config() calls or in the place that resolves configuration), or add a
dedicated helper like SetTrackingFromConfig(cfg) that callers invoke after
successful config resolution; ensure references to f.Config(), NewDefault,
tracking.AuthLogBrand and tracking.AuthLogAppID are updated accordingly.
---
Outside diff comments:
In `@internal/tracking/event.go`:
- Around line 548-553: ResolveTelemetryEndpoint currently treats any non-"lark"
brand (including empty) as Feishu; change it to an explicit allowlist so only
brand == "feishu" returns telemetryEndpointFeishu and all other/unknown/empty
brands return an empty string. Update the ResolveTelemetryEndpoint function to
check for "feishu" (not just not-"lark") and return telemetryEndpointFeishu only
in that case, otherwise return "" to ensure unknown or empty brands fail closed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: de4c6d47-50ef-4d18-af6d-9a02075d9fbd
📒 Files selected for processing (4)
internal/cmdutil/factory_default.gointernal/cmdutil/factory_default_test.gointernal/tracking/event.gointernal/tracking/event_test.go
💤 Files with no reviewable changes (1)
- internal/cmdutil/factory_default_test.go
684c2e0 to
3dfd876
Compare
Summary
Add auth event tracking system that records authentication events (responses and errors) to local log files and reports them to a remote telemetry endpoint. This enables observability into auth failures and helps diagnose permission/token issues across CLI sessions.
Changes
internal/trackingpackage with event emission pipeline:LogAuthResponseandLogAuthErroras public entry pointsauth-YYYY-MM-DD.log) under the workspace-aware config directory, with automatic 7-day cleanuphttps://mcs-bd.feishu.cn/v1/listfor feishu, disabled for lark), with 3s timeout and fail-open behaviorResolveTelemetryEndpoint(brand)to resolve telemetry endpoint by brand, keeping tracking-specific config in thetrackingpackageRuntimeDirFunc,AuthLogUserUniqueIDProvider, andAuthLogRemoteEndpointProvidervia function variables to avoidcore ↔ trackingimport cyclewrapError) and auth error hint enrichment (enrichMissingScopeError,logAuthFailureReason,logSecurityPolicyError)logHTTPResponseandlogSDKResponsein the auth packageTest Plan
go test -race ./internal/tracking/... ./internal/keychain/... ./internal/auth/... ./internal/cmdutil/... ./cmd/...)go vet ./...passesgofmt -l .produces no outputRelated Issues
Summary by CodeRabbit
New Features
Improvements
Refactor
Tests