Skip to content

feat(codespaces): add audit logging for security and observability#351

Merged
PureWeen merged 4 commits intoPureWeen:mainfrom
btessiau:feat/codespace-audit-logging
Mar 11, 2026
Merged

feat(codespaces): add audit logging for security and observability#351
PureWeen merged 4 commits intoPureWeen:mainfrom
btessiau:feat/codespace-audit-logging

Conversation

@btessiau
Copy link
Contributor

Summary

Adds structured audit logging for PolyPilot's Codespaces integration, providing security visibility and troubleshooting capability for SSH connections, DevTunnel operations, and headless Copilot processes.

Closes #350

What's New

AuditLogService — Thread-safe JSONL audit writer

  • 10 typed event methods covering the full codespace connection lifecycle
  • Daily file rotation (audit_YYYY-MM-DD.jsonl) in ~/.polypilot/audit_logs/
  • 30-day retention with auto-purge
  • Thread-safe writes via SemaphoreSlim(1,1)
  • Never crashes — all errors swallowed to Console.Error

Audit Events

Event Integration Point
CODESPACE_CONNECTION_INITIATED CodespaceService.OpenTunnelAsync
CODESPACE_SSH_HANDSHAKE_SUCCESS CodespaceService.OpenTunnelAsync / OpenSshTunnelAsync
CODESPACE_SSH_HANDSHAKE_FAILURE CodespaceService.OpenTunnelAsync (catch)
COPILOT_HEADLESS_START CodespaceService.StartCopilotHeadlessAsync
COPILOT_HEADLESS_FAILURE CodespaceService.StartCopilotHeadlessAsync (catch)
DEVTUNNEL_TOKEN_ACQUIRED DevTunnelService.IssueAccessTokenAsync
DEVTUNNEL_CONNECTION_ESTABLISHED DevTunnelService.HostAsync
DEVTUNNEL_CONNECTION_FAILED DevTunnelService.HostAsync (catch)
SESSION_CLOSED DevTunnelService.Stop
SESSION_ERROR Available for integration

Security

  • Tokens/keys/passwords never loggedSanitizeSecret() truncates to 8 chars + [redacted]
  • Error sanitization — Home paths, ghp_*, gho_*, github_pat_*, JWT patterns all redacted
  • No PII in logs

Files Changed

File Change
PolyPilot/Models/AuditLogEntry.cs NEW — Data model + event type constants
PolyPilot/Services/AuditLogService.cs NEW — Core service (10 Log methods, sanitization, JSONL writer)
PolyPilot.Tests/AuditLogTests.cs NEW — 19 tests
PolyPilot/Services/CodespaceService.cs Added AuditLog property + constructor
PolyPilot/Services/CodespaceService.Lifecycle.cs Headless start/failure audit
PolyPilot/Services/DevTunnelService.cs Token/tunnel/session lifecycle audit
PolyPilot/MauiProgram.cs Register AuditLogService singleton

Testing (19 tests, all pass)

  • File creation & daily rotation
  • All 10 event types with field verification
  • Sensitive data sanitization (4 token patterns + home path)
  • Thread safety (10 concurrent tasks × 20 entries = 200, zero data loss)
  • Error handling (invalid paths don't crash)
  • 30-day retention purge

Ready for code review and feedback. This work is contributed as a personal open-source contribution.

Add structured audit logging for Codespaces integration with 10 event
types covering SSH connections, DevTunnel lifecycle, headless Copilot
processes, and session errors.

New files:
- AuditLogEntry.cs: Data model + event type constants
- AuditLogService.cs: Thread-safe JSONL writer with sanitization,
  daily rotation, and 30-day retention
- AuditLogTests.cs: 19 tests (file I/O, typed events, sanitization,
  thread safety, error handling, retention)

Integration points:
- CodespaceService: SSH tunnel/port-forward lifecycle audit
- CodespaceService.Lifecycle: Headless copilot start/failure
- DevTunnelService: Token acquisition, tunnel hosting, session close

Security: Tokens/keys/passwords never logged. Home paths, ghp_*,
gho_*, github_pat_*, and JWT patterns redacted from error messages.

Closes PureWeen#350

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dinisdopion-sys
Copy link

dinisdopion-sys commented Mar 11, 2026 via email

Copy link
Contributor Author

@btessiau btessiau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 PR #351 Review -- feat(codespaces): audit logging for security and observability

CI Status: ⚠️ No checks configured
Prior reviews: None

Good addition overall — structured JSONL, SemaphoreSlim(1,1) for thread-safe writes, sanitization for tokens and home paths, and solid test coverage. Two correctness issues found.


🟡 MODERATE — PurgeOldLogs uses File.GetCreationTimeUtc instead of parsing the filename date

AuditLogService.cs:112

if (File.GetCreationTimeUtc(file) < cutoff)
    File.Delete(file);

On Linux (GTK target) and some macOS configurations, filesystems don't store a birth time — GetCreationTimeUtc falls back to the last modification time. Since WriteEntryAsync appends to the file on every audit event, the effective mtime of any active log file is always "now." The result: on Linux, PurgeOldLogs never deletes anything, silently breaking the 30-day retention policy.

Fix: Parse the date directly from the filename (the format is already audit_YYYY-MM-DD.jsonl):

var name = Path.GetFileNameWithoutExtension(file); // "audit_2024-01-01"
if (name.Length == "audit_YYYY-MM-DD".Length
    && DateTime.TryParseExact(name[6..], "yyyy-MM-dd", null,
        System.Globalization.DateTimeStyles.None, out var fileDate)
    && fileDate < cutoff.Date)
{
    File.Delete(file);
}

🟡 MODERATE — Stop() always logs cleanClose: true, but is called from failure paths too

DevTunnelService.cs:460

public void Stop()
{
    SetState(TunnelState.Stopping);
    _ = _auditLog?.LogSessionClosed(null, 0, cleanClose: true, ...); // ← always true

Stop() is called from three sites:

  1. Normal user-initiated stop (clean ✓)
  2. catch (Exception ex) in HostAsync (failure ✗) — also fires LogDevtunnelConnectionFailed immediately after
  3. Dispose() (clean ✓)

On the failure path (case 2), the audit log gets both SESSION_CLOSED (cleanClose=true) and DEVTUNNEL_CONNECTION_FAILED — the first event is factually wrong and creates confusing audit trails.

Fix: Add a cleanClose parameter to Stop():

public void Stop(bool cleanClose = true)
{
    SetState(TunnelState.Stopping);
    _ = _auditLog?.LogSessionClosed(null, 0, cleanClose, "DevTunnel stopped");

Then pass Stop(cleanClose: false) from the catch block in HostAsync.


🟢 MINOR — COPILOT_HEADLESS_FAILURE fires even when the result is indeterminate (not a clear failure)

CodespaceService.Lifecycle.cs:197-199

_ = AuditLog?.LogCopilotHeadlessFailure(codespaceName, null, result?.Trim() ?? "Unknown failure");
return true; // Still return true (SSH worked) — let the tunnel probe determine if it's listening

The function returns true here — SSH worked fine, and copilot's status is uncertain. The comment explicitly says "let the tunnel probe determine." Logging COPILOT_HEADLESS_FAILURE for this case creates false positives in the audit trail for every successful codespace startup where ss reports anything other than exactly STARTED.

Suggested fix: Use a distinct event (e.g., COPILOT_HEADLESS_UNCERTAIN) or include a "determined_by": "tunnel_probe" field in the existing failure event to distinguish it from a confirmed failure.


🟢 MINOR — Fire-and-forget tasks discarded with _ = at shutdown

All audit call sites (CodespaceService.cs:70,108,124,288, Lifecycle.cs:192,197, DevTunnelService.cs:268,274,443,460)

All calls use _ = AuditLog?.LogXxx(). The LogXxx methods return Task (from WriteEntryAsync), which is abandoned. If AuditLogService.Dispose() runs before an in-flight task reaches _writeLock.WaitAsync(), the SemaphoreSlim is disposed and the task's WaitAsync() throws ObjectDisposedException — caught by the outer try/catch in WriteEntryAsync, so no crash, but the final audit entries at shutdown are silently dropped.

Low practical risk (singleton lifetime), but worth noting. Using SafeFireAndForget (the project's established pattern in CopilotService.Utilities.cs) or simply _ = WriteEntryAsync(...).ConfigureAwait(false) would be consistent.


🟢 MINOR — LogDevtunnelConnectionEstablished always records latencyMs: 0

DevTunnelService.cs:268

latencyMs is hardcoded to 0. The field in the audit entry is always misleading. Either measure actual latency (start a Stopwatch before HostAsync begins the tunnel setup) or remove the field.


Test Coverage

19 tests, well-structured. One gap: no test for PurgeOldLogs with File.GetLastWriteTimeUtc vs GetCreationTimeUtc on Linux — but this is hard to test without mocking the filesystem. The PurgeOldLogs_DeletesOldFiles test uses File.SetCreationTimeUtc, which won't reveal the Linux regression.


Recommendation: ⚠️ Request changes

Two actionable fixes before merge:

  1. Replace GetCreationTimeUtc with filename date parsing in PurgeOldLogs
  2. Pass cleanClose flag through to Stop()

The COPILOT_HEADLESS_FAILURE false-positive and latency hardcode are minor but easy to fix in the same pass.

…erminate event, latency measurement

Review: PureWeen#351 (review)

1. PurgeOldLogs: parse date from filename instead of GetCreationTimeUtc
   (unreliable on Linux where it falls back to mtime)
2. Stop(cleanClose): add parameter, pass false from failure paths so
   audit trail distinguishes clean vs error shutdowns
3. Add COPILOT_HEADLESS_INDETERMINATE event for cases where SSH worked
   but startup probe was inconclusive (avoids false-positive failures)
4. Measure actual tunnel setup latency via Stopwatch instead of
   hardcoded 0
5. Add 2 new tests: unparsable filename retention, indeterminate event

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor Author

@btessiau btessiau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough review! All 5 items addressed in 55801ce:

🟡 MODERATE — PurgeOldLogs GetCreationTimeUtc → Fixed. Now parses date from filename (audit_YYYY-MM-DD.jsonl) instead of relying on filesystem timestamps. Added a test for unparsable filenames (kept, not deleted).

**🟡

🟢 MINOR — COPILOT_HEADLESS_FAILURE false positive → Fixed. Added COPILOT_HEADLESS_INDETERMINATE event type with "determined_by": "tunnel_probe" field. The return true path now logs this instead of a failure event. Actual failures still use COPILOT_HEADLESS_FAILURE.

🟢 MINOR — latencyMs hardcoded to 0 → Fixed. Added Stopwatch in HostAsync that starts at SetState(Starting) and stops at SetState(Running), capturing real tunnel setup time.

🟢 MINOR — Fire-and-forget _ = → Acknowledged. Low risk since AuditLogService is a singleton with app lifetime. The outer try/catch in WriteEntryAsync swallows ObjectDisposedException if it ever occurs. Can switch to SafeFireAndForget in a follow-up if preferred.

21 tests pass (19 original + 2 new). Build clean on Mac Catalyst.

Copy link
Contributor Author

@btessiau btessiau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔄 PR #351 Re-Review (Round 2) — Audit Logging

CI Status: ⚠️ No checks configured
Commits reviewed: c838b7c (original) + 55801ce (fix commit)


Previous Findings Status

# Severity Finding Status
1 🟡 MODERATE PurgeOldLogs() is dead code — never called from production code ⚠️ STILL PRESENT
2 🟡 MODERATE PurgeOldLogs used File.GetCreationTimeUtc (unreliable on Linux) FIXED — now parses date from filename
3 🟡 MODERATE TestSetup.cs missing AuditLogService.ResetLogDir() N/AAuditLogTests use instance-level override (new AuditLogService(testDir)), never the static path. No production test leaks to ~/.polypilot/audit_logs/.
4 🟡 MODERATE Stop() hardcoded cleanClose: true in failure paths FIXEDStop(bool cleanClose = true) parameter added; failure paths pass false
5 🟢 MINOR latencyMs: 0 hardcoded in LogDevtunnelConnectionEstablished FIXEDStopwatch measures actual setup time
6 🟢 MINOR Fire-and-forget _ = LogXxx() instead of SafeFireAndForget() ⚠️ STILL PRESENT — all audit call sites use _ = ...

Finding Detail: Finding 1 — PurgeOldLogs() is dead code

PurgeOldLogs() exists at AuditLogService.cs:100 and is described as "Called once on startup" in its comment, but it is never actually called from MauiProgram.cs, CopilotService, or any other production code path. The PR description states "30-day retention with auto-purge" but this is incorrect — audit logs accumulate indefinitely. The fix is simple: call auditLog.PurgeOldLogs() once after DI registration in MauiProgram.cs, or from InitializeAsync.

Finding Detail: Finding 6 — _ = LogXxx() fire-and-forget pattern

Per codebase convention (from memory: "All fire-and-forget calls in CopilotService use SafeFireAndForget() wrapper"), audit call sites in CodespaceService.cs and DevTunnelService.cs all use bare _ = AuditLog?.LogXxx(...). Since WriteEntryAsync already swallows all exceptions internally, the practical impact is low (the tasks will never fault). The concern is consistency — at shutdown, these abandoned Tasks may not complete. This is acceptable given the "never crashes" design contract, but worth noting for consistency with the codebase pattern.


New Findings

None. The fix commit is clean. The COPILOT_HEADLESS_INDETERMINATE event is a good addition — it correctly distinguishes "SSH worked but startup inconclusive" from a genuine failure.


Test Coverage

21 tests now (19 original + 2 new for indeterminate event and unparsable filename retention). The PurgeOldLogs_DeletesOldFiles test now correctly uses filename-based date comparison without relying on File.SetCreationTimeUtc.


Verdict: ⚠️ Request changes (one specific ask)

Required before merge:

Add a PurgeOldLogs() call in production code (e.g., MauiProgram.cs after registering AuditLogService, or in an InitializeAsync method). Without this, the "30-day retention" feature documented in the PR description is never active and logs grow indefinitely.

All other findings from Round 1 are resolved. Once PurgeOldLogs is wired up, this PR is ready to merge.

PurgeOldLogs() was defined but never called from production code,
so audit logs would accumulate indefinitely. Now called eagerly
at startup in MauiProgram.cs right after AuditLogService registration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor Author

@btessiau btessiau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in c433a9d:

🟡 MODERATE — PurgeOldLogs dead code → Fixed. Now called eagerly at startup in MauiProgram.cs right after AuditLogService registration. Uses a throwaway instance since the singleton hasn't been built yet — PurgeOldLogs() only touches the filesystem (no instance state needed).

🟢 MINOR — SafeFireAndForget consistency → Acknowledged. SafeFireAndForget is internal static on CopilotService, so it can't be called from CodespaceService/DevTunnelService without exposing it. Since WriteEntryAsync already has a full try/catch that swallows all exceptions, the practical difference is zero. Happy to extract it to a shared utility if preferred.

Build clean, 21/21 tests pass.

Copy link
Contributor Author

@btessiau btessiau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔄 PR #351 Re-Review (Round 3) — Audit Logging

CI Status: ⚠️ No checks configured
Commits reviewed: c838b7c (original) → 55801ce (Round 2 fixes) → c433a9d (Round 3 fixes)


Previous Findings Status

# Finding Round 3 Status
1 🟡 PurgeOldLogs() dead code — never called in production FIXED
2 �� PurgeOldLogs used GetCreationTimeUtc (unreliable on Linux) ✅ FIXED (Round 2)
3 🟢 TestSetup.cs missing AuditLogService.SetLogDirForTesting() ❌ Still present (non-blocking)
4 🟡 Stop() hardcoded cleanClose: true ✅ FIXED (Round 2)
5 🟢 latencyMs: 0 hardcoded ✅ FIXED (Round 2)
6 🟢 _ = LogXxx() bare discards instead of SafeFireAndForget() ❌ Still present (non-blocking)

Finding #1 (the main blocker) is now fixed: MauiProgram.cs:106 calls new AuditLogService().PurgeOldLogs() in a best-effort try/catch at startup, immediately after the singleton registration. The static _auditLogDir is shared between this throwaway instance and the DI singleton, so both resolve the same directory. ✅


Remaining Non-Blocking Items

🟢 MINOR — AuditLogJsonContext still unused (PolyPilot/Models/AuditLogEntry.cs:24-30)
ToJsonLine() allocates new JsonSerializerOptions { ... } on every call. AuditLogJsonContext (defined 27 lines below with matching [JsonSourceGenerationOptions]) is dead code. For AOT/trimmer correctness on iOS/Android, this should use the source-generated path:

public string ToJsonLine()
    => JsonSerializer.Serialize(this, AuditLogJsonContext.Default.AuditLogEntry);

Non-blocking since the manual path works, but the source-generated context suggests this was the intended approach.

🟢 MINOR — TestSetup.cs missing audit log isolation (PolyPilot.Tests/TestSetup.cs:27)
AuditLogService._auditLogDir is a new static field that isn't redirected in the [ModuleInitializer]. The codebase convention (documented in TestSetup.cs:14-17) requires all new static file paths to be added here. Safe today because AuditLogTests passes a logDir constructor arg, but a future test that calls new AuditLogService() without a dir arg will write to the real ~/.polypilot/audit_logs/.

Fix: add AuditLogService.SetLogDirForTesting(TestBaseDir); to TestSetup.Initialize().

🟢 MINOR — _ = LogXxx() bare discards (9 call sites across CodespaceService.cs, CodespaceService.Lifecycle.cs, DevTunnelService.cs)
Exceptions are fully swallowed inside WriteEntryAsync, so no crash risk. But the codebase convention is SafeFireAndForget() for fire-and-forget tasks — see CopilotService.Utilities.cs:293. The bare discards are functionally equivalent here given the internal exception guard.


Verdict

Approve — the main blocking issue (PurgeOldLogs dead code) is resolved. The three remaining items are all 🟢 MINOR, non-blocking, and present no correctness or data-loss risk. The PR is still in DRAFT; when marked ready, the two easiest mechanical fixes (ToJsonLine → source-gen context, SetLogDirForTesting in TestSetup) would be good to apply before merge but are not blockers.

1. ToJsonLine() now uses AuditLogJsonContext (source-generated) instead
   of allocating JsonSerializerOptions per call. Registered int/long/
   double/bool/string/Dictionary types for polymorphic Details values.

2. TestSetup.cs: added AuditLogService.SetLogDirForTesting() to the
   ModuleInitializer so future tests that construct AuditLogService
   without an explicit dir arg won't write to ~/.polypilot/audit_logs/.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor Author

@btessiau btessiau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three remaining minors addressed in d59f3d8:

  1. ToJsonLine() → source-gen context — Now uses AuditLogJsonContext.Default.AuditLogEntry. Registered int, long, double, bool, string, and Dictionary<string, object?> for polymorphic Details values. Zero per-call allocations.

  2. TestSetup.cs audit log isolation — Added AuditLogService.SetLogDirForTesting() to [ModuleInitializer], matching the codebase convention for all static file paths.

  3. _ = LogXxx() — Left as-is per discussion (SafeFireAndForget is internal to CopilotService; WriteEntryAsync already has full exception guard).

21/21 tests pass, build clean. Ready to mark non-draft when convenient.

Copy link
Contributor Author

@btessiau btessiau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔄 PR #351 Re-Review (Round 4) — Audit Logging

CI Status: ⚠️ No checks configured
Commits reviewed: c838b7c (original) + 55801ce + c433a9d + d59f3d8 (latest)
New since Round 3: one new commit d59f3d8 ("fix: use source-gen JSON context + add audit log test isolation")


Previous Findings — All Resolved ✅

# Finding Status
1 🟡 PurgeOldLogs() dead code — never called at startup ✅ FIXED (c433a9dMauiProgram.cs calls it at registration time)
2 🟡 File.GetCreationTimeUtc unreliable on Linux ✅ FIXED (55801ce — filename-based date parsing audit_YYYY-MM-DD)
3 🟡 TestSetup.cs missing AuditLogService.SetLogDirForTesting() ✅ FIXED (d59f3d8 — added to ModuleInitializer)
4 🟡 Stop() hardcodes cleanClose: true for failure paths ✅ FIXED (55801ceStop(cleanClose: bool) parameter added)
5 🟢 latencyMs: 0 hardcoded in LogDevtunnelConnectionEstablished ✅ FIXED (55801ce — Stopwatch measures actual setup latency)
6 🟢 _ = AuditLog?.Log... fire-and-forget pattern 🟢 Non-blocking — SafeFireAndForget doesn't exist in this codebase; pattern is consistent with other fire-and-forget calls
7 🟢 ToJsonLine() allocates JsonSerializerOptions per call ✅ FIXED (d59f3d8 — uses AuditLogJsonContext.Default.AuditLogEntry)

What Changed in d59f3d8

  1. AuditLogEntry.ToJsonLine() — Replaced new JsonSerializerOptions { ... } with the source-generated AuditLogJsonContext.Default.AuditLogEntry. The context now registers int, long, double, bool, string, and Dictionary<string, object?> so all Details value types serialize correctly through the source-gen path. This is trimmer-safe and eliminates per-call allocations.

  2. TestSetup.cs — Adds AuditLogService.SetLogDirForTesting(Path.Combine(TestBaseDir, "audit_logs")) to ModuleInitializer, redirecting the static default dir to the per-process temp directory. This matches the isolation guarantee for CopilotService and RepoManager.

No regressions introduced. The source-gen context correctly handles object? values in Details via the registered primitive types.


Verdict: ✅ Approve

All findings from the multi-round review are resolved. The feature is well-tested (19+ tests covering all event types, sanitization patterns, thread safety, retention, and isolation) and the integration points are minimal and safe (optional AuditLog? property with null-conditional fire-and-forget). Ready to merge.

@btessiau btessiau marked this pull request as ready for review March 11, 2026 09:59
@PureWeen PureWeen merged commit f592810 into PureWeen:main Mar 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(codespaces): add audit logging for security and observability

3 participants