Skip to content

fix(windows): build agent as GUI-subsystem to suppress Task Scheduler console flash#101

Open
swarit-stepsecurity wants to merge 1 commit into
step-security:mainfrom
swarit-stepsecurity:swarit/fix/windows-app-exec-option-b
Open

fix(windows): build agent as GUI-subsystem to suppress Task Scheduler console flash#101
swarit-stepsecurity wants to merge 1 commit into
step-security:mainfrom
swarit-stepsecurity:swarit/fix/windows-app-exec-option-b

Conversation

@swarit-stepsecurity
Copy link
Copy Markdown
Member

@swarit-stepsecurity swarit-stepsecurity commented May 22, 2026

Summary

Fix the Windows scheduled-scan console-flash bug by building the agent as a GUI-subsystem binary (-ldflags "-H windowsgui"). Windows cannot allocate a console for GUI-subsystem processes, so Task Scheduler firing the periodic scan no longer flashes a cmd.exe window on the user's desktop. Interactive CLI use is preserved by attaching to the parent shell's console at startup.

This is the "Option B" approach — single-binary, in contrast to the launcher-companion approach in #100.

What changed

Core: agent is now GUI-subsystem on Windows

  • Makefile build-windows / build-windows-arm64 targets append -H windowsgui to ldflags.
  • .goreleaser.yml adds the same flag only for GOOS=windows builds via templated ldflag.
  • cmd/stepsecurity-dev-machine-guard/console_windows.go calls AttachConsole(ATTACH_PARENT_PROCESS) at the top of main() and rewires os.Stdin/Stdout/Stderr to CONIN$ / CONOUT$. Under Task Scheduler the parent has no console and this no-ops cleanly. console_other.go is the non-Windows stub.

Subprocess flash suppression

  • New internal/winproc package with HideWindow(*exec.Cmd). Sets SysProcAttr.HideWindow + CREATE_NO_WINDOW; merges with existing attrs and is idempotent. No-op stub on non-Windows.
  • Applied at every exec.Command* site that spawns Windows subprocesses: executor.Run, executor.RunInDir, executor.IsAppleCLTStub, config_windows.go icacls invocation, aiagents/enrich/npm/registry.go package-manager probes.

Scheduled task action cleanup

  • schtasks.go: dropped the cmd /c "set ""STEPSECURITY_HOME=..."" && \"...\" send-telemetry >> log 2>> err" wrapper. Task action now invokes the agent directly: "<binary>" send-telemetry --install-dir="<path>". The wrapper was a guaranteed console flash per fire.
  • stepHome derived from logDir so it inherits the full fallback chain (admin → C:\ProgramData\StepSecurity, non-admin → paths.Home()CurrentUser.HomeDir, finally ProgramData). Using paths.Home() directly could emit --install-dir="", which the CLI treats as "disable file logging" — an unintended opt-out.
  • The icacls Modify grant on the log dir is preserved so the agent (running as the logged-in interactive user) can still append agent.error.log.

IDE version detection: avoid the .cmd shell-out

  • internal/detector/ide.go: resolveWindowsVersionFromDir reads resources\app\package.json before invoking bin\code.cmd --version. VS Code, Cursor, Windsurf, and Antigravity all ship a top-level version field there in the same JSON shape as JetBrains' product-info.json. Eliminates the per-IDE cmd subprocess that historically flashed a console.
  • Renamed readProductInfoVersionreadJSONVersion since the helper now serves both file shapes.

filelog write-order fix (latent bug surfaced by Option B)

  • internal/progress/filelog/filelog.go: teeLoop previously used io.MultiWriter(origErr, file), which is all-or-nothing — if the first writer errors, the loop bails before the second writer runs. Under Option B's GUI-subsystem agent launched by Task Scheduler, STD_ERROR_HANDLE is INVALID_HANDLE_VALUE; every write to origErr returned ERROR_INVALID_HANDLE and agent.error.log stayed at 0 bytes despite a successful scan. Switched to two best-effort writes (file first, then origErr) so the durable destination always lands. Verified against the backend DeviceExecutionHistory table: same scans uploaded full execution logs (~4 KB) but the local file path was getting dropped. Independent of Option A vs B — also fixes any future code path where origErr becomes invalid.

Tests

Package Tests added
internal/winproc TestHideWindow_NilSafe, TestHideWindow_ZeroValueCmd, TestHideWindow_SetsAttrs, TestHideWindow_MergesExistingAttrs, TestHideWindow_Idempotent
internal/schtasks TestBuildCreateArgs_TaskCommandFormat — regression guard ensuring no cmd /c wrapper, no >> redirection, --install-dir present, binary path quoted
internal/detector TestIDEDetector_Windows_VSCode_PackageJSONFastPath — confirms version comes from package.json without shelling out to code.cmd
internal/progress/filelog TestStartWritesFileEvenWhenOrigStderrIsBroken — locks in the file-first ordering by simulating a closed/invalid origErr

All existing tests pass on darwin native and Windows cross-compile.

Verification

End-to-end on Windows Server 2025 VM:

  • PE subsystem byte of the deployed binary: 2 (Windows GUI).
  • Scheduled task fires every 90 seconds (preview cadence), each scan completes with Last result: 0x0 and agent.error.log grows to ~4.7 KB.
  • No console window appears on the desktop across multiple observed fires.
  • Backend DeviceExecutionHistory records (queried via DDB) show 100/100 successful uploads with non-zero output_size_bytes.

Trade-offs vs Option A (#100)

Option B ships one binary instead of two but accepts CLI ergonomic regressions because GUI-subsystem processes don't behave like console CLIs:

  • Pipes don't work — agent.exe scan --json | jq produces no output (stdout is a console handle, not a pipe).
  • Shell prompt returns before the agent exits — output streams asynchronously below the next command.
  • $LASTEXITCODE / %ERRORLEVEL% is unreliable in scripts unless wrapped in Start-Process -Wait.
  • agent.exe configure interactive prompts can't read stdin; non-interactive flags or --from-file required.
  • agent.exe --version and 2> redirects don't work as expected without Start-Process -Wait.
  • CI/CD output capture is awkward; standard "run with --verbose and send me the log" support flow breaks.
  • Diverges from industry convention (Trivy, osquery, MpCmdRun, Defender CLI are all console-subsystem).

The flash-fix property is identical to Option A. Choose Option B if the simpler deployment footprint (one binary, one MSI <File> component, no companion launcher) outweighs the interactive-CLI loss. Choose Option A if customers run the binary from a terminal regularly.

Build the agent as GUI-subsystem (-ldflags "-H windowsgui") so Task
Scheduler can't allocate a console for it under /ru INTERACTIVE.
AttachConsole(ATTACH_PARENT_PROCESS) at the top of main() restores
os.Std* for interactive `agent.exe` runs from cmd/PowerShell; under
Task Scheduler the parent has no console and this no-ops, preserving
silent operation.

Known ergonomic trade-offs for interactive use (documented in
console_windows.go): the shell prompt returns immediately while output
streams async, stdout pipes do not work (reattached handle is a console
not a pipe), and $LASTEXITCODE is unreliable without Start-Process -Wait.

Companion changes (independent of subsystem choice):
- internal/winproc.HideWindow applied at every subprocess site
  (executor.Run, executor.RunInDir, config_windows icacls,
  aiagents/enrich/npm registry probes). Subprocess flashes are
  orthogonal to the agent's subsystem.
- internal/schtasks: dropped the `cmd /c "... >>log 2>>err"` wrapper.
  Task action now invokes the binary directly with --install-dir.
  stepHome derived from logDir (ProgramData fallback) so it's
  never empty.
- internal/detector/ide: VS Code-family resolveWindowsVersionFromDir
  reads resources\app\package.json before invoking bin\*.cmd. Renamed
  readProductInfoVersion -> readJSONVersion since the helper now
  serves both shapes.
- internal/progress/filelog: teeLoop writes to file before origErr.
  io.MultiWriter aborts on the first error, so a broken origErr
  (GUI-subsystem agent with no parent console) used to drop the file
  write entirely, leaving agent.error.log at 0 bytes. Test
  TestStartWritesFileEvenWhenOrigStderrIsBroken locks in the fix.

Tests:
- internal/winproc: nil-safety, flag merge, idempotence on Windows.
- internal/schtasks: TaskCommandFormat regression guard.
- internal/detector: package.json fast-path precedence.
- internal/progress/filelog: broken-origErr regression guard.

Build:
- Makefile build-windows / build-windows-arm64: -H windowsgui.
- .goreleaser.yml: templated ldflag adds -H windowsgui for windows only.

.gitignore: explicit paths for the compiled binary at both the repo
root and the same-named source dir under cmd/. The previous
`**/stepsecurity-dev-machine-guard` pattern matched the source
directory too and silently dropped new files added inside it.

Signed-off-by: Swarit Pandey <swarit@stepsecurity.io>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Builds the Windows agent as a GUI-subsystem binary to eliminate Task Scheduler “console flash”, while attempting to preserve interactive stdio behavior and suppress console windows for child processes.

Changes:

  • Windows builds now use -H windowsgui, with a startup console-attach shim to restore stdio for interactive runs.
  • Introduces internal/winproc.HideWindow(*exec.Cmd) and applies it across subprocess call sites to avoid child-process window flashes.
  • Updates Windows scheduled task creation to invoke the agent directly (no cmd /c wrapper), plus adjusts Windows IDE version detection and fixes filelog write ordering.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
Makefile Adds -H windowsgui to Windows build targets.
.goreleaser.yml Adds Windows-only ldflag to build GUI-subsystem binaries.
cmd/stepsecurity-dev-machine-guard/main.go Calls AttachParentConsole() early in startup.
cmd/stepsecurity-dev-machine-guard/console_windows.go Implements parent-console attach and stdio rewiring on Windows.
cmd/stepsecurity-dev-machine-guard/console_other.go Non-Windows no-op stub for AttachParentConsole().
internal/winproc/winproc.go Non-Windows no-op stub for HideWindow.
internal/winproc/winproc_windows.go Windows implementation setting HideWindow + CREATE_NO_WINDOW.
internal/winproc/winproc_windows_test.go Windows-specific tests for HideWindow behavior.
internal/winproc/winproc_test.go Cross-platform tests for nil/zero-value safety.
internal/executor/executor.go Applies winproc.HideWindow to spawned commands.
internal/config/config_windows.go Uses winproc.HideWindow for icacls subprocess.
internal/aiagents/enrich/npm/registry.go Applies winproc.HideWindow to package-manager probe subprocesses.
internal/schtasks/schtasks.go Removes cmd /c wrapper; task runs agent directly with --install-dir.
internal/schtasks/schtasks_test.go Updates tests and adds regression guard for /tr formatting.
internal/detector/ide.go Reads Windows IDE versions from package.json before .cmd shell-out; renames helper to readJSONVersion.
internal/detector/ide_test.go Adds regression test for VS Code package.json fast path.
internal/progress/filelog/filelog.go Writes to file first, then original stderr (best-effort) to avoid losing file output.
internal/progress/filelog/filelog_test.go Adds regression test for “broken orig stderr” scenario.
.gitignore Adjusts ignored Go binaries and ignores **/.stepsecurity/ runtime state.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +126 to 132
// buildCreateArgs returns the schtasks /create args. Task invokes the
// binary directly — the previous `cmd /c` wrapper flashed a console
// on every fire; --install-dir + filelog handle what it did.
func buildCreateArgs(binaryPath, stepHome string, hours int, isAdmin bool) []string {
taskCmd := fmt.Sprintf(`"%s" send-telemetry --install-dir="%s"`, binaryPath, stepHome)
args := []string{"/create", "/tn", taskName, "/tr", taskCmd,
"/sc", "HOURLY", "/mo", strconv.Itoa(hours), "/f"}
Comment on lines +394 to +399
vscodePath := `C:\Program Files\Microsoft VS Code`
mock.SetDir(vscodePath)
mock.SetFile(vscodePath+`/bin\code.cmd`, []byte{})
mock.SetFile(vscodePath+`/resources/app/package.json`,
[]byte(`{"name":"Code","version":"1.115.0"}`))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants