fix(windows): build agent as GUI-subsystem to suppress Task Scheduler console flash#101
Open
swarit-stepsecurity wants to merge 1 commit into
Conversation
Build the agent as GUI-subsystem (-ldflags "-H windowsgui") so Task Scheduler can't allocate a console for it under /ru INTERACTIVE. AttachConsole(ATTACH_PARENT_PROCESS) at the top of main() restores os.Std* for interactive `agent.exe` runs from cmd/PowerShell; under Task Scheduler the parent has no console and this no-ops, preserving silent operation. Known ergonomic trade-offs for interactive use (documented in console_windows.go): the shell prompt returns immediately while output streams async, stdout pipes do not work (reattached handle is a console not a pipe), and $LASTEXITCODE is unreliable without Start-Process -Wait. Companion changes (independent of subsystem choice): - internal/winproc.HideWindow applied at every subprocess site (executor.Run, executor.RunInDir, config_windows icacls, aiagents/enrich/npm registry probes). Subprocess flashes are orthogonal to the agent's subsystem. - internal/schtasks: dropped the `cmd /c "... >>log 2>>err"` wrapper. Task action now invokes the binary directly with --install-dir. stepHome derived from logDir (ProgramData fallback) so it's never empty. - internal/detector/ide: VS Code-family resolveWindowsVersionFromDir reads resources\app\package.json before invoking bin\*.cmd. Renamed readProductInfoVersion -> readJSONVersion since the helper now serves both shapes. - internal/progress/filelog: teeLoop writes to file before origErr. io.MultiWriter aborts on the first error, so a broken origErr (GUI-subsystem agent with no parent console) used to drop the file write entirely, leaving agent.error.log at 0 bytes. Test TestStartWritesFileEvenWhenOrigStderrIsBroken locks in the fix. Tests: - internal/winproc: nil-safety, flag merge, idempotence on Windows. - internal/schtasks: TaskCommandFormat regression guard. - internal/detector: package.json fast-path precedence. - internal/progress/filelog: broken-origErr regression guard. Build: - Makefile build-windows / build-windows-arm64: -H windowsgui. - .goreleaser.yml: templated ldflag adds -H windowsgui for windows only. .gitignore: explicit paths for the compiled binary at both the repo root and the same-named source dir under cmd/. The previous `**/stepsecurity-dev-machine-guard` pattern matched the source directory too and silently dropped new files added inside it. Signed-off-by: Swarit Pandey <swarit@stepsecurity.io>
1d9c803 to
bf8e639
Compare
There was a problem hiding this comment.
Pull request overview
Builds the Windows agent as a GUI-subsystem binary to eliminate Task Scheduler “console flash”, while attempting to preserve interactive stdio behavior and suppress console windows for child processes.
Changes:
- Windows builds now use
-H windowsgui, with a startup console-attach shim to restore stdio for interactive runs. - Introduces
internal/winproc.HideWindow(*exec.Cmd)and applies it across subprocess call sites to avoid child-process window flashes. - Updates Windows scheduled task creation to invoke the agent directly (no
cmd /cwrapper), plus adjusts Windows IDE version detection and fixes filelog write ordering.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| Makefile | Adds -H windowsgui to Windows build targets. |
| .goreleaser.yml | Adds Windows-only ldflag to build GUI-subsystem binaries. |
| cmd/stepsecurity-dev-machine-guard/main.go | Calls AttachParentConsole() early in startup. |
| cmd/stepsecurity-dev-machine-guard/console_windows.go | Implements parent-console attach and stdio rewiring on Windows. |
| cmd/stepsecurity-dev-machine-guard/console_other.go | Non-Windows no-op stub for AttachParentConsole(). |
| internal/winproc/winproc.go | Non-Windows no-op stub for HideWindow. |
| internal/winproc/winproc_windows.go | Windows implementation setting HideWindow + CREATE_NO_WINDOW. |
| internal/winproc/winproc_windows_test.go | Windows-specific tests for HideWindow behavior. |
| internal/winproc/winproc_test.go | Cross-platform tests for nil/zero-value safety. |
| internal/executor/executor.go | Applies winproc.HideWindow to spawned commands. |
| internal/config/config_windows.go | Uses winproc.HideWindow for icacls subprocess. |
| internal/aiagents/enrich/npm/registry.go | Applies winproc.HideWindow to package-manager probe subprocesses. |
| internal/schtasks/schtasks.go | Removes cmd /c wrapper; task runs agent directly with --install-dir. |
| internal/schtasks/schtasks_test.go | Updates tests and adds regression guard for /tr formatting. |
| internal/detector/ide.go | Reads Windows IDE versions from package.json before .cmd shell-out; renames helper to readJSONVersion. |
| internal/detector/ide_test.go | Adds regression test for VS Code package.json fast path. |
| internal/progress/filelog/filelog.go | Writes to file first, then original stderr (best-effort) to avoid losing file output. |
| internal/progress/filelog/filelog_test.go | Adds regression test for “broken orig stderr” scenario. |
| .gitignore | Adjusts ignored Go binaries and ignores **/.stepsecurity/ runtime state. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+126
to
132
| // buildCreateArgs returns the schtasks /create args. Task invokes the | ||
| // binary directly — the previous `cmd /c` wrapper flashed a console | ||
| // on every fire; --install-dir + filelog handle what it did. | ||
| func buildCreateArgs(binaryPath, stepHome string, hours int, isAdmin bool) []string { | ||
| taskCmd := fmt.Sprintf(`"%s" send-telemetry --install-dir="%s"`, binaryPath, stepHome) | ||
| args := []string{"/create", "/tn", taskName, "/tr", taskCmd, | ||
| "/sc", "HOURLY", "/mo", strconv.Itoa(hours), "/f"} |
Comment on lines
+394
to
+399
| vscodePath := `C:\Program Files\Microsoft VS Code` | ||
| mock.SetDir(vscodePath) | ||
| mock.SetFile(vscodePath+`/bin\code.cmd`, []byte{}) | ||
| mock.SetFile(vscodePath+`/resources/app/package.json`, | ||
| []byte(`{"name":"Code","version":"1.115.0"}`)) | ||
|
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix the Windows scheduled-scan console-flash bug by building the agent as a GUI-subsystem binary (
-ldflags "-H windowsgui"). Windows cannot allocate a console for GUI-subsystem processes, so Task Scheduler firing the periodic scan no longer flashes acmd.exewindow on the user's desktop. Interactive CLI use is preserved by attaching to the parent shell's console at startup.This is the "Option B" approach — single-binary, in contrast to the launcher-companion approach in #100.
What changed
Core: agent is now GUI-subsystem on Windows
Makefilebuild-windows / build-windows-arm64 targets append-H windowsguito ldflags..goreleaser.ymladds the same flag only forGOOS=windowsbuilds via templated ldflag.cmd/stepsecurity-dev-machine-guard/console_windows.gocallsAttachConsole(ATTACH_PARENT_PROCESS)at the top ofmain()and rewiresos.Stdin/Stdout/StderrtoCONIN$/CONOUT$. Under Task Scheduler the parent has no console and this no-ops cleanly.console_other.gois the non-Windows stub.Subprocess flash suppression
internal/winprocpackage withHideWindow(*exec.Cmd). SetsSysProcAttr.HideWindow+CREATE_NO_WINDOW; merges with existing attrs and is idempotent. No-op stub on non-Windows.exec.Command*site that spawns Windows subprocesses:executor.Run,executor.RunInDir,executor.IsAppleCLTStub,config_windows.goicacls invocation,aiagents/enrich/npm/registry.gopackage-manager probes.Scheduled task action cleanup
schtasks.go: dropped thecmd /c "set ""STEPSECURITY_HOME=..."" && \"...\" send-telemetry >> log 2>> err"wrapper. Task action now invokes the agent directly:"<binary>" send-telemetry --install-dir="<path>". The wrapper was a guaranteed console flash per fire.stepHomederived fromlogDirso it inherits the full fallback chain (admin →C:\ProgramData\StepSecurity, non-admin →paths.Home()→CurrentUser.HomeDir, finally ProgramData). Usingpaths.Home()directly could emit--install-dir="", which the CLI treats as "disable file logging" — an unintended opt-out.agent.error.log.IDE version detection: avoid the .cmd shell-out
internal/detector/ide.go:resolveWindowsVersionFromDirreadsresources\app\package.jsonbefore invokingbin\code.cmd --version. VS Code, Cursor, Windsurf, and Antigravity all ship a top-levelversionfield there in the same JSON shape as JetBrains'product-info.json. Eliminates the per-IDE cmd subprocess that historically flashed a console.readProductInfoVersion→readJSONVersionsince the helper now serves both file shapes.filelog write-order fix (latent bug surfaced by Option B)
internal/progress/filelog/filelog.go:teeLooppreviously usedio.MultiWriter(origErr, file), which is all-or-nothing — if the first writer errors, the loop bails before the second writer runs. Under Option B's GUI-subsystem agent launched by Task Scheduler,STD_ERROR_HANDLEisINVALID_HANDLE_VALUE; every write toorigErrreturnedERROR_INVALID_HANDLEandagent.error.logstayed at 0 bytes despite a successful scan. Switched to two best-effort writes (file first, then origErr) so the durable destination always lands. Verified against the backendDeviceExecutionHistorytable: same scans uploaded full execution logs (~4 KB) but the local file path was getting dropped. Independent of Option A vs B — also fixes any future code path whereorigErrbecomes invalid.Tests
internal/winprocTestHideWindow_NilSafe,TestHideWindow_ZeroValueCmd,TestHideWindow_SetsAttrs,TestHideWindow_MergesExistingAttrs,TestHideWindow_Idempotentinternal/schtasksTestBuildCreateArgs_TaskCommandFormat— regression guard ensuring nocmd /cwrapper, no>>redirection,--install-dirpresent, binary path quotedinternal/detectorTestIDEDetector_Windows_VSCode_PackageJSONFastPath— confirms version comes frompackage.jsonwithout shelling out tocode.cmdinternal/progress/filelogTestStartWritesFileEvenWhenOrigStderrIsBroken— locks in the file-first ordering by simulating a closed/invalid origErrAll existing tests pass on darwin native and Windows cross-compile.
Verification
End-to-end on Windows Server 2025 VM:
2(Windows GUI).Last result: 0x0andagent.error.loggrows to ~4.7 KB.DeviceExecutionHistoryrecords (queried via DDB) show 100/100 successful uploads with non-zerooutput_size_bytes.Trade-offs vs Option A (#100)
Option B ships one binary instead of two but accepts CLI ergonomic regressions because GUI-subsystem processes don't behave like console CLIs:
agent.exe scan --json | jqproduces no output (stdout is a console handle, not a pipe).$LASTEXITCODE/%ERRORLEVEL%is unreliable in scripts unless wrapped inStart-Process -Wait.agent.exe configureinteractive prompts can't read stdin; non-interactive flags or--from-filerequired.agent.exe --versionand2>redirects don't work as expected withoutStart-Process -Wait.--verboseand send me the log" support flow breaks.The flash-fix property is identical to Option A. Choose Option B if the simpler deployment footprint (one binary, one MSI
<File>component, no companion launcher) outweighs the interactive-CLI loss. Choose Option A if customers run the binary from a terminal regularly.