Skip to content

Implement head builtin command#54

Open
thieman wants to merge 3 commits intothieman/remove-head-commandfrom
thieman/add-head-command
Open

Implement head builtin command#54
thieman wants to merge 3 commits intothieman/remove-head-commandfrom
thieman/add-head-command

Conversation

@thieman
Copy link
Collaborator

@thieman thieman commented Mar 11, 2026

Summary

  • Adds head as a safe builtin command with full GNU head compatibility
  • Supports -n N/--lines, -c N/--bytes, -q/--quiet/--silent, -v/--verbose, multiple files with ==> filename <== headers, and stdin via - or implicit
  • Memory-safe chunked I/O; large N values clamped to prevent allocations; context cancellation checked per iteration

Test plan

  • 35 YAML scenario tests across lines/, bytes/, headers/, errors/, stdin/, and hardening/ subdirectories
  • All scenarios validated byte-for-byte against GNU bash + coreutils in Docker (RSHELL_BASH_TEST=1 go test ./tests/ -run TestShellScenariosAgainstBash)
  • Go unit tests, GNU compat tests, and pentest hardening tests in interp/builtins/head/
  • go test ./interp/... ./tests/... passes locally

🤖 Generated with Claude Code

Adds `head` as a safe builtin with support for:
- `-n N` / `--lines=N`: output first N lines (default 10)
- `-c N` / `--bytes=N`: output first N bytes
- `-q` / `--quiet` / `--silent`: suppress file headers
- `-v` / `--verbose`: always print file headers
- Multiple files with `==> filename <==` headers
- stdin via `-` or implicit when no files given
- Memory-safe chunked I/O; large N values clamped to prevent allocations
- 35 YAML scenario tests covering lines, bytes, headers, errors, stdin, and hardening

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thieman
Copy link
Collaborator Author

thieman commented Mar 11, 2026

@DataDog @codex make a comprehensive code and security reviews

@datadog-datadog-prod-us1
Copy link

I can only run on private repositories.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 001f4f1019

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Collaborator Author

@thieman thieman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Reviewed the head builtin implementation. The code is well-structured, memory-safe, passes through the sandbox correctly (callCtx.OpenFile throughout, never os.Open), handles resource exhaustion properly (1 MiB line cap, 32 KB chunk reads, count clamping), and has excellent test coverage for the happy path.

Overall assessment: needs fixes — two P2 issues found:

# Priority File Finding
1 P2 Badge interp/builtins/head/head.go:189 Spurious leading \n separator when first file fails to open but a later file succeeds
2 P2 Badge SHELL_FEATURES.md:16 -z listed in synopsis but implementation rejects it (unlike tail which actually supports -z)
3 P3 Badge tests/scenarios/cmd/head/ Missing scenario test for head missing.txt good.txt (first-file-fails-second-succeeds with headers)

Positive observations

  • callCtx.OpenFile is used exclusively — no direct os.Open calls, sandbox is intact ✅
  • Count clamping (MaxCount = 1<<31-1) prevents allocation attacks ✅
  • Per-line 1 MiB cap in scanner mode prevents line-length DoS ✅
  • Context cancellation checked at every loop iteration ✅
  • bufio.Scanner initial buffer (4 KiB) doesn't allocate proportionally to user input ✅
  • All imports are in the symbol allowlist; no new allowlist entries needed ✅
  • scanLinesPreservingNewline correctly preserves exact bytes (CRLF, missing final newline, null bytes) ✅
  • last-flag-wins logic via modeSeq counter is elegant and avoids raw arg scanning ✅
  • Comprehensive test suite: Go unit tests, GNU compat tests, pentest tests, Unix/Windows platform tests, and YAML scenarios ✅

…-z docs

Three issues fixed:

1. Spurious leading \n separator when first file fails to open (P2 bash compat):
   Replace idx-based separator logic with prevHeaderPrinted bool. The separator
   is now only printed when a previous file was successfully processed, matching
   GNU head behavior (head missing.txt good.txt no longer produces a leading \n).

2. -q/-v last-flag-wins semantics (P1 bash compat, raised by @codex review):
   Replace *bool quiet/verbose with boolSeqFlag that shares a sequence counter.
   The flag with the highest pos wins, so "head -q -v file" prints headers
   (-v wins) and "head -v -q file" suppresses them (-q wins), matching GNU head.
   --silent shares the same counter as --quiet and acts as an alias.
   Uses pflag's NoOptDefVal="true" so the flags remain no-argument booleans.

3. SHELL_FEATURES.md incorrectly listed -z in the head synopsis (P2 docs):
   Remove -z from the synopsis and add a note that -z/--zero-terminated and
   --follow are rejected (unlike tail which actually implements -z).

Add test scenarios: first_fails_second_succeeds, last_flag_wins_verbose,
last_flag_wins_quiet.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thieman
Copy link
Collaborator Author

thieman commented Mar 11, 2026

@DataDog @codex make a comprehensive code and security reviews

@datadog-prod-us1-3
Copy link

I can only run on private repositories.

Copy link
Collaborator Author

@thieman thieman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — head builtin (PR #54)

Reviewed by: Claude Sonnet 4.6 (self-review iteration 2)

Overall Assessment: ✅ Safe to merge

This PR implements the head builtin command for the restricted shell interpreter. The implementation is well-architected, security-conscious, and thoroughly tested. All critical invariants are upheld:

  • Sandbox integrity: All file access goes through callCtx.OpenFile() — no direct os.Open calls.
  • Memory safety: Line scanner capped at 1 MiB per line; byte mode reads in fixed 32 KB chunks; no allocation proportional to user-supplied count.
  • Context cancellation: All read loops check ctx.Err() at each iteration.
  • Last-flag-wins semantics: Correctly implemented via shared sequence counters for both -n/-c and -q/-v flags.
  • Boolean flag parsing: NoOptDefVal = "true" correctly tells pflag not to consume the next argument as the flag's value.
  • Intentional divergences from bash (-n -N elide-tail, -c -N trailing bytes, --follow, -z) are properly marked with skip_assert_against_bash: true.

Only P3 (nice-to-have) findings:

# Priority File Finding
1 P3 interp/builtins/head/head.go:338 IsBoolFlag() is dead code — pflag uses NoOptDefVal for custom Var flags
2 P3 tests/scenarios/cmd/head/errors/ No YAML scenario for head -c -N intentional divergence

- Remove boolSeqFlag.IsBoolFlag() which pflag never calls for VarP/VarPF
  flags. Add a comment explaining that NoOptDefVal = "true" is the actual
  mechanism. This prevents future readers from thinking IsBoolFlag() is
  the active mechanism and accidentally removing the NoOptDefVal lines.

- Add tests/scenarios/cmd/head/errors/negative_bytes_count.yaml to
  document the intentional rejection of head -c -N (elide-tail mode for
  bytes). Marked with skip_assert_against_bash: true since bash supports
  head -c -N but we intentionally do not implement it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4d1bb312f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +225 to +228
sc := bufio.NewScanner(r)
buf := make([]byte, 4096)
sc.Buffer(buf, MaxLineBytes)
sc.Split(scanLinesPreservingNewline)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid dropping stdin data across repeated '-' operands

In line mode, readLines uses a new bufio.Scanner for each file argument and stops after count tokens, but Scanner can prefetch beyond the emitted lines; when stdin is reused via multiple - operands (e.g. head -q -n 1 - - < file), the first pass can consume buffered bytes that the second - should read, so the second operand sees EOF and data is lost. GNU head outputs the next line in this regular-file-stdin case (alpha\nbeta\n), so this breaks bash compatibility and can truncate output in multi-operand scripts.

Useful? React with 👍 / 👎.

Comment on lines +165 to +168
} else if printHeaders {
// A header was successfully printed for this file; subsequent
// files should emit a blank-line separator before their header.
printedHeader = true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Preserve header separators after per-file read errors

printedHeader is only set when processFile returns success, but headers are printed before reading starts; if a file opens and prints its header and then fails during read (for example a line-mode scanner error on an over-cap line), the next file header is emitted without the required blank-line separator because printedHeader remains false. This produces malformed multi-file output in error paths even though a prior header was already written.

Useful? React with 👍 / 👎.

Comment on lines +332 to +335
func (f *boolSeqFlag) Set(_ string) error {
*f.seq++
f.pos = *f.seq
return nil

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject boolean arguments for --quiet/--silent/--verbose

boolSeqFlag.Set ignores the provided value string, so forms like --quiet=false or --verbose=false are accepted and still participate in last-flag-wins precedence instead of being rejected. GNU head treats these as invalid (option '--quiet' doesn't allow an argument), so this silently changes behavior for scripts that pass explicit boolean values and expect either an error or no effect.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant