Skip to content

implement posix command tail#25

Draft
julesmcrt wants to merge 8 commits intomainfrom
jules.macret/builtins-tail
Draft

implement posix command tail#25
julesmcrt wants to merge 8 commits intomainfrom
jules.macret/builtins-tail

Conversation

@julesmcrt
Copy link
Collaborator

@julesmcrt julesmcrt commented Mar 10, 2026

What does this PR do?

Implements the POSIX tail command as a builtin in the safe shell interpreter. Supports flags -n (lines), -c (bytes), -q/--quiet/--silent, -v/--verbose, -z/--zero-terminated, and -h/--help. Both tail mode (last N lines/bytes) and offset mode (+N: start from line/byte N) are supported. Reads from stdin when no file is given or when file is -. Prints ==> filename <== headers for 2+ files (matching GNU tail behaviour).

Memory-safe: line mode uses a ring buffer capped at min(N, MaxRingLines) slots; byte mode uses a circular buffer capped at min(N, MaxBytesBuffer) (32 MiB). All loops respect context cancellation.

Motivation

Expands the set of available builtins so shell scripts can extract the last portion of files without external binaries. The -f/--follow, --retry, and --pid flags are intentionally rejected: file-following requires indefinite blocking and background goroutines, which are unsafe in a sandboxed shell.

Testing

  • 19 YAML scenario tests across tests/scenarios/cmd/tail/ (lines, bytes, headers, stdin, errors, offset, zero-terminated)
  • 50+ Go unit tests in interp/builtins/tail/tail_test.go covering all flags, edge cases (empty file, no trailing newline, CRLF, binary/null bytes, bad UTF-8, nil stdin, context cancellation, pipe input, -- separator)
  • GNU compatibility tests in tail_gnu_compat_test.go with documented reference output from gtail
  • Unix-specific tests (symlinks, dangling symlinks, /dev/null, permission denied) with //go:build unix
  • Windows-specific tests (reserved device names: CON, NUL, COM1, …) with //go:build windows
  • Exploratory pentest tests in builtin_tail_pentest_test.go: integer overflow/underflow, ring buffer wrap correctness, flag injection via variable expansion, path traversal, sandbox escape, long lines (at/beyond 1 MiB cap), 1M-line file, 210-file FD leak check
  • Import allowlist test updated and passing
  • All tests pass: go test -race ./interp/builtins/tail/... ./interp/... ./tests/...

Checklist

  • Tests added/updated
  • Documentation updated (SHELL_FEATURES.md)

@julesmcrt
Copy link
Collaborator Author

@codex review

@chatgpt-codex-connector
Copy link

To use Codex here, create a Codex account and connect to github.

@AlexandreYang
Copy link
Member

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d4bcbd354b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

return err
}
}
_, err := io.Copy(callCtx.Stdout, r)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Check context while streaming byte-offset output

After skipCount bytes are discarded, this path switches to io.Copy without any ctx.Err() checks, so tail -c +N can ignore runner cancellation/timeouts while copying the remainder of a large or unbounded input (for example, piped streams that never reach EOF). That breaks the shell’s cancellation guarantees and can leave executions stuck until the reader closes.

Useful? React with 👍 / 👎.

Comment on lines +145 to +147
printHeaders := len(files) > 1 || *verbose
if *quiet {
printHeaders = false

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Respect flag order between --quiet and --verbose

This logic makes --quiet/-q dominate unconditionally whenever it is present, so tail -q -v ... still suppresses headers. GNU/POSIX-style CLI behavior is order-sensitive for conflicting toggles, and users expect the later flag to win (e.g., -q -v should force headers, while -v -q should suppress them).

Useful? React with 👍 / 👎.

- tailBytesOffset: replace io.Copy with a chunked read loop that checks
  ctx.Err() on every iteration, preventing the shell from getting stuck
  on unbounded piped input after cancellation (P1 fix).

- -q/--quiet/--silent and -v/--verbose: replace the unconditional
  quiet-wins logic with a shared sequence counter (headerFlag type,
  mirroring modeFlag) so that the last flag on the command line wins,
  e.g. "-q -v" now shows headers while "-v -q" suppresses them (P2 fix).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@julesmcrt
Copy link
Collaborator Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0be3461f64

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +246 to +248
if count == 0 {
_, err := io.Copy(io.Discard, r)
return err

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Honor context cancellation in -n 0 fast path

When count == 0, tailLines switches to io.Copy(io.Discard, r) without checking ctx.Err(), so a non-terminating input (for example an endless pipe or /dev/zero) will ignore shell cancellation/timeouts and can run indefinitely. This violates the command’s own cancellation guarantees and creates a safety regression specifically for tail -n 0 (and the same pattern appears in tailBytes at lines 329-331).

Useful? React with 👍 / 👎.

Comment on lines +216 to +217
if idx > 0 {
callCtx.Out("\n")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Print file separators only after emitted headers

The separator newline is currently keyed to argument index (idx > 0) rather than whether a previous header was actually printed. If an earlier file fails to open, the next successful file still gets a leading blank line before its header (e.g. tail missing.txt ok.txt), which is not how GNU tail formats mixed success/error multi-file output. Track printed-header count instead of raw index so separators only appear between emitted file blocks.

Useful? React with 👍 / 👎.

julesmcrt added a commit that referenced this pull request Mar 10, 2026
@julesmcrt
Copy link
Collaborator Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 98763e685d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// Allocate the circular buffer eagerly. bufSize is capped at MaxBytesBuffer
// (32 MiB), so this allocation is bounded regardless of the user-supplied
// count value.
bufSize := int(min(count, int64(MaxBytesBuffer)))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Honor large byte counts instead of truncating to 32 MiB

readLastBytes hard-caps its circular buffer to MaxBytesBuffer via min(count, MaxBytesBuffer), so any tail -c N where N > 32 MiB can never return more than 32 MiB even when the file is smaller than N (e.g., a 40 MiB file with -c 50M should return the full 40 MiB, but this code returns only the last 32 MiB). This silently drops requested data rather than failing explicitly, which can corrupt downstream processing.

Useful? React with 👍 / 👎.

sc.Split(scanLinesPreservingNewline)
}

ringSize := int(min(count, int64(MaxRingLines)))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Honor large line counts instead of truncating at 100000

readLastLines limits the ring to MaxRingLines regardless of the user-requested -n value, so tail -n N with N > 100000 can never emit more than 100000 lines even when more lines were requested and available (for example, -n 200000 on a 150000-line file should output all 150000 lines). This is a silent correctness regression for large logs.

Useful? React with 👍 / 👎.

}
// GNU tail silently treats negative counts as their absolute value.
if n < 0 {
n = -n

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard MinInt64 before negating negative counts

In parseCount, negating a parsed negative value with n = -n overflows for -9223372036854775808 and leaves n negative; downstream code then uses that negative count to size buffers/rings, which can panic (makeslice: len out of range) instead of returning a normal usage error. A user-provided count should not be able to trigger an internal panic path.

Useful? React with 👍 / 👎.

julesmcrt and others added 2 commits March 10, 2026 17:14
Previously the blank-line separator before each file header was printed
when idx > 0, so a missing first file still caused a spurious blank line
before the next successful file's header.

Replace the idx int parameter in processFile with headersEmitted *int:
the separator and header are now printed together, and headersEmitted is
incremented only when a header is actually emitted. The new scenario
tests/scenarios/cmd/tail/errors/missing_then_ok.yaml covers this case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- parseCount: guard MinInt64 before negating negative counts.
  n = -n overflows when n == math.MinInt64, which could cause a
  downstream panic when sizing buffers. Clamp to MaxCount first.

- MaxRingLines: restore to 1<<20 (1M) from the regressed 100K.
  With 100K the ring silently truncated tail -n N where N > 100000
  (e.g. "-n 200000" on a 150000-line file returned only 100000 lines).
  Memory is still bounded by MaxRingBytes (64 MiB) on content.

- MaxBytesBuffer: raise to MaxTotalReadBytes (256 MiB).
  The 32 MiB cap caused tail -c N on a file smaller than N but larger
  than 32 MiB to silently return only the last 32 MiB. Since we already
  error when total input exceeds MaxTotalReadBytes, matching the two
  constants ensures correct output for any readable input.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@julesmcrt
Copy link
Collaborator Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c7f9368b7b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +498 to +503
if n < 0 {
if n < -MaxCount {
n = MaxCount
} else {
n = -n
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject +-N counts instead of normalizing them

In parseCount, negative values are converted to absolute values even after a leading + was detected, so tail -n +-2 / tail -c +-2 are treated as +2 instead of being rejected. GNU tail reports these as invalid counts (e.g. invalid number of lines: '+-2'), so this silently accepts malformed input and can produce unexpected output when users typo an offset. Restrict the negative-to-absolute normalization to non-offset mode.

Useful? React with 👍 / 👎.

julesmcrt and others added 2 commits March 10, 2026 17:42
GNU tail treats "tail -n +-2" as invalid ("invalid number of lines:
'+-2'"). Previously the negative-to-absolute normalization also ran in
offset mode, silently accepting "+-2" as "+2". The fix gates that path
on !isOffset so malformed offset strings are rejected correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants