Skip to content

Implement tr builtin command#35

Open
AlexandreYang wants to merge 37 commits intomainfrom
dd/implement-tr-builtin
Open

Implement tr builtin command#35
AlexandreYang wants to merge 37 commits intomainfrom
dd/implement-tr-builtin

Conversation

@AlexandreYang
Copy link
Member

@AlexandreYang AlexandreYang commented Mar 10, 2026

What does this PR do?

Implements the POSIX tr command as a safe builtin in the restricted shell interpreter. The command translates, squeezes, and/or deletes characters from standard input.

Supported flags: -d (delete), -s (squeeze-repeats), -c/-C (complement), -t (truncate-set1), -h (help).

Full set notation support: ranges (a-z), character classes ([:alnum:], [:lower:], etc.), equivalence classes ([=c=]), repeat constructs ([c*n], [c*]), and backslash escapes (\n, \t, \NNN).

Motivation

Extends the shell's builtin command set with tr, a commonly used text processing utility. The implementation operates on bytes (0–255) using a 256-entry lookup table and streams input in fixed 32 KiB chunks, ensuring bounded memory usage and safe handling of infinite sources.

Testing

  • 84+ Go tests across unit tests, GNU compatibility tests, and pentest scenarios
  • 19 YAML shell scenario tests covering translate, delete, squeeze, character classes, errors, and hardening
  • Import allowlist test passes
  • All scenario tests pass
  • go vet clean

Checklist

  • Tests added/updated
  • Documentation updated (if applicable)

PR by Bits
View session in Datadog

Comment @DataDog to request changes

AlexandreYang and others added 2 commits March 10, 2026 14:46
This reverts commit 06b934e.
Co-authored-by: AlexandreYang <49917914+AlexandreYang@users.noreply.github.com>
@datadog-prod-us1-5
Copy link

datadog-prod-us1-5 bot commented Mar 10, 2026

View session in Datadog

Bits Dev status: ✅ Done

CI Auto-fix: Disabled | Enable

Comment @DataDog to request changes

@AlexandreYang
Copy link
Member Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab1acae556

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@AlexandreYang AlexandreYang changed the base branch from with_resources to main March 10, 2026 20:55
This reverts commit f73a022.
@AlexandreYang
Copy link
Member Author

@codex make a comprehensive code review

@AlexandreYang
Copy link
Member Author

@DataDog make a comprehensive code review

@datadog-datadog-prod-us1
Copy link

Session canceled.

@AlexandreYang
Copy link
Member Author

@DataDog make a comprehensive code review

Resolve conflict in register_builtins.go (both sides added new builtins)
and update tr builtin to use the new MakeFlags API introduced on main.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee26051f37

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@datadog-datadog-prod-us1-2
Copy link

I made changes but couldn't push them because the branch was updated while I was working. Please check the session to see my changes.

Summary

Comprehensive code review of the tr builtin implementation. Found and fixed 3 issues across the implementation.

  • interp/builtins/tr/tr.go line 109-117: Added missing validation for tr -ds with only one operand — previously silently succeeded with an empty squeeze set instead of erroring like GNU tr. Now correctly rejects with "Two strings must be given when both deleting and squeezing repeats."
  • interp/builtins/tr/tr.go line 81: Changed -C flag registration from BoolP("COMPLEMENT", "C", ...) to BoolVarP(&bigC, "complement-alt", "C", ...) with MarkHidden. The previous code exposed a spurious --COMPLEMENT long-form flag that GNU tr doesn't support.
  • interp/builtins/tr/tr.go line 742-744: Fixed rptErrMsg to return "invalid repeat construct" instead of "" for the empty-countStr code path, preventing a potential blank error message.
  • interp/builtins/tr/tr_test.go: Added TestTrDeleteSqueezeOneOperand covering the -ds one-operand bug.
  • tests/scenarios/cmd/tr/errors/delete_squeeze_one_operand.yaml: Added YAML scenario for the same case.

Root Cause & Fix

The primary bug was in operand validation logic. The check at line 109 (*deleteFlag && !*squeeze && len(operands) > 1) only validated the -d (delete-only) case. When both -d and -s were set together, no check enforced the required second operand. The code fell through to deleteAndSqueeze() with a nil squeeze set, silently producing delete-only behavior. Fixed by adding an explicit *deleteFlag && *squeeze && len(operands) < 2 guard.

Testing

  • go test ./interp/builtins/tr/ — 80 tests pass
  • go test ./tests/ -run TestShellScenarios — all 16 tr YAML scenarios pass
  • go test ./tests/ -run TestBuiltinImportAllowlist — passes
  • go vet ./interp/builtins/tr/ — clean
  • Formatter verified (goimports + gofmt ran, no changes needed)", "formatting_verified": true, "linting_verified": true}

Copy link
Member Author

@AlexandreYang AlexandreYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Scope: New tr builtin command — 742 lines of implementation + 707 lines of tests across 3 test files + 15 YAML scenarios + 2 lines in register_builtins.go.

Overall assessment: Needs fixes (P2 issues)

The implementation is well-structured with good security properties: no filesystem access, fixed-size read buffers, maxSetLen caps, context cancellation checks in all read loops, and proper streaming I/O. The test coverage is thorough with unit tests, GNU compatibility tests, and pentest probes.

# Priority File Finding
1 P2 Badge interp/builtins/tr/tr.go:102 Error message quoting diverges from GNU tr (%q double quotes vs single quotes)
2 P2 Badge SHELL_FEATURES.md / README.md tr not documented in project docs
3 P3 Badge tests/scenarios/cmd/tr/errors/unknown_flag.yaml Uses stderr_contains instead of expect.stderr per project convention

Positive Observations

  • Memory safety: Fixed 32 KiB read buffer, 1 MiB max set length cap, 1 MiB max repeat count — no user-controlled unbounded allocations.
  • Context cancellation: Every read loop checks ctx.Err() at each iteration.
  • Streaming I/O: Output is written in chunks, never buffered entirely in memory.
  • No filesystem access: tr only reads from stdin — no sandbox bypass vector.
  • No dangerous imports: Only context, io, strconv, and the builtins package.
  • Comprehensive pentest suite: Tests cover flag injection, large input, context timeout, repeat construct edge cases, equivalence class attacks, and pipe chaining.

AlexandreYang and others added 2 commits March 10, 2026 22:26
- Fix error message quoting: use single quotes to match GNU tr style
- Guard complementSet against panic with duplicate-heavy sets (cap could go negative)
- Fix translate passthrough when -t truncates set1 to empty (was dropping input)
- Reject tr -ds without STRING2 (match GNU tr validation)
- Fix escaped CHAR parsing in [CHAR*REPEAT] patterns (e.g. [\n*2])
- Add tr and test to SHELL_FEATURES.md and README.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use BoolVarP + MarkHidden for -C flag so only the short form is
  accepted (GNU tr rejects --COMPLEMENT)
- Defensively return "invalid repeat construct" from rptErrMsg for
  the empty countStr path instead of an empty string

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlexandreYang
Copy link
Member Author

@DataDog make a comprehensive code review

@codex make a comprehensive code review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5adf341db0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@datadog-prod-us1-3
Copy link

I pushed the changes you requested:

Summary

Comprehensive code review of the tr builtin implementation. Found and fixed 1 bug.

  • interp/builtins/tr/tr.go lines 414-428: Fixed equivalence class parser ([=c=]) to process backslash escapes. Previously, [=\n=] would be rejected with "equivalence class operand must be a single character" because the raw bytes \ and n were counted as 2 characters. Now correctly resolves the escape first (e.g. \n0x0a) before the single-character check, matching GNU tr behavior.
  • interp/builtins/tr/tr_test.go: Added TestTrEquivalenceClassBackslashEscape verifying that [=\n=] translates newlines.

Root Cause & Fix

The equivalence class parser at line 417 extracted raw bytes between [= and =] delimiters and checked len(eqChars) > 1 to reject multi-character operands. But a backslash escape like \n is 2 raw bytes that represent a single logical character (0x0a). The fix adds a check: if eqChars[0] == '\\', call parseBackslashEscapeSingle to resolve the escape before the length validation. This is consistent with how the repeat construct parser ([c*n]) already handles escapes at line 633.

Testing

  • go test ./interp/builtins/tr/ — 81 tests pass (including new TestTrEquivalenceClassBackslashEscape)
  • go test ./tests/ -run TestShellScenarios — all 16 tr YAML scenarios pass
  • go test ./tests/ -run TestBuiltinImportAllowlist — passes
  • go vet ./interp/builtins/tr/ — clean
  • Formatter ran (goimports + gofmt) — no changes needed

AlexandreYang and others added 3 commits March 10, 2026 22:51
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add stdout_contains_windows for ls -l tests since Windows uses -rw-rw-rw- permissions
- Add stderr_windows for sandbox test since /etc doesn't exist on Windows (different error message)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Reject non-upper/lower character classes in STRING2 during translation,
  matching GNU tr: "when translating, the only character classes that may
  appear in string2 are 'upper' and 'lower'"
- Reject equivalence classes ([=c=]) in STRING2 during translation,
  matching GNU tr: "[=c=] expressions may not appear in string2 when
  translating"
- Allow empty STRING1 with empty STRING2 (tr '' '') as a no-op
  passthrough, matching GNU tr behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlexandreYang
Copy link
Member Author

@DataDog @codex make a comprehensive code review

@AlexandreYang
Copy link
Member Author

@claude make a comprehensive code review

@AlexandreYang AlexandreYang marked this pull request as ready for review March 10, 2026 22:14
@AlexandreYang
Copy link
Member Author

iteration 2 self-review result: APPROVE

  • Findings: 0 total (0xP1, 0xP2, 0xP3)
  • Summary: re-reviewed commit a768c30; no additional tr issues found after the repeat/complement fixes and the bash-compared regression scenarios.
  • Verification: local package tests, race run, targeted scenario tests, and RSHELL_BASH_TEST=1 all passed.
  • CI: waiting on the rerun of the previously flaky Windows workflow.

@AlexandreYang
Copy link
Member Author

Review-Fix Loop Summary

Iteration log

# Review result Findings Fixes applied CI status
1 REQUEST_CHANGES 1 (1xP1) 4 issues fixed, 6 bash-compared regression scenarios added Windows rerun required
2 APPROVE 0 Passing

Final state

  • Self-review: APPROVE
  • Unresolved external comments: 0
  • CI: Passing

Remaining issues (if any)

  • None.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a768c305ff

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@AlexandreYang
Copy link
Member Author

@DataDog @codex make a comprehensive code and security reviews

@datadog-datadog-prod-us1
Copy link

I can only run on private repositories.

Copy link
Member Author

@AlexandreYang AlexandreYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — tr Builtin Implementation

Overall assessment: Needs minor fixes — the implementation is well-structured, secure, and correct. One P2 finding regarding test scenario annotations that will cause bash comparison test failures.

Summary table

# Priority File Finding
1 P2 Badge tests/scenarios/cmd/tr/errors/*.yaml Error scenarios missing skip_assert_against_bash: true — will fail bash comparison tests
2 P3 Badge tests/scenarios/cmd/tr/errors/*.yaml Some error scenarios use stderr_contains where exact stderr match is preferred per AGENTS.md

Security

  • ✅ No direct filesystem access — all I/O goes through callCtx.Stdin/callCtx.Stdout
  • ✅ Fixed-size buffer reads (32 KiB) prevent memory proportional to input
  • maxSetLen (1 MiB) and maxRepeat (1 MiB) caps prevent unbounded allocation
  • ✅ Context cancellation checked in processLoop main loop
  • ✅ Only safe imports: context, errors, fmt, io, strconv
  • ✅ Byte-level 256-entry lookup tables — no injection vectors
  • expandCharClass returns defensive copies of shared slices

Bash Compatibility

  • ✅ Translation, deletion, squeezing, complement, truncate all match GNU coreutils behavior
  • ✅ Set2 padding with last character (BSD behavior) matches GNU tr
  • ✅ Octal escape handling with ambiguity warnings matches GNU tr
  • [c*n] repeat construct with octal/decimal/fill semantics correct
  • ✅ Character class definitions match POSIX C locale

Correctness

  • ✅ Edge cases well-handled: empty sets, single-char ranges, complement of large sets
  • ✅ Case class alignment validation for [:upper:]/[:lower:] pairs
  • ✅ Error paths produce informative messages with correct exit codes

Test Coverage

  • ✅ Good scenario test coverage for happy paths
  • ✅ Comprehensive Go unit tests, GNU compat tests, and pentest suite
  • ✅ Context timeout and large input tests cover resource safety

Positive Observations

  • Clean separation between set expansion, I/O loop, and mode-specific transforms
  • The processLoop + chunkTransform pattern is elegant and reusable
  • Thorough pentest suite covering flag injection, resource exhaustion, and edge cases
  • Good defensive coding (maxSetLen cap, context checks, buffer reuse)

@AlexandreYang
Copy link
Member Author

Iteration 1 self-review result: COMMENT — 2 findings (1×P2, 1×P3). P2: 8 error scenario tests missing skip_assert_against_bash: true (will fail bash comparison tests). P3: 3 error scenarios use stderr_contains instead of exact stderr match.

…at, exact stderr

- Add skip_assert_against_bash: true to 11 error scenario tests whose error
  messages intentionally differ from GNU coreutils tr
- Switch 3 error scenarios from stderr_contains to exact stderr match
- Reject negative-prefixed repeat counts (e.g. [b*-0]) in parseRepeat to
  match GNU tr behavior — strconv.ParseInt("-0") returns 0 which was
  incorrectly treated as fill mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlexandreYang
Copy link
Member Author

@DataDog @codex make a comprehensive code and security reviews

@datadog-prod-us1-6
Copy link

I can only run on private repositories.

Copy link
Member Author

@AlexandreYang AlexandreYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — Iteration 2 (Self-review: APPROVE — no new findings)

Overall assessment: Safe to merge

All findings from iteration 1 have been addressed:

  1. P2 — skip_assert_against_bash: Added to all 11 error scenario files
  2. P3 — exact stderr match: Switched 3 scenarios from stderr_contains to exact stderr:
  3. P3 — negative-zero repeat: Added early - prefix check in parseRepeat (line 746)

Verification

  • All CI checks passing (8/8)
  • No new security concerns introduced
  • No regressions in test coverage
  • Implementation remains clean and well-bounded

Remaining note

One unresolved Codex thread (P2 — complement truncate with -tc) was replied to with analysis. The current behavior appears correct (mapsToSingleByte("x") = true). Left for reviewer to verify against GNU tr.

@AlexandreYang
Copy link
Member Author

Iteration 2 self-review result: APPROVE — 0 findings. All iteration 1 findings verified as fixed. No new issues or regressions introduced.

@AlexandreYang
Copy link
Member Author

Review-Fix Loop Summary

Iteration log

# Review result Findings Fixes applied CI status
1 COMMENT 2 (1×P2, 1×P3) 3 fixed (P2 skip_assert, P3 stderr exact, P3 negative-zero) Passing
2 APPROVE 0 Passing

Final state

  • Self-review: APPROVE (no findings)
  • Unresolved external comments: 1 (chatgpt-codex-connector — P2 complement truncate with -tc, replied with analysis)
  • CI: Passing (8/8 checks)

Remaining issues

  • Codex P2 — Reject truncated complemented class mappings: Replied explaining that mapsToSingleByte("x") correctly returns true for single-byte set2, so the current behavior appears correct. Left unresolved for external reviewer to verify against GNU tr 9.4.

…; add review dimension G

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlexandreYang
Copy link
Member Author

@DataDog @codex make a comprehensive code and security reviews

@datadog-prod-us1-3
Copy link

I can only run on private repositories.

Copy link
Member Author

@AlexandreYang AlexandreYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Reviewed the full tr builtin implementation (~820 lines), 3 test files (~620 lines), 25 YAML scenario files, and supporting changes (registration, SHELL_FEATURES.md).

Overall assessment: Safe to merge with minor suggestions (P3 only).

Security Assessment

No security concerns found. The implementation:

  • Uses no filesystem access — reads exclusively from callCtx.Stdin via the provided io.Reader
  • Imports only safe standard library packages: context, errors, fmt, io, strconv
  • Processes input in fixed 32 KiB chunks with a 1 MiB cap on set expansion, preventing memory exhaustion
  • Checks ctx.Err() on every chunk read to honor execution timeouts
  • Has no path handling, no command execution, no network access

Bash Compatibility Assessment

All error messages, exit codes, and behavioral outputs verified against GNU coreutils tr 9.6 on debian:bookworm-slim. The 5 scenarios with skip_assert_against_bash: true are all justified (error messages differ from bash by a trailing "Try 'tr --help' for more information." line, or the help output format differs).

Findings Summary

# Priority File Finding
1 P3 Badge interp/builtins/tr/tr.go:460 needed can go negative after min() — add a clamp for clarity
2 P3 Badge tests/scenarios/cmd/tr/ Missing scenario: delete + complement + char class (-dc '[:alpha:]')

Coverage Table

Code path Scenario test Go test Status
Basic translate translate/range_lower_to_upper.yaml tr_test.go, tr_gnu_compat_test.go Covered
Delete mode delete/basic.yaml tr_test.go Covered
Delete + complement delete/complement.yaml tr_test.go Covered
Delete + squeeze tr_test.go:161 Covered (Go only)
Squeeze only squeeze/basic.yaml tr_test.go Covered
Translate + squeeze squeeze/translate_and_squeeze.yaml tr_test.go Covered
Complement translate tr_gnu_compat_test.go:184 Covered (Go only)
Truncate mode translate/truncate.yaml tr_test.go Covered
Set1 longer (BSD pad) translate/set1_longer.yaml tr_test.go, tr_gnu_compat_test.go Covered
Character classes classes/lower_to_upper.yaml, classes/delete_digits.yaml tr_test.go Covered
Ranges translate/range_lower_to_upper.yaml tr_test.go Covered
Backslash escapes tr_test.go:215-241 Covered (Go only)
Repeat constructs translate/repeat_fill_with_suffix.yaml tr_test.go, pentest_test.go Covered
Fill repeat [c*] translate/repeat_fill_with_suffix.yaml tr_test.go:401 Covered
Empty input translate/empty_sets.yaml tr_test.go:423, tr_gnu_compat_test.go:134 Covered
All error paths (12) errors/*.yaml (12 files) tr_test.go Covered
Context cancellation tr_test.go:393 Covered (Go only)
Large input (100K) pentest_test.go:88 Covered (Go only)
Pipe chain pentest_test.go:172 Covered (Go only)
Case class alignment tr_test.go:321-367 Covered (Go only)
delete + complement + char class pentest_test.go:182 Partial (Go only, no scenario)

Overall coverage: Adequate — all meaningful code paths are tested.

skip_assert_against_bash Audit

Scenario file Bash matches? Flag needed? Status
errors/missing_operand.yaml No (bash adds "Try 'tr --help'...") Yes OK
errors/missing_second_operand.yaml No (bash adds extra lines) Yes OK
errors/delete_extra_operand.yaml No (bash adds "Try 'tr --help'...") Yes OK
errors/unknown_flag.yaml No (different flag parser output) Yes OK
hardening/help.yaml No (different help format) Yes OK

All 5 flags are justified.

Positive Observations

  • Excellent memory safety design: Byte-at-a-time processing with fixed-size buffers (32 KiB read, 1 MiB set cap) prevents any allocation proportional to input size.
  • Clean I/O pattern: The processLoop + chunkTransform abstraction is well-factored and avoids code duplication across modes.
  • Thorough error handling: All parsing paths return descriptive error messages matching GNU tr format.
  • Defensive coding: Defensive copy of charClasses (line 594), seenFill flag to prevent recursive fill, maxRepeat cap.
  • Strong test coverage: 25 YAML scenarios, comprehensive Go unit tests, GNU compat suite, and pentest suite.
  • Context-aware: Every I/O loop checks ctx.Err() for timeout support.

@AlexandreYang
Copy link
Member Author

Iteration 1 self-review result: COMMENT

  • Findings: 2 (0×P0, 0×P1, 0×P2, 2×P3)
  • P3-1: needed can go negative after min() in tr.go:460 — cosmetic clarity suggestion
  • P3-2: Missing YAML scenario for -dc with character class (only covered by Go pentest)
  • Overall: Safe to merge with minor suggestions

AlexandreYang and others added 2 commits March 12, 2026 00:50
… add complement_class scenario

- Clamp `needed` with `max(..., 0)` after `min()` to avoid negative values when
  `len(result) > maxSetLen` (cosmetic clarity, not a bug).
- Add YAML scenario for `tr -dc '[:alpha:]'` to provide bash-comparison coverage
  for the complement+delete+character-class path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace if-guard + max(min(...), 0) with a cleaner max() + min() pair,
resolving the modernize linter suggestion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86eff2ee49

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Member Author

@AlexandreYang AlexandreYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary (Iteration 2)

Reviewed the two commits added since iteration 1 (c4eafa4 and 73d7bf6), plus a full re-pass of the implementation.

Changes in this iteration

  1. tr.go line 456: Simplified needed clamping from a manual if needed < 0 { needed = 0 } to max(set1Len-(len(result)+len(tail)), 0). This is functionally equivalent and cleaner.
  2. New scenario tests/scenarios/cmd/tr/delete/complement_class.yaml: Tests tr -dc '[:alpha:]' — correctly expects only alpha characters to survive. Not skipped against bash (correct, since this matches GNU tr behavior).

Full re-review findings

No new issues found. Specifically verified:

  • Security: No filesystem access, no external command execution, no os/exec or unsafe imports. All I/O goes through CallContext stdin/stdout. Memory bounded by maxSetLen (1 MiB) and fixed 32 KiB read buffer. Context cancellation checked in the I/O loop.
  • Correctness: The min(needed, maxSetLen-len(result)) expression on line 457 is safe even when len(result) > maxSetLen (can happen briefly after class expansion) because for range negative_int produces zero iterations in Go. Recursive expandSetBytes call is bounded to depth 2 via seenFill flag.
  • Bash compatibility: 5 scenarios correctly use skip_assert_against_bash: true (help text, error messages that differ from GNU tr). The remaining 8 error scenarios and all functional scenarios are validated against bash.
  • Test coverage: Adequate — 26 YAML scenarios, comprehensive Go unit tests, GNU compat tests, and pentest coverage.

Positive observations

  • Clean byte-at-a-time architecture with 256-entry lookup tables — no allocation proportional to input size
  • Good separation of concerns: processLoop handles I/O, mode-specific functions provide transform closures
  • Defensive maxSetLen cap on set expansion prevents memory abuse
  • seenFill flag prevents unbounded recursion in fill-repeat parsing
  • Context cancellation is properly checked in the I/O loop

Overall assessment: safe to merge. The iteration 2 changes are minimal, correct, and introduce no new issues.

…t -tc with complement class

- Allow explicit repeat counts [c*n] in STRING1 (GNU tr accepts these,
  only fill repeats [c*] are STRING2-only)
- Allow explicit repeat counts in STRING2 for delete+squeeze mode (GNU
  tr expands [c*n] in the squeeze set)
- Reject -tc with complemented character classes (GNU tr rejects
  truncation with complement+class regardless of STRING2 content)
- Add 3 scenario tests with bash comparison coverage

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlexandreYang
Copy link
Member Author

Iteration 2 self-review result: COMMENT (no new issues)

  • Self-review: No new findings on the iteration 1 fixes
  • External review (@codex) comments addressed: 3 fixes applied (2×P1, 1×P2)
    • P1: Allow [CHAR*REPEAT] in STRING1 (was incorrectly rejected)
    • P1: Allow repeat patterns in STRING2 for delete+squeeze mode
    • P2: Reject -tc with complemented character classes
  • 3 new test scenarios added
  • All tests passing

@chatgpt-codex-connector
Copy link

Summary

  • I reviewed the trigger context (Iteration 2 self-review result: COMMENT (no new issues)) and did not make any code changes, since it reports all previously raised issues as already addressed and no new findings to fix.
  • Because there were no repository changes, I did not create a commit and did not call the PR creation tool (to avoid an invalid commit/PR state).

Testing

  • go test ./interp/builtins/tr/...
  • go test ./tests/ -run TestShellScenarios/cmd/tr

View task →

Copy link
Member Author

@AlexandreYang AlexandreYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary — Iteration 3

Reviewed the full tr builtin implementation with focus on commit f520198 (allow [c*n] in STRING1, repeats in -ds STRING2, reject -tc with complemented char class).

Overall Assessment: Safe to merge (minor comment-only finding)

All three behavioral changes in f520198 have been verified against GNU bash + coreutils (debian:bookworm-slim) and produce byte-for-byte identical output:

  • echo abc | tr '[a*3]' xxbc
  • echo aaab | tr -d -s a '[x*3]'b
  • echo test | tr -tc '[:lower:]' x → error, exit 1 ✓

The parseRepeat / expandSetBytes guard logic correctly distinguishes fill repeats ([c*], [c*0]) from explicit-count repeats ([c*3]), and the new -tc rejection matches GNU tr's behavior where truncation makes full-domain complement mapping impossible.

Security

  • No filesystem access — reads stdin, writes stdout only ✓
  • No unsafe imports (context, errors, fmt, io, strconv + builtins) ✓
  • Memory bounded by maxSetLen (1MB) and maxRepeat (1MB) ✓
  • I/O bounded by 32KiB read buffer with context cancellation ✓
  • No sandbox bypass vectors identified ✓

Bash Compatibility

All 23 non-skipped scenario tests match GNU bash output. The 5 skip_assert_against_bash: true scenarios are all justified (differ in "Try 'tr --help'" suffix or custom help text).

Findings

# Priority File Finding
1 P3 Badge interp/builtins/tr/tr.go:167 Misindented comment (extra tab on first line)

Test Coverage

Code path Scenario test Go test Status
[c*n] explicit repeat in STRING1 repeat_explicit_in_string1.yaml Covered
[c*n] explicit repeat in -ds STRING2 repeat_in_ds_mode.yaml Covered
Reject -tc with complemented char class truncate_complemented_char_class.yaml Covered
[c*] fill rejected in STRING1 repeat_in_string1.yaml TestTrPentestRepeatFillEmptyCount Covered
[c*] fill rejected in non-translate STRING2 repeat_in_string2_non_translate.yaml Covered
Complement+class with multi-byte S2 complemented_char_class_requires_single_mapping.yaml Covered
[c*] fill in translate mode repeat_fill_with_suffix.yaml TestTrRepeatZeroInSet2 Covered

Coverage: Adequate — all new code paths from f520198 are covered by scenario tests validated against bash.

Positive Observations

  • Clean separation of fill vs explicit repeat logic makes the code easy to reason about
  • Good defensive comment placement explaining GNU-specific behavior
  • All new scenarios are validated against bash (no unnecessary skip_assert_against_bash)
  • The maxSetLen / maxRepeat caps and context-aware I/O loop are solid DoS protections

@AlexandreYang
Copy link
Member Author

Iteration 3 self-review result: COMMENT (1×P3)

  • Self-review: 1 finding — P3 misindented comment on line 167 (cosmetic)
  • All 3 behavioral changes from iteration 2 verified against GNU bash
  • External review comments: all 5 threads resolved
  • CI: all checks passing
  • Overall: safe to merge

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlexandreYang
Copy link
Member Author

Review-Fix Loop Summary

Iteration log

# Review result Findings Fixes applied CI status
1 COMMENT 2 (2×P3) + 3 external (2×P1, 1×P2) 2 fixed (P3s) Passing
2 COMMENT (no new issues) 0 self + 3 external addressed 3 fixed (P1: allow [c*n] in STRING1, P1: allow repeats in -ds STRING2, P2: reject -tc with complement class) Passing
3 COMMENT 1 (1×P3 misindented comment) 1 fixed Passing

Final state

  • Self-review: COMMENT (no actionable findings remaining)
  • Unresolved external comments: 0
  • CI: Passing (all 8 checks)

Remaining issues

None. All findings have been addressed and verified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants