@khasinski

Summary

This PR fixes a panic that occurs when the Ruby or Vue preprocessors encounter files with invalid UTF-8 bytes.

The issue:

  • ruby.rs:37 and vue.rs:18 used std::str::from_utf8(content).unwrap()
  • This panics when processing files containing invalid UTF-8 bytes

Error message:

thread panicked at crates/oxide/src/extractor/pre_processors/ruby.rs:37:59:
called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 45, error_len: Some(1) }
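
To make the `Utf8Error` fields in this message concrete, here is a minimal sketch of how `std::str::from_utf8` reports invalid input without panicking. The helper name `describe_utf8` and the sample bytes are hypothetical and are not taken from the oxide crate.

```rust
use std::str;

/// Hypothetical helper (not part of the PR): describes the result of
/// UTF-8 validation instead of unwrapping it.
fn describe_utf8(content: &[u8]) -> String {
    match str::from_utf8(content) {
        Ok(text) => format!("valid: {} chars", text.chars().count()),
        // `valid_up_to` is the byte offset up to which the input was valid.
        // `error_len` is Some(n) when an invalid sequence of n bytes was found,
        // and None when the input ended in the middle of a character.
        Err(err) => format!(
            "invalid: valid_up_to={}, error_len={:?}",
            err.valid_up_to(),
            err.error_len()
        ),
    }
}
```

Calling `.unwrap()` on the `Err` branch is what produces the panic quoted above; matching on the result keeps the error as ordinary data.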

The fix:

  • Wrap UTF-8 conversion in if let Ok(...) to gracefully handle invalid UTF-8
  • Skip regex-based template extraction when UTF-8 conversion fails
  • Allow byte-level processing to continue (in Ruby's case)
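
The pattern described in these bullets can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `Processed`, `process`, and both placeholder steps are hypothetical stand-ins for the real pre-processor logic.

```rust
use std::str;

/// Outcome of the hypothetical pre-processor below.
struct Processed {
    /// Whether the text-based (regex-style) step was able to run.
    text_step_ran: bool,
    /// Output of the byte-level step, which always runs.
    bytes: Vec<u8>,
}

/// Hypothetical pre-processor: the text step is gated on valid UTF-8,
/// the byte step is not.
fn process(content: &[u8]) -> Processed {
    let mut text_step_ran = false;

    // Previously: `str::from_utf8(content).unwrap()`, which panics on invalid bytes.
    if let Ok(_text) = str::from_utf8(content) {
        // Placeholder: the regex-based template extraction would go here.
        text_step_ran = true;
    }

    // Placeholder byte-level step: it operates on raw bytes, so it does not
    // need the input to be valid UTF-8.
    let bytes = content
        .iter()
        .map(|&b| if b == b'\t' { b' ' } else { b })
        .collect();

    Processed { text_step_ran, bytes }
}
```

With input such as `b"a\tb\xFF"`, the text step is skipped while the byte-level step still produces output, which mirrors the behavior described for the Ruby case.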

This can happen in Rails projects when:

  • Binary files are inadvertently scanned
  • Files contain non-UTF-8 encodings
  • Files are truncated in the middle of a multi-byte character during parallel processing
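
The truncation case can be told apart from other invalid input, because `Utf8Error::error_len` returns `None` when the input ends partway through a character. A small sketch, where `is_truncated_utf8` is a hypothetical helper and not something the PR adds:

```rust
use std::str;

/// Hypothetical helper: returns true when `bytes` fails UTF-8 validation
/// only because the input stops in the middle of a multi-byte character.
fn is_truncated_utf8(bytes: &[u8]) -> bool {
    match str::from_utf8(bytes) {
        Ok(_) => false,
        // `error_len() == None` means the end of the input was reached unexpectedly.
        Err(err) => err.error_len().is_none(),
    }
}
```

For example, "café" is five bytes in UTF-8 because "é" takes two, so cutting the slice after four bytes leaves a dangling lead byte, whereas a stray `0xFF` byte is invalid regardless of what follows.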

Test plan

  • Added test_invalid_utf8_does_not_panic test for Ruby preprocessor
  • Added test_valid_utf8_with_multibyte_chars test for Ruby preprocessor
  • Added test_invalid_utf8_does_not_panic test for Vue preprocessor
  • All existing tests pass (cargo test pre_processors - 43 tests)
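
For readers unfamiliar with this kind of test, the sketch below shows one possible shape. The test names match the ones listed above, but the bodies, the inputs, and the `process` function are assumptions made for illustration; the actual tests in the PR call the real pre-processors. In Rust a panic fails the test, so simply calling the function on invalid input is enough to check that it does not panic.

```rust
use std::str;

/// Hypothetical stand-in for a pre-processor entry point.
fn process(content: &[u8]) -> Vec<u8> {
    if let Ok(_text) = str::from_utf8(content) {
        // Placeholder: text-based extraction would run here.
    }
    // Placeholder byte-level step: returns the input unchanged.
    content.to_vec()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_invalid_utf8_does_not_panic() {
        // 0xFF never appears in valid UTF-8.
        let input = b"<<~HTML\n\xFF\nHTML";
        // Reaching the assertion at all means `process` did not panic.
        assert_eq!(process(input), input.to_vec());
    }

    #[test]
    fn test_valid_utf8_with_multibyte_chars() {
        let input = "class: 'café'".as_bytes();
        assert_eq!(process(input), input.to_vec());
    }
}
```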

The Ruby and Vue preprocessors were using `from_utf8().unwrap()` which
panics when processing files containing invalid UTF-8 bytes. This can
happen when:
- Binary files are inadvertently scanned
- Files are truncated in the middle of a multi-byte character
- Files use non-UTF-8 encodings

This change wraps the UTF-8 conversion in `if let Ok(...)` to gracefully
skip the regex-based template extraction when UTF-8 conversion fails,
while still allowing the byte-level processing to continue (in Ruby's
case).

Fixes panic: `thread panicked at crates/oxide/src/extractor/pre_processors/ruby.rs:37:59`
@khasinski khasinski requested a review from a team as a code owner January 21, 2026 22:52

coderabbitai bot commented Jan 21, 2026

Walkthrough

The changes introduce UTF-8 validation checks in two pre-processor modules. In the Ruby processor, HEREDOC extraction logic is now conditional on UTF-8 validity; invalid UTF-8 skips this extraction while byte-level processing continues. In the Vue processor, template processing similarly gates execution on UTF-8 validation. Both modifications include tests that verify invalid UTF-8 handling and valid character processing. No public API signatures were altered.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: fixing invalid UTF-8 handling in Ruby and Vue preprocessors, which matches the changeset content. |
| Description check | ✅ Passed | The description is directly related to the changeset, clearly explaining the UTF-8 panic issue and the fixes applied to both Ruby and Vue preprocessors. |
