Skip to content

Conversation

@mxsm
Copy link
Owner

@mxsm mxsm commented Jan 6, 2026

fix #94

Summary by CodeRabbit

  • New Features

    • Added an optional "simd" feature to enable SIMD-accelerated string operations (x86_64 SSE2) for faster equality, prefix/suffix, contains, and find on longer inputs.
  • Benchmarks

    • Added a SIMD-focused benchmark target to measure performance across string sizes.
  • Examples

    • Added a SIMD demo illustrating usage patterns and performance notes.
  • Tests

    • Added comprehensive SIMD-gated tests covering correctness, thresholds, Unicode, and alignment cases.
  • Documentation

    • Updated docs, quick start, and performance guidance to describe SIMD usage and examples.

✏️ Tip: You can customize this high-level summary in your review settings.

Copilot AI review requested due to automatic review settings January 6, 2026 03:50
@coderabbitai
Copy link

coderabbitai bot commented Jan 6, 2026

📝 Walkthrough

Walkthrough

Adds an optional "simd" feature that provides x86_64 SSE2-accelerated implementations for equality, starts_with, ends_with, contains, and find in CheetahString, plus tests, benchmarks, examples, and documentation; scalar fallbacks remain when SIMD is unavailable or feature-disabled.

Changes

Cohort / File(s) Summary
Feature & Manifest
Cargo.toml
Added simd feature and a [[bench]] target simd (harness = false).
SIMD Module
src/simd.rs
New crate-visible SIMD implementation (eq_bytes, starts_with_bytes, ends_with_bytes, find_bytes) with SIMD_THRESHOLD = 16, SSE2 paths, scalar fallbacks, feature/arch gating, and unit tests.
Integration into API
src/cheetah_string.rs
Conditional #[cfg(feature = "simd")] branches to call crate::simd byte APIs in PartialEq, starts_with, ends_with, contains, and find; preserves existing signatures and scalar behavior when disabled.
Crate & Docs
src/lib.rs, README.md
Conditional module inclusion for simd on x86_64; crate-level docs and README updated with SIMD feature description, usage, and benchmark instructions.
Benchmarks & Examples
benches/simd.rs, examples/simd_demo.rs
New Criterion benchmarks across multiple sizes; example demonstrating SIMD-accelerated operations and usage notes.
Tests
tests/simd.rs
New SIMD-gated tests covering equality, prefix/suffix, contains, find, Unicode, alignment/offset cases, and threshold boundary behaviors.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User as User Code
  participant API as CheetahString API
  participant SIMD as crate::simd
  participant Scalar as Scalar impl (std)

  User->>API: call starts_with/contains/find/eq
  API->>API: check feature flag & target (compile-time)
  alt simd feature & x86_64 at runtime SSE2 available
    API->>SIMD: call starts_with_bytes / find_bytes / eq_bytes
    SIMD->>SIMD: SSE2-accelerated routine (unsafe) / thresholds
    SIMD-->>API: boolean / Option<usize>
  else feature disabled or SSE2 missing or below threshold
    API->>Scalar: call std string methods (starts_with/find/etc.)
    Scalar-->>API: boolean / Option<usize>
  end
  API-->>User: return result
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I sniffed through bytes both near and far,
SSE2 neon in a tiny jar,
Starts and finds now sprint and leap,
While fallbacks hum when strings are cheap,
Hooray — CheetahString and I hop on par!

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly indicates SIMD acceleration implementation for CheetahString string operations, which matches the main change across the changeset.
Linked Issues check ✅ Passed The pull request implements SIMD acceleration for string operations (#94) with new SIMD module, accelerated methods, benchmarks, tests, and documentation.
Out of Scope Changes check ✅ Passed All changes directly relate to SIMD acceleration: new simd.rs module, conditional compilation, updated string methods, benchmarks, examples, tests, and documentation.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In @src/lib.rs:
- Around line 7-25: Doc example version is out of date: update the TOML
dependency example inside the crate doc block (the SIMD Acceleration doc comment
in src/lib.rs) to use the actual package version "1.0.0" instead of "0.1" so the
example matches Cargo.toml; edit the commented snippet line starting with
`cheetah-string = { version = ...` to `"1.0.0"`.

In @src/simd.rs:
- Around line 7-8: The import `use core::arch::x86_64::*;` is only gated by
`feature = "simd"` and breaks compilation on non-x86_64 targets; change the cfg
to require both the simd feature and x86_64 target (e.g. `#[cfg(all(feature =
"simd", target_arch = "x86_64"))] use core::arch::x86_64::*;`), gate all
x86_64/SSE2-specific implementations (functions like `eq_bytes`,
`starts_with_bytes`, any SSE helpers) with the same `#[cfg(all(feature = "simd",
target_arch = "x86_64"))]`, and add non-x86_64 fallback implementations for
those same functions using `#[cfg(not(all(feature = "simd", target_arch =
"x86_64")))]` that implement the plain scalar behavior (e.g. `a == b`,
`haystack.starts_with(needle)`) so non-x86_64 targets compile correctly.
🧹 Nitpick comments (3)
src/simd.rs (2)

87-92: Document threshold logic inconsistency.

The find_bytes function requires both needle.len() >= SIMD_THRESHOLD and haystack.len() >= SIMD_THRESHOLD (lines 90-91), whereas starts_with_bytes and ends_with_bytes only check the needle length (lines 44, 66). This asymmetry may be intentional for performance reasons, but it should be documented to explain the rationale.


159-176: Consider optimizing repeated SIMD comparisons in the matching loop.

The inner loop calls eq_bytes_sse2 (line 166) for every candidate position. For patterns with a frequently-occurring first byte, this results in many full SIMD comparisons. Consider verifying a few more bytes before the full comparison or implementing a more sophisticated filter (e.g., checking first and last bytes).

However, this optimization can be deferred as the current implementation is correct and the improvement may be marginal for typical workloads.

src/cheetah_string.rs (1)

999-1025: Consider adding SIMD to remaining PartialEq implementations for consistency.

The implementations at lines 999-1025 don't use SIMD, creating asymmetric performance:

  • cheetah == "str" → uses SIMD (line 966)
  • "str" == cheetah → no SIMD (line 1009)

For consistency, the reverse implementations could delegate to the SIMD-enabled ones:

🔎 Proposed fix for symmetric SIMD usage
 impl PartialEq<CheetahString> for str {
     #[inline]
     fn eq(&self, other: &CheetahString) -> bool {
-        self == other.as_str()
+        other.eq(self)
     }
 }
 
 impl PartialEq<CheetahString> for String {
     #[inline]
     fn eq(&self, other: &CheetahString) -> bool {
-        self.as_str() == other.as_str()
+        other.eq(self)
     }
 }
 
 impl PartialEq<CheetahString> for &str {
     #[inline]
     fn eq(&self, other: &CheetahString) -> bool {
-        *self == other.as_str()
+        other.eq(*self)
     }
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b15105 and 3fdcc26.

📒 Files selected for processing (8)
  • Cargo.toml
  • README.md
  • benches/simd.rs
  • examples/simd_demo.rs
  • src/cheetah_string.rs
  • src/lib.rs
  • src/simd.rs
  • tests/simd.rs
🧰 Additional context used
🧬 Code graph analysis (5)
examples/simd_demo.rs (1)
src/cheetah_string.rs (15)
  • from (30-32)
  • from (37-39)
  • from (44-46)
  • from (58-62)
  • from (82-86)
  • from (91-96)
  • from (101-106)
  • from (120-122)
  • from (163-165)
  • from (170-172)
  • from (177-208)
  • from (243-245)
  • starts_with (471-485)
  • ends_with (519-533)
  • contains (567-581)
benches/simd.rs (1)
src/cheetah_string.rs (12)
  • from (30-32)
  • from (37-39)
  • from (44-46)
  • from (58-62)
  • from (82-86)
  • from (91-96)
  • from (101-106)
  • from (120-122)
  • from (163-165)
  • from (170-172)
  • from (177-208)
  • from (243-245)
src/cheetah_string.rs (1)
src/simd.rs (4)
  • starts_with_bytes (33-51)
  • ends_with_bytes (55-74)
  • find_bytes (78-101)
  • eq_bytes (15-29)
src/simd.rs (1)
src/cheetah_string.rs (2)
  • len (428-438)
  • cmp (1038-1040)
tests/simd.rs (2)
src/serde.rs (1)
  • cheetah_string (27-89)
src/cheetah_string.rs (16)
  • from (30-32)
  • from (37-39)
  • from (44-46)
  • from (58-62)
  • from (82-86)
  • from (91-96)
  • from (101-106)
  • from (120-122)
  • from (163-165)
  • from (170-172)
  • from (177-208)
  • from (243-245)
  • starts_with (471-485)
  • ends_with (519-533)
  • contains (567-581)
  • find (614-624)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🔇 Additional comments (22)
Cargo.toml (2)

26-26: LGTM - Clean feature flag addition.

The simd feature flag follows Rust conventions for conditional compilation. The empty feature definition is appropriate since SIMD support is controlled via #[cfg(feature = "simd")] gates in the code.


36-38: LGTM - Benchmark configuration is correct.

The new SIMD benchmark target is properly configured with harness = false for Criterion compatibility.

src/lib.rs (1)

65-66: LGTM - Correct conditional compilation pattern.

The SIMD module is properly feature-gated and remains internal to the crate. This maintains API stability while enabling optional SIMD acceleration.

README.md (2)

34-34: Good platform-specific documentation.

The README correctly specifies that SIMD acceleration is for x86_64 platforms with SSE2 support. This sets appropriate expectations for users on other architectures.

Also applies to: 62-62, 108-108


79-83: No action needed. The documentation is accurate and the implementation correctly supports char patterns.

The code verification shows that starts_with() and contains() properly handle both char and &str patterns through the StrPattern trait. Character patterns are routed to the standard library implementation (which is appropriate since they don't benefit from SIMD acceleration), while string patterns use SIMD optimization when the feature is enabled. The README examples at lines 79-83 are correct.

src/simd.rs (3)

108-136: LGTM - Correct SSE2 equality implementation.

The SIMD equality check properly:

  • Uses unaligned loads (_mm_loadu_si128) to handle arbitrary byte slices
  • Processes 16-byte chunks with SSE2 compare and movemask
  • Handles remaining bytes with scalar comparison
  • Uses 0xFFFF mask check for all-bytes-equal verification

184-211: LGTM - Efficient single-byte search with SSE2.

The implementation correctly:

  • Broadcasts the needle byte to all lanes
  • Uses _mm_movemask_epi8 with trailing_zeros to find the first match
  • Handles remainder bytes with standard iterator

213-278: Good test coverage of SIMD functionality in the unit tests.

The tests verify:

  • Basic correctness of each operation
  • Threshold-based behavior (small vs. large inputs)
  • Edge cases like empty patterns

The direct call to find_byte_sse2 in the test is acceptable since it's in test code and properly feature-gated.

Comprehensive edge case coverage for alignment, Unicode, and boundary conditions already exists in tests/simd.rs, including:

  • Alignment scenarios with offsets 0-15 bytes
  • Unicode handling with multi-byte characters and emoji
  • Boundary conditions at exactly 16 bytes (SIMD threshold)
  • Pattern matching at string ends and inline storage limits
benches/simd.rs (3)

4-24: LGTM - Well-structured equality benchmarks.

The benchmarks properly test both matching and non-matching cases across a good range of sizes, including the critical SIMD threshold boundary at 16 bytes. The use of black_box prevents the compiler from optimizing away the comparisons.


116-153: Excellent realistic workload scenarios.

The realistic benchmarks provide valuable performance data for common use cases:

  • URL parsing with protocol and path checking
  • Log filtering with timestamp and level detection
  • Content-type validation

These scenarios align well with the PR objectives to accelerate string operations in real-world applications.


155-164: Complete benchmark coverage.

The benchmark suite comprehensively covers all SIMD-accelerated operations: equality, prefix/suffix matching, substring search, and realistic mixed workloads. This will provide good performance tracking for the SIMD implementation.

examples/simd_demo.rs (1)

1-102: Well-structured example demonstrating SIMD capabilities.

The example effectively showcases all SIMD-accelerated operations with clear sections and practical use cases. The code is idiomatic and serves as good documentation for users.

One minor note: the comment on line 100-101 mentions ">= 16 bytes" which correctly matches the SIMD_THRESHOLD constant in src/simd.rs.

tests/simd.rs (5)

1-4: LGTM - Good feature gating for SIMD tests.

The #![cfg(feature = "simd")] attribute correctly ensures these tests only run when the SIMD feature is enabled, preventing compilation errors on platforms without SIMD support.


5-28: Comprehensive equality tests covering key scenarios.

The test correctly verifies equality for:

  • Short strings (below SIMD threshold)
  • Long strings (above threshold)
  • Cross-type comparisons with str and String

30-120: Good coverage of pattern matching operations.

The tests for starts_with, ends_with, contains, and find are thorough, covering:

  • Short and long patterns
  • Edge cases with empty strings and empty patterns
  • Correct position indices for find

122-136: Critical Unicode test for byte-level SIMD operations.

This test is essential because SIMD operates on raw bytes. The test correctly verifies:

  • Multi-byte UTF-8 characters (世界 = 6 bytes, 🌍 = 4 bytes) work properly
  • find("世界") returns byte position 6, not character position

138-202: Excellent edge case coverage.

The remaining tests cover critical scenarios:

  • Alignment tests (lines 139-150): Test all 16 alignment offsets - important for SIMD correctness
  • Boundary conditions (lines 153-167): Test around the 16-byte threshold
  • Pattern at end (lines 170-180): Ensure patterns at string boundaries are found
  • Inline storage (lines 192-202): Verify SIMD works with inline-stored strings (≤23 bytes)
src/cheetah_string.rs (5)

457-485: Well-implemented SIMD integration for starts_with.

The conditional compilation is correct:

  • Uses SIMD path when feature is enabled
  • Falls back to standard str::starts_with otherwise
  • Char patterns correctly bypass SIMD (single char comparison doesn't benefit from SIMD)

Documentation is updated appropriately.


503-533: Consistent SIMD integration for ends_with.

The implementation mirrors starts_with appropriately, with SIMD acceleration for string patterns and standard comparison for single characters.


551-581: SIMD contains uses find_bytes efficiently.

Using find_bytes(...).is_some() is the correct approach since the SIMD module doesn't have a dedicated contains_bytes function. This reuses the existing search logic.


599-624: SIMD integration for find is straightforward.

The find method correctly uses AsRef<str> and integrates with simd::find_bytes. Note that unlike starts_with/ends_with/contains, this doesn't have a separate char pattern path - which is fine since single-character searches don't benefit significantly from SIMD for this use case.


950-990: SIMD-enabled PartialEq implementations look correct.

The SIMD path for equality comparisons is well-integrated with proper feature gating.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request implements SIMD (Single Instruction, Multiple Data) acceleration for common string operations in CheetahString using x86_64 SSE2 instructions. The implementation provides runtime feature detection and automatically falls back to scalar operations when SIMD is unavailable or for inputs below the 16-byte threshold.

Key changes:

  • New simd feature flag and module with SSE2-accelerated implementations of string comparison and search operations
  • Integration of SIMD acceleration into starts_with(), ends_with(), contains(), find(), and equality comparison methods
  • Comprehensive test suite and benchmarks for SIMD functionality

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/simd.rs New module implementing SSE2-accelerated string operations with automatic fallback to scalar code
src/cheetah_string.rs Integrates SIMD functions into string methods with conditional compilation for the simd feature
src/lib.rs Adds simd module declaration and documentation for SIMD acceleration feature
tests/simd.rs Comprehensive test coverage for SIMD-accelerated operations including edge cases and Unicode handling
benches/simd.rs Performance benchmarks for SIMD operations across various string sizes and realistic workloads
examples/simd_demo.rs Demonstration of SIMD-accelerated operations with real-world use cases
Cargo.toml Adds simd feature flag and simd benchmark configuration
README.md Updates documentation with SIMD feature information and version bump to 1.0.0

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/simd.rs (1)

256-264: Consider adding boundary and edge-case tests.

The test coverage is reasonable for initial implementation. For robustness, consider adding tests for:

  • Strings exactly at the 16-byte SIMD threshold
  • Matches/differences at SIMD chunk boundaries (positions 15, 16, 17)
  • Large inputs to ensure the scalar fallback for remaining bytes works correctly
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3fdcc26 and ad1e9fa.

📒 Files selected for processing (2)
  • src/lib.rs
  • src/simd.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib.rs
🔇 Additional comments (4)
src/simd.rs (4)

7-8: Import architecture gate looks correct.

The x86_64 intrinsics import is now properly gated with both the feature flag and architecture check. This addresses the previous compilation failure concern for non-x86_64 platforms.


108-136: LGTM: SSE2 equality comparison implementation is correct.

The implementation correctly:

  • Processes 16 bytes per iteration using unaligned loads
  • Uses _mm_cmpeq_epi8 for byte-wise comparison and checks mask == 0xFFFF for all-equal
  • Falls back to scalar for remaining bytes

141-179: Correct implementation; consider future optimization.

The hybrid two-phase approach (find first byte candidates, then verify full match) is functionally correct. For heavily repeated first bytes, there may be room for optimization using algorithms like SIMD-accelerated Two-Way search, but this is a reasonable baseline implementation.


184-211: LGTM: Single-byte search implementation is correct.

Standard and efficient SSE2 pattern: broadcast needle, compare 16 bytes at a time, use trailing_zeros on mask to find first match position.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/simd.rs (1)

74-93: Consider adding test for pattern longer than haystack.

The edge cases cover empty strings well, but you might want to add an explicit test where the pattern is longer than a non-empty haystack to verify the SIMD implementation handles this correctly:

🔎 Optional additional test case
     // Edge cases
     assert!(s.contains(""));
     let empty = CheetahString::from("");
     assert!(empty.contains(""));
     assert!(!empty.contains("a"));
+
+    // Pattern longer than haystack
+    let short = CheetahString::from("abc");
+    assert!(!short.contains("abcdefghijklmnop"));
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad1e9fa and 0d6167b.

📒 Files selected for processing (1)
  • tests/simd.rs
🧰 Additional context used
🧬 Code graph analysis (1)
tests/simd.rs (1)
src/cheetah_string.rs (4)
  • starts_with (471-485)
  • ends_with (519-533)
  • contains (567-581)
  • find (614-624)
🔇 Additional comments (7)
tests/simd.rs (7)

5-28: LGTM!

Good test coverage for equality operations across SIMD threshold boundaries and with different string types (&str, String).


30-93: LGTM!

Comprehensive coverage for starts_with, ends_with, and contains operations including short/long patterns and empty string edge cases.


95-120: LGTM!

The find tests correctly verify expected positions and cover important edge cases including empty patterns and first-character matches.


122-136: LGTM!

Critical coverage for Unicode handling. Since SIMD operates at byte level, these tests validate that multi-byte UTF-8 sequences (Chinese characters, emoji) are handled correctly.


138-150: LGTM!

Good coverage for memory alignment scenarios. Testing offsets 0-15 ensures the SIMD implementation handles both aligned and unaligned memory accesses correctly.


152-167: LGTM!

Testing at exactly the SIMD threshold (16 bytes) and just below/above is important for validating the transition between scalar and SIMD code paths.


169-202: LGTM!

Good coverage for edge cases: patterns at string boundaries, first-occurrence semantics for find, and inline storage optimization (≤23 bytes). These tests help ensure SIMD operations work correctly with different internal string representations.

@mxsm mxsm merged commit 8da5f8c into main Jan 6, 2026
6 of 7 checks passed
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/simd.rs (1)

87-95: Consider relaxing SIMD threshold for single-byte needles.

The current condition requires needle.len() >= SIMD_THRESHOLD (16 bytes), but find_bytes_sse2 has optimized handling for single-byte needles (line 150-152). A single-byte search in a large haystack could benefit from SIMD acceleration.

Consider changing the condition to:

if is_x86_feature_detected!("sse2")
    && haystack.len() >= SIMD_THRESHOLD
    && (needle.len() == 1 || needle.len() >= SIMD_THRESHOLD)

This would enable SIMD for single-byte searches in large haystacks while still avoiding overhead for small multi-byte needles.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d6167b and 2158c67.

📒 Files selected for processing (1)
  • src/simd.rs
🔇 Additional comments (7)
src/simd.rs (7)

1-11: Architecture gating properly fixed.

The import and threshold configuration are now correct with proper x86_64 architecture gating.


13-29: LGTM.

The equality check correctly handles length mismatch, dispatches to SIMD when beneficial, and falls back to standard comparison.


31-51: LGTM.

The starts_with implementation correctly handles edge cases (empty needle, length checks) and appropriately dispatches to SIMD.


53-74: LGTM.

The ends_with implementation correctly calculates the suffix offset and handles edge cases appropriately.


105-136: LGTM.

The SSE2 equality implementation is correct: unaligned loads, proper byte-wise comparison, early-exit on mismatch, and scalar handling of remaining bytes.


138-179: LGTM.

The substring search correctly uses a hybrid approach: SIMD-accelerated first-byte search followed by full verification, with proper bounds checking and position advancement.


181-211: LGTM.

The single-byte search correctly broadcasts the needle and uses bit manipulation to find the first match position.

@mxsm mxsm deleted the feat-94 branch January 6, 2026 07:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement SIMD acceleration for string operations in CheetahString

3 participants