-
Notifications
You must be signed in to change notification settings - Fork 6
[ISSUE #94]🚀Implement SIMD acceleration for string operations in CheetahString #95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughAdds an optional "simd" feature that provides x86_64 SSE2-accelerated implementations for equality, starts_with, ends_with, contains, and find in CheetahString, plus tests, benchmarks, examples, and documentation; scalar fallbacks remain when SIMD is unavailable or feature-disabled. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant User as User Code
participant API as CheetahString API
participant SIMD as crate::simd
participant Scalar as Scalar impl (std)
User->>API: call starts_with/contains/find/eq
API->>API: check feature flag & target (compile-time)
alt simd feature & x86_64 at runtime SSE2 available
API->>SIMD: call starts_with_bytes / find_bytes / eq_bytes
SIMD->>SIMD: SSE2-accelerated routine (unsafe) / thresholds
SIMD-->>API: boolean / Option<usize>
else feature disabled or SSE2 missing or below threshold
API->>Scalar: call std string methods (starts_with/find/etc.)
Scalar-->>API: boolean / Option<usize>
end
API-->>User: return result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI Agents
In @src/lib.rs:
- Around line 7-25: Doc example version is out of date: update the TOML
dependency example inside the crate doc block (the SIMD Acceleration doc comment
in src/lib.rs) to use the actual package version "1.0.0" instead of "0.1" so the
example matches Cargo.toml; edit the commented snippet line starting with
`cheetah-string = { version = ...` to `"1.0.0"`.
In @src/simd.rs:
- Around line 7-8: The import `use core::arch::x86_64::*;` is only gated by
`feature = "simd"` and breaks compilation on non-x86_64 targets; change the cfg
to require both the simd feature and x86_64 target (e.g. `#[cfg(all(feature =
"simd", target_arch = "x86_64"))] use core::arch::x86_64::*;`), gate all
x86_64/SSE2-specific implementations (functions like `eq_bytes`,
`starts_with_bytes`, any SSE helpers) with the same `#[cfg(all(feature = "simd",
target_arch = "x86_64"))]`, and add non-x86_64 fallback implementations for
those same functions using `#[cfg(not(all(feature = "simd", target_arch =
"x86_64")))]` that implement the plain scalar behavior (e.g. `a == b`,
`haystack.starts_with(needle)`) so non-x86_64 targets compile correctly.
🧹 Nitpick comments (3)
src/simd.rs (2)
87-92: Document threshold logic inconsistency.The
find_bytesfunction requires bothneedle.len() >= SIMD_THRESHOLDandhaystack.len() >= SIMD_THRESHOLD(lines 90-91), whereasstarts_with_bytesandends_with_bytesonly check the needle length (lines 44, 66). This asymmetry may be intentional for performance reasons, but it should be documented to explain the rationale.
159-176: Consider optimizing repeated SIMD comparisons in the matching loop.The inner loop calls
eq_bytes_sse2(line 166) for every candidate position. For patterns with a frequently-occurring first byte, this results in many full SIMD comparisons. Consider verifying a few more bytes before the full comparison or implementing a more sophisticated filter (e.g., checking first and last bytes).However, this optimization can be deferred as the current implementation is correct and the improvement may be marginal for typical workloads.
src/cheetah_string.rs (1)
999-1025: Consider adding SIMD to remainingPartialEqimplementations for consistency.The implementations at lines 999-1025 don't use SIMD, creating asymmetric performance:
cheetah == "str"→ uses SIMD (line 966)"str" == cheetah→ no SIMD (line 1009)For consistency, the reverse implementations could delegate to the SIMD-enabled ones:
🔎 Proposed fix for symmetric SIMD usage
impl PartialEq<CheetahString> for str { #[inline] fn eq(&self, other: &CheetahString) -> bool { - self == other.as_str() + other.eq(self) } } impl PartialEq<CheetahString> for String { #[inline] fn eq(&self, other: &CheetahString) -> bool { - self.as_str() == other.as_str() + other.eq(self) } } impl PartialEq<CheetahString> for &str { #[inline] fn eq(&self, other: &CheetahString) -> bool { - *self == other.as_str() + other.eq(*self) } }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
Cargo.tomlREADME.mdbenches/simd.rsexamples/simd_demo.rssrc/cheetah_string.rssrc/lib.rssrc/simd.rstests/simd.rs
🧰 Additional context used
🧬 Code graph analysis (5)
examples/simd_demo.rs (1)
src/cheetah_string.rs (15)
from(30-32)from(37-39)from(44-46)from(58-62)from(82-86)from(91-96)from(101-106)from(120-122)from(163-165)from(170-172)from(177-208)from(243-245)starts_with(471-485)ends_with(519-533)contains(567-581)
benches/simd.rs (1)
src/cheetah_string.rs (12)
from(30-32)from(37-39)from(44-46)from(58-62)from(82-86)from(91-96)from(101-106)from(120-122)from(163-165)from(170-172)from(177-208)from(243-245)
src/cheetah_string.rs (1)
src/simd.rs (4)
starts_with_bytes(33-51)ends_with_bytes(55-74)find_bytes(78-101)eq_bytes(15-29)
src/simd.rs (1)
src/cheetah_string.rs (2)
len(428-438)cmp(1038-1040)
tests/simd.rs (2)
src/serde.rs (1)
cheetah_string(27-89)src/cheetah_string.rs (16)
from(30-32)from(37-39)from(44-46)from(58-62)from(82-86)from(91-96)from(101-106)from(120-122)from(163-165)from(170-172)from(177-208)from(243-245)starts_with(471-485)ends_with(519-533)contains(567-581)find(614-624)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🔇 Additional comments (22)
Cargo.toml (2)
26-26: LGTM - Clean feature flag addition.The
simdfeature flag follows Rust conventions for conditional compilation. The empty feature definition is appropriate since SIMD support is controlled via#[cfg(feature = "simd")]gates in the code.
36-38: LGTM - Benchmark configuration is correct.The new SIMD benchmark target is properly configured with
harness = falsefor Criterion compatibility.src/lib.rs (1)
65-66: LGTM - Correct conditional compilation pattern.The SIMD module is properly feature-gated and remains internal to the crate. This maintains API stability while enabling optional SIMD acceleration.
README.md (2)
34-34: Good platform-specific documentation.The README correctly specifies that SIMD acceleration is for x86_64 platforms with SSE2 support. This sets appropriate expectations for users on other architectures.
Also applies to: 62-62, 108-108
79-83: No action needed. The documentation is accurate and the implementation correctly supports char patterns.The code verification shows that
starts_with()andcontains()properly handle bothcharand&strpatterns through theStrPatterntrait. Character patterns are routed to the standard library implementation (which is appropriate since they don't benefit from SIMD acceleration), while string patterns use SIMD optimization when the feature is enabled. The README examples at lines 79-83 are correct.src/simd.rs (3)
108-136: LGTM - Correct SSE2 equality implementation.The SIMD equality check properly:
- Uses unaligned loads (
_mm_loadu_si128) to handle arbitrary byte slices- Processes 16-byte chunks with SSE2 compare and movemask
- Handles remaining bytes with scalar comparison
- Uses
0xFFFFmask check for all-bytes-equal verification
184-211: LGTM - Efficient single-byte search with SSE2.The implementation correctly:
- Broadcasts the needle byte to all lanes
- Uses
_mm_movemask_epi8withtrailing_zerosto find the first match- Handles remainder bytes with standard iterator
213-278: Good test coverage of SIMD functionality in the unit tests.The tests verify:
- Basic correctness of each operation
- Threshold-based behavior (small vs. large inputs)
- Edge cases like empty patterns
The direct call to
find_byte_sse2in the test is acceptable since it's in test code and properly feature-gated.Comprehensive edge case coverage for alignment, Unicode, and boundary conditions already exists in
tests/simd.rs, including:
- Alignment scenarios with offsets 0-15 bytes
- Unicode handling with multi-byte characters and emoji
- Boundary conditions at exactly 16 bytes (SIMD threshold)
- Pattern matching at string ends and inline storage limits
benches/simd.rs (3)
4-24: LGTM - Well-structured equality benchmarks.The benchmarks properly test both matching and non-matching cases across a good range of sizes, including the critical SIMD threshold boundary at 16 bytes. The use of
black_boxprevents the compiler from optimizing away the comparisons.
116-153: Excellent realistic workload scenarios.The realistic benchmarks provide valuable performance data for common use cases:
- URL parsing with protocol and path checking
- Log filtering with timestamp and level detection
- Content-type validation
These scenarios align well with the PR objectives to accelerate string operations in real-world applications.
155-164: Complete benchmark coverage.The benchmark suite comprehensively covers all SIMD-accelerated operations: equality, prefix/suffix matching, substring search, and realistic mixed workloads. This will provide good performance tracking for the SIMD implementation.
examples/simd_demo.rs (1)
1-102: Well-structured example demonstrating SIMD capabilities.The example effectively showcases all SIMD-accelerated operations with clear sections and practical use cases. The code is idiomatic and serves as good documentation for users.
One minor note: the comment on line 100-101 mentions ">= 16 bytes" which correctly matches the
SIMD_THRESHOLDconstant insrc/simd.rs.tests/simd.rs (5)
1-4: LGTM - Good feature gating for SIMD tests.The
#![cfg(feature = "simd")]attribute correctly ensures these tests only run when the SIMD feature is enabled, preventing compilation errors on platforms without SIMD support.
5-28: Comprehensive equality tests covering key scenarios.The test correctly verifies equality for:
- Short strings (below SIMD threshold)
- Long strings (above threshold)
- Cross-type comparisons with
strandString
30-120: Good coverage of pattern matching operations.The tests for
starts_with,ends_with,contains, andfindare thorough, covering:
- Short and long patterns
- Edge cases with empty strings and empty patterns
- Correct position indices for
find
122-136: Critical Unicode test for byte-level SIMD operations.This test is essential because SIMD operates on raw bytes. The test correctly verifies:
- Multi-byte UTF-8 characters (世界 = 6 bytes, 🌍 = 4 bytes) work properly
find("世界")returns byte position 6, not character position
138-202: Excellent edge case coverage.The remaining tests cover critical scenarios:
- Alignment tests (lines 139-150): Test all 16 alignment offsets - important for SIMD correctness
- Boundary conditions (lines 153-167): Test around the 16-byte threshold
- Pattern at end (lines 170-180): Ensure patterns at string boundaries are found
- Inline storage (lines 192-202): Verify SIMD works with inline-stored strings (≤23 bytes)
src/cheetah_string.rs (5)
457-485: Well-implemented SIMD integration forstarts_with.The conditional compilation is correct:
- Uses SIMD path when feature is enabled
- Falls back to standard
str::starts_withotherwise- Char patterns correctly bypass SIMD (single char comparison doesn't benefit from SIMD)
Documentation is updated appropriately.
503-533: Consistent SIMD integration forends_with.The implementation mirrors
starts_withappropriately, with SIMD acceleration for string patterns and standard comparison for single characters.
551-581: SIMDcontainsusesfind_bytesefficiently.Using
find_bytes(...).is_some()is the correct approach since the SIMD module doesn't have a dedicatedcontains_bytesfunction. This reuses the existing search logic.
599-624: SIMD integration forfindis straightforward.The
findmethod correctly usesAsRef<str>and integrates withsimd::find_bytes. Note that unlikestarts_with/ends_with/contains, this doesn't have a separate char pattern path - which is fine since single-character searches don't benefit significantly from SIMD for this use case.
950-990: SIMD-enabledPartialEqimplementations look correct.The SIMD path for equality comparisons is well-integrated with proper feature gating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request implements SIMD (Single Instruction, Multiple Data) acceleration for common string operations in CheetahString using x86_64 SSE2 instructions. The implementation provides runtime feature detection and automatically falls back to scalar operations when SIMD is unavailable or for inputs below the 16-byte threshold.
Key changes:
- New
simdfeature flag and module with SSE2-accelerated implementations of string comparison and search operations - Integration of SIMD acceleration into
starts_with(),ends_with(),contains(),find(), and equality comparison methods - Comprehensive test suite and benchmarks for SIMD functionality
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/simd.rs | New module implementing SSE2-accelerated string operations with automatic fallback to scalar code |
| src/cheetah_string.rs | Integrates SIMD functions into string methods with conditional compilation for the simd feature |
| src/lib.rs | Adds simd module declaration and documentation for SIMD acceleration feature |
| tests/simd.rs | Comprehensive test coverage for SIMD-accelerated operations including edge cases and Unicode handling |
| benches/simd.rs | Performance benchmarks for SIMD operations across various string sizes and realistic workloads |
| examples/simd_demo.rs | Demonstration of SIMD-accelerated operations with real-world use cases |
| Cargo.toml | Adds simd feature flag and simd benchmark configuration |
| README.md | Updates documentation with SIMD feature information and version bump to 1.0.0 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/simd.rs (1)
256-264: Consider adding boundary and edge-case tests.The test coverage is reasonable for initial implementation. For robustness, consider adding tests for:
- Strings exactly at the 16-byte SIMD threshold
- Matches/differences at SIMD chunk boundaries (positions 15, 16, 17)
- Large inputs to ensure the scalar fallback for remaining bytes works correctly
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/lib.rssrc/simd.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib.rs
🔇 Additional comments (4)
src/simd.rs (4)
7-8: Import architecture gate looks correct.The x86_64 intrinsics import is now properly gated with both the feature flag and architecture check. This addresses the previous compilation failure concern for non-x86_64 platforms.
108-136: LGTM: SSE2 equality comparison implementation is correct.The implementation correctly:
- Processes 16 bytes per iteration using unaligned loads
- Uses
_mm_cmpeq_epi8for byte-wise comparison and checksmask == 0xFFFFfor all-equal- Falls back to scalar for remaining bytes
141-179: Correct implementation; consider future optimization.The hybrid two-phase approach (find first byte candidates, then verify full match) is functionally correct. For heavily repeated first bytes, there may be room for optimization using algorithms like SIMD-accelerated Two-Way search, but this is a reasonable baseline implementation.
184-211: LGTM: Single-byte search implementation is correct.Standard and efficient SSE2 pattern: broadcast needle, compare 16 bytes at a time, use
trailing_zeroson mask to find first match position.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/simd.rs (1)
74-93: Consider adding test for pattern longer than haystack.The edge cases cover empty strings well, but you might want to add an explicit test where the pattern is longer than a non-empty haystack to verify the SIMD implementation handles this correctly:
🔎 Optional additional test case
// Edge cases assert!(s.contains("")); let empty = CheetahString::from(""); assert!(empty.contains("")); assert!(!empty.contains("a")); + + // Pattern longer than haystack + let short = CheetahString::from("abc"); + assert!(!short.contains("abcdefghijklmnop")); }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/simd.rs
🧰 Additional context used
🧬 Code graph analysis (1)
tests/simd.rs (1)
src/cheetah_string.rs (4)
starts_with(471-485)ends_with(519-533)contains(567-581)find(614-624)
🔇 Additional comments (7)
tests/simd.rs (7)
5-28: LGTM!Good test coverage for equality operations across SIMD threshold boundaries and with different string types (
&str,String).
30-93: LGTM!Comprehensive coverage for
starts_with,ends_with, andcontainsoperations including short/long patterns and empty string edge cases.
95-120: LGTM!The
findtests correctly verify expected positions and cover important edge cases including empty patterns and first-character matches.
122-136: LGTM!Critical coverage for Unicode handling. Since SIMD operates at byte level, these tests validate that multi-byte UTF-8 sequences (Chinese characters, emoji) are handled correctly.
138-150: LGTM!Good coverage for memory alignment scenarios. Testing offsets 0-15 ensures the SIMD implementation handles both aligned and unaligned memory accesses correctly.
152-167: LGTM!Testing at exactly the SIMD threshold (16 bytes) and just below/above is important for validating the transition between scalar and SIMD code paths.
169-202: LGTM!Good coverage for edge cases: patterns at string boundaries, first-occurrence semantics for
find, and inline storage optimization (≤23 bytes). These tests help ensure SIMD operations work correctly with different internal string representations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/simd.rs (1)
87-95: Consider relaxing SIMD threshold for single-byte needles.The current condition requires
needle.len() >= SIMD_THRESHOLD(16 bytes), butfind_bytes_sse2has optimized handling for single-byte needles (line 150-152). A single-byte search in a large haystack could benefit from SIMD acceleration.Consider changing the condition to:
if is_x86_feature_detected!("sse2") && haystack.len() >= SIMD_THRESHOLD && (needle.len() == 1 || needle.len() >= SIMD_THRESHOLD)This would enable SIMD for single-byte searches in large haystacks while still avoiding overhead for small multi-byte needles.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/simd.rs
🔇 Additional comments (7)
src/simd.rs (7)
1-11: Architecture gating properly fixed.The import and threshold configuration are now correct with proper x86_64 architecture gating.
13-29: LGTM.The equality check correctly handles length mismatch, dispatches to SIMD when beneficial, and falls back to standard comparison.
31-51: LGTM.The starts_with implementation correctly handles edge cases (empty needle, length checks) and appropriately dispatches to SIMD.
53-74: LGTM.The ends_with implementation correctly calculates the suffix offset and handles edge cases appropriately.
105-136: LGTM.The SSE2 equality implementation is correct: unaligned loads, proper byte-wise comparison, early-exit on mismatch, and scalar handling of remaining bytes.
138-179: LGTM.The substring search correctly uses a hybrid approach: SIMD-accelerated first-byte search followed by full verification, with proper bounds checking and position advancement.
181-211: LGTM.The single-byte search correctly broadcasts the needle and uses bit manipulation to find the first match position.
fix #94
Summary by CodeRabbit
New Features
Benchmarks
Examples
Tests
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.