CBOR.decode(CBOR.encode(h)) == h # => true
```

### Fast Encoding/Decoding

For high-throughput internal use where both encoder and decoder are the **same mruby build**, `encode_fast` and `decode_fast` provide a significantly faster path (~30% faster encode, ~20% faster decode on typical structured message payloads).

```ruby
buf = CBOR.encode_fast(obj)
obj = CBOR.decode_fast(buf)
```

**What differs from canonical encoding:**

- Integers always encode at the full native width (`MRB_INT_BIT` bits), never in shortest form
- Floats always encode at the full native width (f32 with `MRB_USE_FLOAT32`, otherwise f64)
- Strings, arrays, and maps use the same shortest-form length prefixes as canonical encoding
- No UTF-8 validation on strings
- Symbols always encode as tag 39 + string (the global symbol strategy setting is ignored)
- Classes and modules encode as tag 49999 + name string (same as canonical)
- Registered tags, bigints, `UnhandledTag`, and proc-tag types transparently fall back to canonical encoding, so `encode_fast` never raises on an unsupported type

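The plain-Ruby sketch below makes the list concrete. It shows the byte layout described above for integers, floats, strings, arrays, symbols, and classes. It is not the library's code. `fast_encode_sketch`, `cbor_head`, and `text_item` are hypothetical helpers. The sketch also assumes a build with `MRB_INT_BIT` = 64 and without `MRB_USE_FLOAT32`, which means 8-byte integer payloads and f64 floats.

```ruby
# Hypothetical sketch of the fast-path byte layout (not the library's code).
# Assumes MRB_INT_BIT == 64 and MRB_USE_FLOAT32 undefined.

# Canonical shortest-form head (major type + argument). The fast path still
# uses this form for string/array lengths and for tag numbers.
def cbor_head(major, arg)
  mt = major << 5
  if arg < 24 then [mt | arg].pack("C")
  elsif arg < 0x100 then [mt | 24, arg].pack("CC")
  elsif arg < 0x10000 then [mt | 25, arg].pack("Cn")
  elsif arg < 0x1_0000_0000 then [mt | 26, arg].pack("CN")
  else [mt | 27, arg].pack("CQ>")
  end
end

# Text string item (major type 3): raw bytes, no UTF-8 validation.
def text_item(str)
  bytes = str.b
  cbor_head(3, bytes.bytesize) + bytes
end

INT64_RANGE = (-2**63)..(2**63 - 1)

def fast_encode_sketch(obj, out = "".b)
  case obj
  when Integer
    # The library falls back to canonical encoding for bigints; this sketch just refuses.
    raise RangeError, "outside int64" unless INT64_RANGE.cover?(obj)
    if obj >= 0
      out << [0x1B, obj].pack("CQ>")        # major 0, always 8 payload bytes
    else
      out << [0x3B, -1 - obj].pack("CQ>")   # major 1 stores -1 - n
    end
  when Float
    out << [0xFB, obj].pack("CG")           # always f64 (0xFB), never shortened
  when String
    out << text_item(obj)
  when Symbol
    out << cbor_head(6, 39) << text_item(obj.to_s)        # tag 39 + string
  when Module
    raise ArgumentError, "anonymous class/module" if obj.name.nil?
    out << cbor_head(6, 49_999) << text_item(obj.name)    # tag 49999 + name
  when Array
    out << cbor_head(4, obj.size)                         # canonical length prefix
    obj.each { |element| fast_encode_sketch(element, out) }
  else
    raise TypeError, "sketch does not handle #{obj.class}"
  end
  out
end
```

The cost of skipping the shortest-form search shows up on small integers. In this layout `1` takes 9 bytes, while the canonical shortest form is the single byte `0x01`.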

**Buffers produced by `encode_fast` must only be decoded by `decode_fast` on a mruby binary compiled with identical settings.** Decoding a fast buffer on a different build produces silent data corruption: no error is raised, and the values are simply wrong.

Never use `encode_fast` / `decode_fast` for:

- Data sent across a network to nodes that may differ in build configuration
- Data written to disk and read back by a different binary
- Any context where you do not fully control both encoder and decoder

For actor groups that span multiple machines, all nodes in the group must be compiled from the same mruby configuration. The group join handshake should explicitly verify `MRB_INT_BIT` and `MRB_USE_FLOAT32` before admitting a node.

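The handshake check can be as small as comparing two fields. The sketch below is a hypothetical illustration. `BuildInfo`, `build_mismatches`, and `admit_node?` are invented names, and it is assumed that each node can learn its own `MRB_INT_BIT` and `MRB_USE_FLOAT32` values somehow (for example, through a small C extension that exports them). The CBOR library is not known to expose them.

```ruby
# Hypothetical build fingerprint exchanged during the group join handshake.
# How a node obtains these values is an assumption, not a library feature.
BuildInfo = Struct.new(:int_bit, :float32)

# Returns the names of the fields that differ; empty means fast buffers are safe.
def build_mismatches(local, remote)
  BuildInfo.members.reject { |field| local[field] == remote[field] }
end

def admit_node?(local, remote)
  build_mismatches(local, remote).empty?
end

local  = BuildInfo.new(64, false)
remote = BuildInfo.new(32, false)
admit_node?(local, remote)        # => false
build_mismatches(local, remote)   # => [:int_bit]
```

Rejecting a node with the names of the mismatched fields, rather than with a bare boolean, makes a failed join easier to diagnose.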
| **Compile flags** | Might affect numeric representation | Different `CFLAGS` *could* theoretically affect float behavior (though unlikely in practice) |
| **Symbol IDs** | Non-portable across mruby binaries | Presym IDs differ between mruby builds; use `symbols_as_string` for portability |
| **`encode_fast` numeric width** | Non-portable across builds with different `MRB_INT_BIT` or `MRB_USE_FLOAT32` | Fast buffers must never cross build boundaries |

**Practical: Same mruby binary + same input = same output, forever.** For cross-machine reproducibility, use `symbols_as_string` (portable) instead of `symbols_as_uint32` (binary-specific), and use `encode` / `decode` instead of `encode_fast` / `decode_fast` unless all peers share the same build.

### RFC 8949 Compliance

**Lazy decoding shines:** When you only need a few fields from a 10 MB payload, lazy decoding is 10–100× faster than eager decoding.

**`encode_fast` trade-off:** Fixed-width integers produce larger wire output for small values (e.g. on a 64-bit `MRB_INT_BIT` build, `1` encodes as 9 bytes instead of 1). For integer-heavy payloads such as large arrays of small numbers, the canonical encoder is actually faster because it copies fewer bytes (lower `memcpy` volume). The fast path wins on rich structured messages with string keys and mixed scalar values, which is the typical actor message shape.

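The sketch below puts numbers on that trade-off. It counts the integer bytes needed for 1,000 small values under each scheme. It only does size arithmetic and does not call the library. It assumes a 64-bit `MRB_INT_BIT` build, and `canonical_int_size` is a hypothetical helper.

```ruby
# Size of one canonical (shortest-form) CBOR integer item, header included.
def canonical_int_size(n)
  arg = n.negative? ? -1 - n : n   # major type 1 stores -1 - n
  if arg < 24 then 1
  elsif arg < 0x100 then 2
  elsif arg < 0x10000 then 3
  elsif arg < 0x1_0000_0000 then 5
  else 9
  end
end

# Fast path, assuming MRB_INT_BIT == 64: 1 header byte + 8 payload bytes.
FAST_INT_SIZE = 1 + 64 / 8

values = Array.new(1000) { |i| i % 24 }   # 1,000 small integers, all in 0..23
canonical_bytes = values.sum { |v| canonical_int_size(v) }
fast_bytes      = values.size * FAST_INT_SIZE
# The array's own length prefix is identical in both modes, so it is left out.
puts "canonical: #{canonical_bytes} bytes, fast: #{fast_bytes} bytes"
# prints "canonical: 1000 bytes, fast: 9000 bytes"
```

A ninefold size difference means far more bytes to copy, which is why the canonical encoder can come out ahead on this payload shape.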

5. Continue doubling until the full document is buffered
6. Then read exactly the remaining bytes needed (if any) to avoid over-reading

**Why doubling?** CBOR documents can be arbitrarily nested (arrays, maps, and tags wrapping each other), so finding the document boundary requires parsing the structure, not just reading a length header. The doubling strategy balances these concerns:

- Most documents (< 16 KB) fit in 1–2 reads
- Large documents don't require excessive seeks
- No fixed buffer size that wastes memory or fails on edge cases


---

## Error Handling
Key insight: no float rounding; the entire algorithm is integer bit manipulation.
```

### Fast Encoding Algorithm

```
For each value:
  integer → header + fixed-width payload (MRB_INT_BIT / 8 bytes), major 0 or 1
  float   → header + fixed-width payload (sizeof(mrb_float) bytes), 0xFA or 0xFB
  string  → canonical length prefix + bytes, no UTF-8 check
  array   → canonical length prefix + fast-encoded elements
```

**Presym IDs are non-portable:** Symbol ID 42 on your mruby might be ID 100 on another mruby built with different `--enable-presym-inline` settings.

When **decoding**, text strings (major type 3) are validated as UTF-8 **when mruby is compiled with `MRB_UTF8_STRING`**. If mruby was compiled without UTF-8 string support, validation is skipped (the strings are still decoded, just not validated).

Byte strings (major type 2) are never validated or modified; they are uninterpreted binary, regardless of compile flags.

This matches RFC 8949 (which requires UTF-8 for text strings) and prevents UTF-8 injection attacks when validation is enabled.

`encode_fast` always emits strings as major type 3 without UTF-8 validation. This is faster, but it trusts the caller to provide valid UTF-8.