@PekingSpades

Summary

  • Replace the byte-by-byte addition in core/websock.js:_rQshift() with a DataView-backed fast path for 1/2/4 byte reads to cut CPU time in the receive queue.
  • Maintain a cached DataView whenever the receive queue buffer is allocated or resized so the optimized path is always available.
  • Capture and share reproducible browser benchmarks that highlight the performance win across different engines and machines.
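The approach described in the summary can be sketched roughly as follows. This is a simplified illustration, not the actual patch: the `ReceiveQueue` class, `_allocate()`, `push()`, `_rQview`, and `_rQshiftLoop()` are hypothetical names chosen for the example, and only `_rQshift()`, `_rQ`, and `_rQi` echo identifiers from core/websock.js. It assumes values are read big-endian, as the byte-by-byte loop implies.

```javascript
// Hypothetical, simplified receive queue; not the actual noVNC Websock class.
class ReceiveQueue {
    constructor(size = 4096) {
        this._rQi = 0;      // read index
        this._rQlen = 0;    // number of valid bytes in the buffer
        this._allocate(size);
    }

    // Rebuild the cached DataView whenever the backing buffer changes,
    // so the fast path never reads through a stale view.
    _allocate(size) {
        const old = this._rQ;
        this._rQ = new Uint8Array(size);
        if (old) {
            this._rQ.set(old.subarray(0, this._rQlen));
        }
        this._rQview = new DataView(this._rQ.buffer);
    }

    // Append bytes, growing the buffer (and refreshing the view) if needed.
    push(bytes) {
        if (this._rQlen + bytes.length > this._rQ.length) {
            this._allocate(Math.max(this._rQ.length * 2, this._rQlen + bytes.length));
        }
        this._rQ.set(bytes, this._rQlen);
        this._rQlen += bytes.length;
    }

    // Reference path: accumulate one byte at a time, big-endian.
    _rQshiftLoop(bytes) {
        let res = 0;
        for (let byte = bytes - 1; byte >= 0; byte--) {
            res += this._rQ[this._rQi++] << (byte * 8);
        }
        return res >>> 0;
    }

    // Fast path: a single DataView read for the common 1/2/4-byte widths;
    // any other width falls back to the loop.
    _rQshift(bytes) {
        let res;
        switch (bytes) {
            case 1: res = this._rQview.getUint8(this._rQi); break;
            case 2: res = this._rQview.getUint16(this._rQi, false); break;
            case 4: res = this._rQview.getUint32(this._rQi, false); break;
            default: return this._rQshiftLoop(bytes);
        }
        this._rQi += bytes;
        return res;
    }
}

const demo = new ReceiveQueue();
demo.push(new Uint8Array([0x01, 0x02, 0x03, 0x04]));
console.log(demo._rQshift(2)); // 258 (0x0102)
```

One behavioural difference worth keeping in mind: a DataView read past the end of the buffer throws a RangeError, whereas the loop would silently read `undefined`. The sketch assumes the caller has already checked that enough bytes are queued.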

Performance Summary

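Average speed-up is the mean percentage reduction in average run time (Avg ms, DataView vs. loop) across the 1-, 2-, and 4-byte benchmark cases; higher is better. Each per-browser bar averages that browser's rows in the table that follows.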

Speed-up (% faster)
                0        10       20       30       40       50
                |--------|--------|--------|--------|--------|
Chrome   45.2%  █████████████████████████████████████████
Edge     40.9%  █████████████████████████████████████
Firefox  29.9%  ███████████████████████████
Safari   43.5%  ███████████████████████████████████████

Browser / Platform                Avg speed-up
Windows Chrome 142                43.6% faster
Windows Chrome 142 (Machine 2)    46.4% faster
Windows Chrome 101                41.7% faster
Windows Chrome 92.0               45.6% faster
Windows Chrome 83.0               44.7% faster
Windows Chrome 71.0               49.0% faster
Windows Edge 142                  46.1% faster
Windows Edge 142 (Machine 2)      35.6% faster
Windows Firefox 113               31.6% faster
Windows Firefox 142               35.7% faster
Windows Firefox 145.0             22.5% faster
Safari 18                         43.5% faster
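
As a worked example of how these averages appear to be derived, the sketch below recomputes the "Windows Chrome 142" figure from the Avg ms values reported in that browser's benchmark results. The helper names `reductionPercent` and `averageSpeedUp` are made up for illustration.

```javascript
// Per-case average times (ms) copied from the "Windows Chrome 142" results.
const chrome142 = [
    { bytes: 1, loop: 205.920, dataView: 164.550 },
    { bytes: 2, loop: 179.260, dataView: 99.260 },
    { bytes: 4, loop: 184.910, dataView: 62.880 },
];

// Percentage reduction in average time for one benchmark case.
function reductionPercent({ loop, dataView }) {
    return (loop - dataView) / loop * 100;
}

// Mean of the per-case reductions (hypothetical helper name).
function averageSpeedUp(cases) {
    const total = cases.reduce((sum, c) => sum + reductionPercent(c), 0);
    return total / cases.length;
}

console.log(averageSpeedUp(chrome142).toFixed(1) + "% faster"); // 43.6% faster
```

The per-case reductions here are roughly 20%, 45%, and 66%, so the gain grows with the read width; the single average hides that spread.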

Testing

  • Manual benchmark – Windows 10, Chrome 142 (20 logical cores) ✅
  • Manual benchmark – Windows 10, Chrome 142 (dual-core machine) ✅
  • Manual benchmark – Windows 10, Chrome 101/92/83/71 ✅
  • Manual benchmark – Windows 10, Edge 142 (two hardware profiles) ✅
  • Manual benchmark – Windows 10, Firefox 113/142/145 ✅
  • Manual benchmark – macOS 10.15, Safari 18.6 ✅

Benchmark Results

Windows Chrome 142

Key                        Value
Buffer size (bytes)        67108864
Buffer size (MB)           64.0
Rounds per case            10
Bytes tested               1, 2, 4
Timer                      performance.now()
User agent                 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36
Platform                   Win32
HW concurrency             20
Device memory (GB)         8
Language                   zh-CN
Languages                  zh-CN, zh
Screen resolution          2560x1440
Screen pixel depth         24
JS heap size limit (MB)    4095.8
Total JS heap (MB)         183.7
Used JS heap (MB)          176.1
Performance timeOrigin     1763268027246.2

Bytes  Method    Rounds  Avg ms    Min ms    Max ms    Winner
1      loop      10      205.920   192.900   249.300
1      DataView  10      164.550   156.800   201.300   🏆
2      loop      10      179.260   177.000   181.200
2      DataView  10      99.260    96.000    118.300   🏆
4      loop      10      184.910   181.100   197.600
4      DataView  10      62.880    60.700    73.600    🏆
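
The benchmark page used to produce these numbers is not included in this description. A harness along the following lines could generate comparable rows; it is a minimal sketch that reuses the reported parameters (64 MB buffer, 10 rounds, `performance.now()`, widths 1/2/4). All function names are hypothetical, and the checksum exists only so the reads cannot be optimised away.

```javascript
// Hypothetical harness; the benchmark page used for this PR is not shown here.

// Read the whole buffer in big-endian chunks using the byte-by-byte loop.
function readAllLoop(buf, width) {
    let checksum = 0;
    for (let i = 0; i + width <= buf.length; i += width) {
        let res = 0;
        for (let byte = width - 1, j = i; byte >= 0; byte--, j++) {
            res += buf[j] << (byte * 8);
        }
        checksum = (checksum + (res >>> 0)) >>> 0;
    }
    return checksum;
}

// Same reads through a DataView; only widths 1, 2 and 4 are supported.
function readAllDataView(buf, width) {
    const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
    let checksum = 0;
    for (let i = 0; i + width <= buf.length; i += width) {
        let res;
        if (width === 1) res = view.getUint8(i);
        else if (width === 2) res = view.getUint16(i, false);
        else res = view.getUint32(i, false);
        checksum = (checksum + res) >>> 0;
    }
    return checksum;
}

// Time `rounds` full passes and report avg/min/max in milliseconds.
function bench(fn, buf, width, rounds) {
    const times = [];
    let checksum = 0;
    for (let r = 0; r < rounds; r++) {
        const start = performance.now();
        checksum = fn(buf, width);
        times.push(performance.now() - start);
    }
    const avg = times.reduce((a, b) => a + b, 0) / times.length;
    return { avg, min: Math.min(...times), max: Math.max(...times), checksum };
}

function runBenchmark(bufferSize = 67108864, rounds = 10) {
    const buf = new Uint8Array(bufferSize);
    for (let i = 0; i < buf.length; i++) buf[i] = (i * 31) & 0xff;
    const rows = [];
    for (const width of [1, 2, 4]) {
        rows.push({ bytes: width, method: "loop", ...bench(readAllLoop, buf, width, rounds) });
        rows.push({ bytes: width, method: "DataView", ...bench(readAllDataView, buf, width, rounds) });
    }
    return rows;
}

// In a browser console: console.table(runBenchmark());
```

Timing a full pass over a large buffer, rather than a single read, keeps each measurement well above the resolution of `performance.now()`, which browsers may coarsen.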

Windows Chrome 142 (Machine 2)

Key                        Value
Buffer size (bytes)        67108864
Buffer size (MB)           64.0
Rounds per case            10
Bytes tested               1, 2, 4
Timer                      performance.now()
User agent                 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36
Platform                   Win32
HW concurrency             2
Device memory (GB)         8
Language                   zh-CN
Languages                  zh-CN, zh
Screen resolution          2560x1440
Screen pixel depth         24
JS heap size limit (MB)    4095.8
Total JS heap (MB)         75.5
Used JS heap (MB)          72.3
Performance timeOrigin     1763455005177.6

Bytes  Method    Rounds  Avg ms    Min ms    Max ms    Winner
1      loop      10      366.130   353.100   424.600
1      DataView  10      235.270   226.700   275.500   🏆
2      loop      10      215.860   206.400   279.500
2      DataView  10      136.190   129.600   166.100   🏆
4      loop      10      261.090   238.400   290.300
4      DataView  10      87.640    76.900    96.700    🏆

Windows Chrome 101

Key                        Value
Buffer size (bytes)        67108864
Buffer size (MB)           64.0
Rounds per case            10
Bytes tested               1, 2, 4
Timer                      performance.now()
User agent                 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36
Platform                   Win32
HW concurrency             2
Device memory (GB)         8
Language                   zh-CN
Languages                  zh-CN, zh
Screen resolution          2560x1440
Screen pixel depth         24
JS heap size limit (MB)    4095.8
Total JS heap (MB)         73.5
Used JS heap (MB)          71.3
Performance timeOrigin     1763454953935.7

Bytes  Method    Rounds  Avg ms    Min ms    Max ms    Winner
1      loop      10      292.310   270.600   343.900
1      DataView  10      243.210   228.300   273.700   🏆
2      loop      10      227.190   216.700   290.900
2      DataView  10      127.550   123.700   147.800   🏆
4      loop      10      238.620   233.300   264.500
4      DataView  10      85.110    81.900    98.400    🏆

Windows Chrome 92.0

Key                        Value
Buffer size (bytes)        67108864
Buffer size (MB)           64.0
Rounds per case            10
Bytes tested               1, 2, 4
Timer                      performance.now()
User agent                 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36
Platform                   Win32
HW concurrency             2
Device memory (GB)         8
Language                   zh-CN
Languages                  zh-CN, zh
Screen resolution          2560x1440
Screen pixel depth         24
JS heap size limit (MB)    4095.8
Total JS heap (MB)         72.2
Used JS heap (MB)          69.8
Performance timeOrigin     1763454862654.8

Bytes  Method    Rounds  Avg ms    Min ms    Max ms    Winner
1      loop      10      332.680   310.400   374.800
1      DataView  10      239.060   226.800   267.200   🏆
2      loop      10      224.740   217.400   248.000
2      DataView  10      127.160   122.800   154.100   🏆
4      loop      10      241.880   230.600   282.900
4      DataView  10      83.860    75.000    113.400   🏆

Windows Chrome 83.0

Key                        Value
Buffer size (bytes)        67108864
Buffer size (MB)           64.0
Rounds per case            10
Bytes tested               1, 2, 4
Timer                      performance.now()
User agent                 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36
Platform                   Win32
HW concurrency             2
Device memory (GB)         8
Language                   zh-CN
Languages                  zh-CN, zh
Screen resolution          2560x1440
Screen pixel depth         24
JS heap size limit (MB)    3585.8
Total JS heap (MB)         73.1
Used JS heap (MB)          68.9
Performance timeOrigin     1763454793849.9363

Bytes  Method    Rounds  Avg ms    Min ms    Max ms    Winner
1      loop      10      372.971   349.125   447.595
1      DataView  10      288.493   251.800   475.025   🏆
2      loop      10      269.894   263.340   302.725
2      DataView  10      148.638   144.555   166.145   🏆
4      loop      10      267.037   245.535   303.445
4      DataView  10      89.074    82.745    119.115   🏆

Windows Chrome 71.0

Key                        Value
Buffer size (bytes)        67108864
Buffer size (MB)           64.0
Rounds per case            10
Bytes tested               1, 2, 4
Timer                      performance.now()
User agent                 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.44 Safari/537.36
Platform                   Win32
HW concurrency             2
Device memory (GB)         8
Language                   zh-CN
Languages                  zh-CN, zh
Screen resolution          2560x1440
Screen pixel depth         24
JS heap size limit (MB)    2222.1
Total JS heap (MB)         9.5
Used JS heap (MB)          9.5
Performance timeOrigin     1763454480039.055

Bytes  Method    Rounds  Avg ms    Min ms    Max ms    Winner
1      loop      10      268.720   224.600   449.000
1      DataView  10      218.370   210.400   236.700   🏆
2      loop      10      318.890   229.300   422.500
2      DataView  10      151.140   113.800   217.400   🏆
4      loop      10      259.290   229.800   318.600
4      DataView  10      63.390    57.000    81.200    🏆

Windows Edge 142

Key                        Value
Buffer size (bytes)        67108864
Buffer size (MB)           64.0
Rounds per case            10
Bytes tested               1, 2, 4
Timer                      performance.now()
User agent                 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36 Edg/142.0.0.0
Platform                   Win32
HW concurrency             16
Device memory (GB)         8
Language                   en
Languages                  en, zh-CN, en-GB, en-US
Screen resolution          2560x1440
Screen pixel depth         24
JS heap size limit (MB)    4095.8
Total JS heap (MB)         123.7
Used JS heap (MB)          114.7
Performance timeOrigin     1763453859070.7

Bytes  Method    Rounds  Avg ms    Min ms    Max ms    Winner
1      loop      10      281.940   267.900   343.200
1      DataView  10      213.270   202.200   268.100   🏆
2      loop      10      228.380   226.100   241.200
2      DataView  10      126.480   123.100   155.000   🏆
4      loop      10      249.870   246.300   264.600
4      DataView  10      76.660    74.500    88.000    🏆

Windows Edge 142 (Machine 2)

Key                        Value
Buffer size (bytes)        67108864
Buffer size (MB)           64.0
Rounds per case            10
Bytes tested               1, 2, 4
Timer                      performance.now()
User agent                 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36 Edg/142.0.0.0
Platform                   Win32
HW concurrency             2
Device memory (GB)         8
Language                   zh-CN
Languages                  zh-CN, en, en-GB, en-US
Screen resolution          2560x1440
Screen pixel depth         24
JS heap size limit (MB)    4095.8
Total JS heap (MB)         93.1
Used JS heap (MB)          89.2
Performance timeOrigin     1763432982331

Bytes  Method    Rounds  Avg ms    Min ms    Max ms    Winner
1      loop      10      281.980   249.200   366.600
1      DataView  10      245.630   233.200   272.700   🏆
2      loop      10      260.800   207.600   384.000
2      DataView  10      211.140   135.500   398.800   🏆
4      loop      10      868.830   295.300   3937.300
4      DataView  10      219.100   90.300    735.600   🏆

Windows Firefox 113

Key                        Value
Buffer size (bytes)        67108864
Buffer size (MB)           64.0
Rounds per case            10
Bytes tested               1, 2, 4
Timer                      performance.now()
User agent                 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/113.0
Platform                   Win32
HW concurrency             2
Language                   zh-CN
Languages                  zh-CN, zh, zh-TW, zh-HK, en-US, en
Screen resolution          2560x1440
Screen pixel depth         24
Performance timeOrigin     1763454236144

Bytes  Method    Rounds  Avg ms    Min ms    Max ms    Winner
1      loop      10      679.600   611.000   918.000
1      DataView  10      656.000   621.000   747.000   🏆
2      loop      10      500.000   487.000   519.000
2      DataView  10      364.100   342.000   430.000   🏆
4      loop      10      893.800   590.000   1707.000
4      DataView  10      319.800   244.000   506.000   🏆

Windows Firefox 142

Key                        Value
Buffer size (bytes)        67108864
Buffer size (MB)           64.0
Rounds per case            10
Bytes tested               1, 2, 4
Timer                      performance.now()
User agent                 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:142.0) Gecko/20100101 Firefox/142.0
Platform                   Win32
HW concurrency             2
Language                   zh-CN
Languages                  zh-CN, zh, zh-TW, zh-HK, en-US, en
Screen resolution          2560x1440
Screen pixel depth         24
Performance timeOrigin     1763454368830

Bytes  Method    Rounds  Avg ms    Min ms    Max ms    Winner
1      loop      10      1069.500  547.000   2812.000
1      DataView  10      614.900   532.000   700.000   🏆
2      loop      10      457.100   386.000   562.000
2      DataView  10      378.400   327.000   550.000   🏆
4      loop      10      498.000   244.000   2302.000
4      DataView  10      261.600   196.000   591.000   🏆

Windows Firefox 145.0

Key                        Value
Buffer size (bytes)        67108864
Buffer size (MB)           64.0
Rounds per case            10
Bytes tested               1, 2, 4
Timer                      performance.now()
User agent                 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:145.0) Gecko/20100101 Firefox/145.0
Platform                   Win32
HW concurrency             2
Language                   zh-CN
Languages                  zh-CN, zh, zh-TW, zh-HK, en-US, en
Screen resolution          2560x1440
Screen pixel depth         24
Performance timeOrigin     1763455136427

Bytes  Method    Rounds  Avg ms    Min ms    Max ms    Winner
1      loop      10      601.100   541.000   749.000
1      DataView  10      503.900   459.000   646.000   🏆
2      loop      10      426.900   346.000   525.000
2      DataView  10      327.800   268.000   377.000   🏆
4      loop      10      256.500   233.000   310.000
4      DataView  10      184.100   169.000   203.000   🏆

Safari 18

Key                        Value
Buffer size (bytes)        67108864
Buffer size (MB)           64.0
Rounds per case            10
Bytes tested               1, 2, 4
Timer                      performance.now()
User agent                 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.6 Safari/605.1.15
Platform                   MacIntel
HW concurrency             4
Language                   en-US
Languages                  en-US
Screen resolution          3840x2160
Screen pixel depth         24
Performance timeOrigin     1763455352927

Bytes  Method    Rounds  Avg ms    Min ms    Max ms    Winner
1      loop      10      1415.600  1365.000  1546.000
1      DataView  10      962.200   934.000   1029.000  🏆
2      loop      10      889.100   870.000   917.000
2      DataView  10      472.100   468.000   474.000   🏆
4      loop      10      849.900   694.000   1098.000
4      DataView  10      412.500   269.000   615.000   🏆

Karma Test

  Websock
    Receive queue methods
      rQpeek8
        √ should peek at the next byte without poping it off the queue
      rQshift8()
        √ should pop a single byte from the receive queue
      rQshift16()
        √ should pop two bytes from the receive queue and return a single number
      rQshift32()
        √ should pop four bytes from the receive queue and return a single number
      rQlen())
        √ should return the number of buffered bytes in the receive queue
      rQshiftStr
        √ should shift the given number of bytes off of the receive queue and return a string
        √ should be able to handle very large strings
      rQshiftBytes
        √ should shift the given number of bytes of the receive queue and return an array
        √ should return a shared array if requested
      rQpeekBytes
        √ should not modify the receive queue
        √ should return a shared array if requested
      rQwait
        √ should return true if there are not enough bytes in the receive queue
        √ should return false if there are enough bytes in the receive queue
        √ should return true and reduce rQi by "goback" if there are not enough bytes
        √ should raise an error if we try to go back more than possible
        √ should not reduce rQi if there are enough bytes
    Send queue methods
      sQpush8()
        √ should send a single byte
        √ should not send any data until flushing
        √ should implicitly flush if the queue is full
      sQpush16()
        √ should send a number as two bytes
        √ should not send any data until flushing
        √ should implicitly flush if the queue is full
      sQpush32()
        √ should send a number as two bytes
        √ should not send any data until flushing
        √ should implicitly flush if the queue is full
      sQpushString()
        √ should send a string buffer
        √ should not send any data until flushing
        √ should implicitly flush if the queue is full
        √ should implicitly split a large buffer
      sQpushBytes()
        √ should send a byte buffer
        √ should not send any data until flushing
        √ should implicitly flush if the queue is full
        √ should implicitly split a large buffer
      flush
        √ should actually send on the websocket
        √ should not call send if we do not have anything queued up
    lifecycle methods
      opening
        √ should pick the correct protocols if none are given
        √ should open the actual websocket
      attaching
        √ should attach to an existing websocket
      closing
        √ should close the actual websocket if it is open
        √ should close the actual websocket if it is connecting
        √ should not try to close the actual websocket if closing
        √ should not try to close the actual websocket if closed
        √ should reset onmessage to not call _recvMessage
      event handlers
        √ should call _recvMessage on a message
        √ should call the open event handler on opening
        √ should call the close event handler on closing
        √ should call the error event handler on error
      ready state
        √ should be "unused" after construction
        √ should be "connecting" if WebSocket is connecting
        √ should be "open" if WebSocket is open
        √ should be "closing" if WebSocket is closing
        √ should be "closed" if WebSocket is closed
        √ should be "unknown" if WebSocket state is unknown
        √ should be "connecting" if RTCDataChannel is connecting
        √ should be "open" if RTCDataChannel is open
        √ should be "closing" if RTCDataChannel is closing
        √ should be "closed" if RTCDataChannel is closed
        √ should be "unknown" if RTCDataChannel state is unknown
    WebSocket receiving
      √ should support adding data to the receive queue
      √ should call the message event handler if present
      √ should not call the message event handler if there is nothing in the receive queue
      √ should compact the receive queue when fully read
      √ should compact the receive queue when we reach the end of the buffer
      √ should automatically resize the receive queue if the incoming message is larger than the buffer
      √ should automatically resize the receive queue if the incoming message is larger than 1/8th of the buffer and we reach the end of the buffer

Can I use

https://caniuse.com/mdn-javascript_builtins_dataview
