
Conversation

@yoshi-monster (Contributor) commented Nov 16, 2025

Hi! Sorry, I did a bunch of benchmarks, so wall of text incoming.

tl;dr: The new version is much simpler and potentially faster, although the speedup isn't super obvious right now and won't be until we optimise other bits of the runtime. I think it's still an overall improvement.


This PR rewrites the Dict implementation for the JavaScript target, implementing the CHAMP (Compressed Hash-Array Mapped Prefix-tree) data structure as described by M.J. Steindorfer and J.J. Vinju in Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections (2015, link). It also adds some optimizations found in ClojureScript, attributed to Christopher Grand by P. Schluck and C. Rodgers in their 2017 talk Clojure Hash Maps: Plenty of room at the bottom (link). Namely, it adds internal "transient" objects that allow the dictionary to be mutated internally in controlled ways.

The current implementation is already quite good, so only minor performance improvements could be achieved for most operations in practice; I did a lot of benchmarks to figure out what's going on and what's possible, so sorry for the wall of tables.

Highlights

  • 50% reduction in code size
  • 10-30% faster get and insert operations
  • O(log n) equality checks, orders of magnitude faster bulk operations and iteration

Establishing a baseline

I think it's useful to look at the absolute best performance we could achieve first to know what we're up against. All benchmarks were done on an M4 Pro 14-core chip using Node 24.7 and Bun 1.21.2, with mitata as the benchmark runner.

This measures the performance of raw JavaScript Map calls. Keep in mind that this is not a suitable replacement in practice: Map compares keys by reference, so effectively only strings or numbers work as structural keys, and it does not provide a persistent/immutable API at all.

Without further ado, here's what we're up against:

|                  | 10    | 100   | 1k    | 10k   | 100k  | 1m    |
| ---------------- | ----- | ----- | ----- | ----- | ----- | ----- |
| node / get       | 7.6ns | 11ns  | 14ns  | 13ns  | 17ns  | 28ns  |
| node / insert    | 54ns  | 52ns  | 53ns  | 54ns  | 54ns  | 88ns  |
| node / from_list | 400ns | 2.5us | 27us  | 654us | 5.0ms | 57ms  |
| node / fold      | 30ns  | 223ns | 2.2us | 22us  | 230us | 6.8ms |
| bun / get        | 4.6ns | 4.9ns | 6.3ns | 6.8ns | 8.4ns | 12ns  |
| bun / insert     | 49ns  | 53ns  | 53ns  | 54ns  | 53ns  | 74ns  |
| bun / from_list  | 394ns | 4.3us | 35us  | 345us | 4.2ms | 68ms  |
| bun / fold       | 1us   | 598ns | 6us   | 61us  | 561us | 6ms   |

It looks like it takes around 10ns to get an element, and around 50ns to insert one. The exact numbers here aren't that interesting, but I think it always helps to have a target and an intuition for how fast things should/could be. Note that this is still quite slow compared to JIT-optimised object shapes! These can be accessed and updated within a fraction of a nanosecond.

It's also useful to think about what these numbers mean: at the 4 GHz or so my processor claims, one cycle is 0.25ns. Most instructions take a bunch of cycles to retire, so we can simplify a ton and just say 1ns = 1 instruction on average. A number like 4ns (for get) means that the CPU executed a single-digit number of instructions to get the value we wanted! Sometimes adding a branch increased latency by ~5ns; there are certainly more low-level optimisations to be done by someone with deeper knowledge of the code v8 generates, but we are actually at the point of counting instructions here!

Implementation

The rewritten dictionary is a faithful implementation of the version presented in Steindorfer and Vinju's paper, with the addition of transients to allow for fast updates. The class-based JavaScript API has been fully removed; this allows the dictionary and the hash function to be dead-code-eliminated almost entirely even if they are referenced incidentally through string.inspect or dynamic.classify. Specialised algorithms for map, insert, and has_key are provided to speed up common dictionary operations. I cleaned up the FFI API so there is no longer an additional layer of indirection through gleam_stdlib.mjs; I think having the dictionary self-contained inside its own file makes the code easier to maintain. Many of the public functions in the dictionary module have been replaced by more efficient versions on both targets.

Instead of 4 different node types, the new version only has a single internal node type (plus a "deformed" variant used on hash collisions), eliminating most of the incidental complexity found in the old implementation. Overall, the dictionary went from roughly 560 LOC to around 290, a reduction of almost 50%.
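For context, this is roughly what a CHAMP node looks like and how the bitmaps are used; the names (makeNode, datamap, nodemap) are illustrative rather than the exact identifiers in src/dict.mjs:

```js
// Illustrative sketch of a CHAMP node, not the exact code in src/dict.mjs.
// A node carries two 32-bit bitmaps and one flat array:
//   datamap - which of the 32 hash slots hold an inline [key, value] entry
//   nodemap - which of the 32 hash slots hold a child node
//   array   - the inline entries, followed by the child nodes
function makeNode(datamap, nodemap, array) {
  return { datamap, nodemap, array };
}

// Each tree level consumes 5 bits of the 32-bit hash to pick one of 32 slots.
function mask(hash, shift) {
  return (hash >>> shift) & 0b11111;
}

// Counting the set bits below the slot's bit gives the index into `array`.
function index(bitmap, bit) {
  return popcount(bitmap & (bit - 1));
}

function popcount(x) {
  x -= (x >> 1) & 0x55555555;
  x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
  x = (x + (x >> 4)) & 0x0f0f0f0f;
  return (x * 0x01010101) >> 24;
}
```

A lookup hashes the key, takes 5 bits per level, and checks the corresponding bit: set in datamap means the entry is stored inline in this node, set in nodemap means descend into the child, neither means the key is absent.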

The new delete algorithm also respects the CHAMP invariant, effectively banning nested node paths that only hold a single entry. Instead, lone entries are "pulled up" into the parent node recursively whenever possible. As a consequence, every CHAMP tree is guaranteed to be in its single, canonical, compact representation. This is particularly interesting for equality checks: instead of having to walk all elements of the dictionary, the nodes can be compared directly, and nodes that are reference-equal or have different bitmaps don't have to be compared further. Dictionaries no longer need their own equals or hashCode implementation; standard structural equality not only works, it also improves the average complexity of equality checks from O(n log n) to O(log n).
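To illustrate (not the literal code): because every dict has exactly one tree shape, the generic structural comparison can short-circuit on reference equality and on the bitmaps before touching any elements:

```js
// Sketch of why canonical trees make structural equality cheap.
// `entryOrChildEqual` stands in for the prelude's generic isEqual here.
function nodesEqual(a, b) {
  if (a === b) return true;                   // shared (persistent) subtrees: O(1)
  if (a.datamap !== b.datamap) return false;  // different inline entries present
  if (a.nodemap !== b.nodemap) return false;  // different child layout
  for (let i = 0; i < a.array.length; ++i) {
    // entries compare their key/value, child nodes recurse into nodesEqual
    if (!entryOrChildEqual(a.array[i], b.array[i])) return false;
  }
  return true;
}
```

After an insert into a shared dictionary most subtrees remain reference-equal, which is where the O(log n) average comes from.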

With this PR, iteration order can differ from the old version. Since dictionaries don't guarantee any order, no effort has been put into preserving it.

Benchmarks

Result construction, isEqual, and the hash function contribute significantly to the runtime of various dict operations. To make things more directly comparable, I provide "real" numbers including their overhead as well as "adjusted" variants where all of these operations have been replaced by no-ops, or strict equality in the case of isEqual. These measure the performance of the data structure itself more directly, which also makes them more comparable to the "baseline" measurements provided above.
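Concretely, "adjusted" means stubbing the generic helpers out along these lines; the keys in the benchmarks are plain integers, and the exact stubs aren't reproduced here:

```js
// Rough idea of the "adjusted" variants: trivial stand-ins so that only the
// data structure itself is measured. Not the real prelude/stdlib helpers.
const isEqual = (a, b) => a === b; // strict equality instead of deep structural equality
const getHash = (key) => key | 0;  // identity hash, good enough for the integer keys used here
const ok = (value) => value;       // skip allocating an Ok(...) Result wrapper
```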

get

| old / new       | 10        | 100       | 1k        | 10k       | 100k      | 1m        |
| --------------- | --------- | --------- | --------- | --------- | --------- | --------- |
| node / real     | 93/85ns   | 105/107ns | 121/120ns | 136/130ns | 198/187ns | 682/418ns |
| bun / real      | 42/35ns   | 48/39ns   | 55/44ns   | 76/64ns   | 165/130ns | 765/583ns |
| node / adjusted | 6.0/5.0ns | 9.2/6.9ns | 12/7.0ns  | 17/11ns   | 52/19ns   | 315/43ns  |
| bun / adjusted  | 5.7/4.7ns | 12/3.2ns  | 9.5/3.2ns | 15/6.6ns  | 55/15ns   | 265/21ns  |

Since get is so fast, I generated an array of 1000 random keys and fetched them all in a loop; the numbers reported are the average for a single get operation. Hopefully this avoids measurement artifacts and memory locality effects. It's hard to explain why using results would become relatively slower the bigger the dictionary gets. Interestingly, the same effect doesn't happen when fetching the same element 1000 times, or when fetching random elements but using object literals ({ success: true, value: ... }) for results:

| old / new      | 10      | 100     | 1k        | 10k       | 100k      | 1m        |
| -------------- | ------- | ------- | --------- | --------- | --------- | --------- |
| node / same    | 87/80ns | 99/97ns | 103/102ns | 101/103ns | 104/109ns | 114/119ns |
| node / literal | 26/25ns | 32/27ns | 32/26ns   | 49/32ns   | 96/78ns   | 552/217ns |

Benchmarking is hard 😔 While we know that object literals are faster than the current Result objects, I also think it's unlikely that they would really be that much faster.

In the Lustre diff benchmark (which is mostly get and has_key), performance is improved by 2-3 times compared to v0.65.0, getting within 30% of using native mutable maps, and even beating them for common cases.

Overall, without overhead, get has been improved by roughly 30% for small dictionaries, and is up to 5 times faster for very large ones. The adjusted values compare favourably even to built-in maps on v8, suggesting that a custom persistent immutable data structure can be as fast as natively implemented data structures provided by the runtime. Since Bun's maps are much more performant in this benchmark, I suspect that v8's native implementation could also be improved. On Bun and JavaScriptCore, the speedup from the new dictionary is especially large.

insert

| old / new     | 10      | 100     | 1k        | 10k       | 100k      | 1m        |
| ------------- | ------- | ------- | --------- | --------- | --------- | --------- |
| node / update | 50/54ns | 89/93ns | 109/109ns | 128/122ns | 217/189ns | 629/422ns |
| bun / update  | 41/29ns | 87/72ns | 106/87ns  | 148/114ns | 287/235ns | 799/615ns |
| node / insert | 33/39ns | 48/46ns | 83/129ns  | 94/89ns   | 144/114ns | 345/234ns |
| bun / insert  | 37/37ns | 53/48ns | 99/120ns  | 111/103ns | 181/131ns | 618/257ns |

Both old and new versions beat built-in maps consistently for single inserts into dictionaries of up to 100 elements. While CHAMP's performance pulls ahead for the largest dictionaries, it is more affected by outliers in its internal structure, which makes the 1k-element case particularly slow.

from_list

| old / new       | 10        | 100       | 1k       | 10k       | 100k     | 1m        |
| --------------- | --------- | --------- | -------- | --------- | -------- | --------- |
| node / real     | 295/251ns | 7.5/6.6us | 96/82us  | 958/988us | 17/11ms  | 259/139ms |
| bun / real      | 292/253ns | 8.8/6.2us | 104/59us | 1.2/0.8ms | 25/11ms  | 392/148ms |
| node / adjusted | 244/165ns | 4.2/4.5us | 52/42us  | 801/419us | 14/7.4ms | 213/102ms |
| bun / adjusted  | 195/312ns | 5.0/6.6us | 65/70us  | 853/532us | 16/8.1ms | 259/91ms  |

from_list shows the power of transients, making it over twice as fast as a copying implementation for large dictionaries. A version exceeding native Map's speed could be achieved, but would require making the single-insert case slower.
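Schematically, the transient-based bulk construction looks like this; the helper names are illustrative (the mutating insert was renamed during review) and the real function consumes a Gleam list rather than a plain iterable:

```js
// Sketch of building a dictionary through a transient. A transient pairs a
// tree with an "owner" token: nodes created during this build carry the token
// and may be mutated in place, while any shared nodes are still copied.
function fromEntries(entries) {
  let transient = newTransient();                       // empty tree + fresh owner token
  for (const [key, value] of entries) {
    transient = insertMutating(transient, key, value);  // mutates owned nodes in place
  }
  return freeze(transient);                             // drop the token; result is immutable
}
```

Only nodes allocated during the build are ever mutated, so the usual persistence guarantees still hold for everything the caller can observe.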

fold

| old / new | 10        | 100       | 1k       | 10k      | 100k      | 1m       |
| --------- | --------- | --------- | -------- | -------- | --------- | -------- |
| node      | 210/15ns  | 1.6/0.2us | 15/0.9us | 338/15us | 2.6/0.3ms | 51/2.0ms |
| bun       | 100/8.8ns | 1.2/0.2us | 12/0.9us | 110/14us | 1.9/0.4ms | 39/3.9ms |

The old fold implementation used to_list under the hood, so performance improved massively. Not only that, but the new dictionary can be iterated faster than built-in maps, too!
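For comparison, the new fold can walk each node's flat array directly instead of first materialising a list; schematically (illustrative shape, matching the node sketch above):

```js
// Sketch: fold by recursing through each node's flat array.
// Inline entries are [key, value] pairs; everything else is a child node.
function foldNode(node, initial, fun) {
  let accumulator = initial;
  for (const slot of node.array) {
    if (Array.isArray(slot)) {
      accumulator = fun(accumulator, slot[0], slot[1]); // inline entry
    } else {
      accumulator = foldNode(slot, accumulator, fun);   // child node, recurse
    }
  }
  return accumulator;
}
```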

Discussion

All of the tested operations perform at least as fast as the current implementation. Most show significant performance increases as the dictionaries get bigger, and (except for insert) this doesn't seem to come at a cost for small dictionaries. While CHAMP features some highly specialized update routines, it is still less than half as much code as the current version.

With the additional overhead eliminated, we can see that CHAMP is competitive with built-in maps for get and insert operations up to dictionaries of around 1000 elements. Many of the benchmarks show a significant overhead for both result construction and isEqual. Both versions of the dictionary readily beat built-in maps once iteration or persistence is involved.

Future work

While the immediate space for optimizations has been thoroughly explored, there might still be improvements by finding better mutating algorithms, or by exploring caching with the MEMCHAMP variant. While equality already uses the internal structure of the map, I suspect that similar optimizations could be done to all set-like operations (merge, union, intersection, difference, etc), using the internal structure to quickly combine nodes instead of working on the element level.

Since bulk operations are particularly fast, I think adding the missing merge_list or insert_from_list function might be useful. To avoid the overhead from results, a function similar to get_or_default could be provided.
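Neither function exists yet; a hypothetical get_or_default on the JavaScript side could avoid the Result allocation entirely (lookup and NOT_FOUND are made-up internals here):

```js
// Hypothetical, not part of this PR: return the stored value or a
// caller-supplied fallback without ever constructing an Ok/Error wrapper.
const NOT_FOUND = Symbol("not found");

export function get_or_default(dict, key, fallback) {
  const found = lookup(dict, key, NOT_FOUND); // sketch: internal lookup with a sentinel default
  return found === NOT_FOUND ? fallback : found;
}
```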

Moving the dictionary into the compiler and parameterising it over the getHash and isEqual functions used would allow the compiler to inject monomorphised versions of these functions for the concrete key type, skipping the generic implementations altogether. Long term, escape analysis or similar techniques could be used to insert transients automatically.

While working on the dictionary, I noticed that the tag is ignored in the hash code of all custom types, meaning that Ok(0) and Error(0) hash to the same value. This mostly affects variants without attached data, which all hash to 0, causing the dictionary to degrade to a linear search. I'll open up a discussion at some point since I have some additional thoughts about the hash and equals functions ^^
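For illustration only (this is not what the prelude currently does, and fixing it is out of scope for this PR), mixing the variant tag into the hash could look something like this; hashString and getHash stand in for the real helpers:

```js
// Illustrative: fold the variant's tag (here its constructor name) into the
// hash before its fields, so Ok(0) and Error(0), or two different no-field
// variants, no longer collide.
function hashCustomType(value) {
  let hash = hashString(value.constructor.name);        // start from the tag
  for (const field of Object.values(value)) {
    hash = (Math.imul(hash, 31) + getHash(field)) | 0;  // then mix in each field
  }
  return hash;
}
```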

Related work

The results presented here and by Steindorfer and Vinju have been confirmed independently by the Scala, Clojure, and ClojureScript communities, all implementing a variant of CHAMP as their default HashMap implementation. I also referenced this Go implementation for its handling of transients.

Appendix: Benchmark Code

import { run, bench, boxplot, lineplot, summary, do_not_optimize } from "mitata";
import * as New from "./build/dev/javascript/gleam_stdlib/gleam/dict.mjs";
import * as Old from "./old-dict.mjs";

import * as List from "./build/dev/javascript/gleam_stdlib/gleam/list.mjs";
import { toList } from "./build/dev/javascript/prelude.mjs";

lineplot(() => {
  summary(() => {
    bench(`new.get($size)`, function* (state) {
      const size = state.get("size");
      const dict = New.from_list(List.map(List.range(1, size), (i) => [i, i]));
      yield {
        [0]: () =>
          Array.from(
            { length: 1000 },
            () => 1 + Math.trunc(Math.random() * size),
          ),
        bench: (is) => {
          for (let i = 0; i < 1000; ++i) do_not_optimize(New.get(dict, is[i]));
        },
      };
    })
      .range("size", 10, 1000000, 10)
      .gc("inner");
    bench(`old.get($size)`, function* (state) {
      const size = state.get("size");
      const dict = Old.from_list(List.map(List.range(1, size), (i) => [i, i]));
      yield {
        [0]: () =>
          Array.from(
            { length: 1000 },
            () => 1 + Math.trunc(Math.random() * size),
          ),
        bench: (is) => {
          for (let i = 0; i < 1000; ++i) do_not_optimize(Old.get(dict, is[i]));
        },
      };
    })
      .range("size", 10, 1000000, 10)
      .gc("inner");
    bench(`map.get($size)`, function* (state) {
      const size = state.get("size");
      const dict = new Map(List.map(List.range(1, size), (i) => [i, i]));
      yield {
        [0]: () =>
          Array.from(
            { length: 1000 },
            () => 1 + Math.trunc(Math.random() * size),
          ),
        bench: (is) => {
          for (let i = 0; i < 1000; ++i) do_not_optimize(dict.get(is[i]));
        },
      };
    }).range("size", 10, 1000000, 10);
  });
});

lineplot(() => {
  summary(() => {
    bench(`old.insert($size)`, function* (state) {
      const size = state.get("size");
      const dict = Old.from_list(List.map(List.range(1, size), (i) => [i, i]));
      yield {
        [0]: () => 1 + Math.trunc(Math.random() * 0xffffff),
        bench: (x) => {
          for (let i = -1000; i < 0; ++i) do_not_optimize(Old.insert(dict, x, i));
        },
      };
    })
      .range("size", 10, 1000000, 10)
      .gc("inner");

    bench(`new.insert($size)`, function* (state) {
      const size = state.get("size");
      const dict = New.from_list(List.map(List.range(1, size), (i) => [i, i]));
      yield {
        [0]: () => 1 + Math.trunc(Math.random() * 0xffffff),
        bench: (x) => {
          for (let i = -1000; i < 0; ++i) do_not_optimize(New.insert(dict, x, i));
        },
      };
    })
      .range("size", 10, 1000000, 10)
      .gc("inner");
    bench(`map.insert($size)`, function* (state) {
      const size = state.get("size");
      const dict = new Map(List.map(List.range(1, size), (i) => [i, i]));
      yield {
        [0]: () => Math.trunc(Math.random() * 0xffffff),
        bench: (i) => {
          dict.set(i, i);
          return do_not_optimize(dict);
        },
      };
    })
      .range("size", 10, 1000000, 10)
      .gc("inner");
  });
});

lineplot(() => {
  summary(() => {
    bench(`new.from_list($size)`, function* (state) {
      const size = state.get("size");

      const list = List.map(List.range(1, size), (i) => [i, i]);

      yield {
        [0]: () => list,
        bench: (list) => do_not_optimize(New.from_list(list)),
      };
    })
      .range("size", 10, 1000000, 10)
      .gc("inner");

    bench(`old.from_list($size)`, function* (state) {
      const size = state.get("size");

      const list = List.map(List.range(1, size), (i) => [i, i]);

      yield {
        [0]: () => list,
        bench: (list) => do_not_optimize(Old.from_list(list)),
      };
    })
      .range("size", 10, 1000000, 10)
      .gc("inner");

    bench(`map.from_list($size)`, function* (state) {
      const size = state.get("size");

      const list = List.map(List.range(1, size), (i) => [i, i]);

      yield {
        [0]: () => list,
        bench: (list) => do_not_optimize(new Map(list)),
      };
    })
      .range("size", 10, 1000000, 10)
      .gc("inner");
  });
});

lineplot(() => {
  summary(() => {
    bench(`new.fold($size)`, function* (state) {
      const size = state.get("size");

      const dict = New.from_list(List.map(List.range(1, size), (i) => [i, i]));

      yield {
        [0]: () => dict,
        bench: (dict) => do_not_optimize(New.fold(dict, 0, (s, k, v) => s + v)),
      };
    })
      .range("size", 10, 1000000, 10)
      .gc("inner");

    bench(`old.fold($size)`, function* (state) {
      const size = state.get("size");

      const dict = Old.from_list(List.map(List.range(1, size), (i) => [i, i]));

      yield {
        [0]: () => dict,
        bench: (dict) => do_not_optimize(Old.fold(dict, 0, (s, k, v) => s + v)),
      };
    })
      .range("size", 10, 1000000, 10)
      .gc("inner");

    bench(`map.fold($size)`, function* (state) {
      const size = state.get("size");

      const dict = new Map(List.map(List.range(1, size), (i) => [i, i]));

      yield {
        [0]: () => dict,
        bench: (dict) => {
          let s = 0;
          for (const [k, v] of dict.entries()) {
            s = s + v;
          }
          return do_not_optimize(s);
        },
      };
    })
      .range("size", 10, 1000000, 10)
      .gc("inner");
  });
});

lineplot(() => {
  summary(() => {
    bench(`old.remove($size)`, function* (state) {
      const size = state.get("size");
      const dict = Old.from_list(List.map(List.range(1, size), (i) => [i, i]));
      yield {
        [0]: () => Math.trunc(Math.random() * size) + 1,
        bench: (x) => {
          do_not_optimize(Old.delete$(dict, x));
        },
      };
    })
      .range("size", 10, 1000000, 10)
      .gc("inner");

    bench(`new.remove($size)`, function* (state) {
      const size = state.get("size");
      const dict = New.from_list(List.map(List.range(1, size), (i) => [i, i]));
      yield {
        [0]: () => Math.trunc(Math.random() * size) + 1,
        bench: (x) => {
          do_not_optimize(New.delete$(dict, x));
        },
      };
    })
      .range("size", 10, 1000000, 10)
      .gc("inner");
  });
});

await run();

@yoshi-monster (Contributor, Author)

It appears I used the new FFI syntax while the test runner is still using v1.11. Does the standard library support older Gleam versions as well?

@GearsDatapacks (Member)

Nope; you can update the CI version and the gleam version in gleam.toml to 1.13. We'll need to update the rest of the stdlib FFI at some point too.

@GearsDatapacks (Member)

Ah yes, you'll need to reformat that file to use the latest formatting style also

@yoshi-monster (Contributor, Author)

Yeah, thank you :D

@lpil (Member) left a comment

This is fantastic! I am so impressed! Thank you

I wonder if there's any further testing we want to do to check for regressions here? Property testing or larger example test suite? Or are we confident enough with the existing tests?


@external(erlang, "maps", "put")
@external(javascript, "../dict.mjs", "put")
fn put(key: k, value: v, transient: TransientDict(k, v)) -> TransientDict(k, v)
Member:

Could you give this a clearer name please, put is quite nondescript 🙏 One that communicates it will mutate when possible would be fab.

src/dict.mjs Outdated
case COLLISION_NODE:
return withoutCollision(root, key);
}
export function put(key, value, transient) {
Member:

Same as above, a clearer name please 🙏

src/dict.mjs Outdated
}
++i;

function doPut(transient, node, key, value, hash, shift) {
Member:

Is there a better name for this? And also for doRemove?

If not, can we call it insert please, as that's the term we use. put is an Erlang term so good not to mix up our basic terminology.

@yoshi-monster (Contributor, Author)

Thank you!!

The existing tests were already super useful for finding bugs and I thought they were quite good already; for iv, most of the bugs I had happened whenever the internal structure had to change, especially when the depth of the tree changes. We could maybe do some probabilistic tests around those places, where we generate random sequences of inserts/deletes and compare the results to proplists?
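Something along these lines, written here against the compiled JavaScript module like the benchmark harness (the real test would live in the Gleam test suite):

```js
// Sketch of a randomised check: apply the same random insert/delete sequence
// to the new dict and to a plain array of [key, value] pairs, then compare.
import * as Dict from "./build/dev/javascript/gleam_stdlib/gleam/dict.mjs";

let dict = Dict.new$();
let model = []; // the "proplist" reference model
for (let step = 0; step < 10_000; ++step) {
  const key = Math.trunc(Math.random() * 100); // small key space, so inserts and deletes overlap
  if (Math.random() < 0.7) {
    dict = Dict.insert(dict, key, step);
    model = [[key, step], ...model.filter(([k]) => k !== key)];
  } else {
    dict = Dict.delete$(dict, key);
    model = model.filter(([k]) => k !== key);
  }
  // a fuller test would also compare every lookup, not just the sizes
  if (Dict.size(dict) !== model.length) throw new Error(`size mismatch at step ${step}`);
}
```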

I hope the names are better now, I'm really bad at those

~ 💜

@lpil (Member) commented Nov 23, 2025

Love the names, thank you.

> maybe do some probabilistic tests around those places where we generate random sequences of inserts/deletes and compare the results to proplists?

Could be useful perhaps! What do you think? Do we have confidence to merge this now or do we want to do more beforehand?

@yoshi-monster (Contributor, Author)

Hi! Sorry this took so long.

I added some tests that generate a random sequence of insert/delete operations and compare the result to proplists. I ran those a bunch locally and nothing broke (yet), so now I'm relatively confident that nothing will immediately break :)

@lpil (Member) commented Dec 5, 2025

idk why you're saying sorry 😁

@lpil (Member) left a comment

This is such fantastic work!!! Thank you Yoshie! You continue to amaze me!

@inoas-nbw

Thank you @yoshi-monster 💚

@lpil merged commit baea2ff into gleam-lang:main on Dec 5, 2025
7 checks passed