Increase performance for fitness_score_cardinality and genes_cardinality by tomtomwombat · Pull Request #10 · basvanwesting/genetic-algorithm

tomtomwombat · 2026-04-02T05:59:10Z

Cardinality estimation in calls to fitness_score_cardinality and genes_cardinality is a non-trivial performance bottleneck.
This PR replaces cardinality-estimator with hyperloglockless to speed up those calls. I chose foldhash as the hasher because it's fast (especially for small inputs) and because cardinality estimation doesn't need to be deterministic (correct me if I'm wrong on this! We can use a different hasher otherwise).
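
To make the determinism question concrete, here is a minimal sketch using only the standard library as a stand-in (it does not use foldhash, and the helper names `fixed_hash` and `seeded_hash` are hypothetical). A fixed-key hasher gives the same hash for the same value on every run, while a randomly seeded one gives different hashes per instance, so estimates built on it may vary slightly between runs:

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, BuildHasherDefault};

/// Hash with a fixed-key hasher: the same input always gives the same output,
/// so any estimate derived from these hashes is reproducible.
pub fn fixed_hash(value: u64) -> u64 {
    BuildHasherDefault::<DefaultHasher>::default().hash_one(value)
}

/// Hash with a randomly seeded hasher: stable within one `RandomState`,
/// but different instances use different keys.
pub fn seeded_hash(state: &RandomState, value: u64) -> u64 {
    state.hash_one(value)
}
```

If reproducible estimates matter (for example in tests that assert on an exact cardinality), a fixed-key hasher would be the safer choice; otherwise a seeded hasher only shifts the estimate within its usual error.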

HyperLogLog::new(12) uses the same memory as CardinalityEstimator::<u64>::new(): 2^12 bytes.
Estimation accuracy for small cardinalities is unchanged. It improves for cardinalities larger than 10^7, where cardinality-estimator no longer provides accurate estimates (though such large cardinalities may be outside your use-case). You can find more performance and accuracy comparisons here.
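
For readers unfamiliar with the technique, the following is a toy HyperLogLog, not the implementation in hyperloglockless or cardinality-estimator. It assumes one byte per register (so precision 12 gives 2^12 bytes), uses the standard library's `DefaultHasher` instead of foldhash, and applies the classic linear-counting correction for small cardinalities. The name `ToyHll` is hypothetical:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy HyperLogLog with 2^precision one-byte registers (illustration only).
pub struct ToyHll {
    precision: u32,
    registers: Vec<u8>,
}

impl ToyHll {
    pub fn new(precision: u32) -> Self {
        assert!((7..=18).contains(&precision), "precision out of range");
        Self { precision, registers: vec![0; 1usize << precision] }
    }

    /// Register memory in bytes: one byte per register.
    pub fn memory_bytes(&self) -> usize {
        self.registers.len()
    }

    pub fn insert<T: Hash>(&mut self, value: &T) {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let hash = hasher.finish();
        // The top `precision` bits select a register.
        let index = (hash >> (64 - self.precision)) as usize;
        // The remaining bits give the rank: position of the first 1-bit.
        let rest = hash << self.precision;
        let rank = (rest.leading_zeros().min(64 - self.precision) + 1) as u8;
        if rank > self.registers[index] {
            self.registers[index] = rank;
        }
    }

    /// Estimated number of distinct inserted values.
    pub fn count(&self) -> f64 {
        let m = self.registers.len() as f64;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        // Bias constant for m >= 128 registers.
        let raw = 0.7213 / (1.0 + 1.079 / m) * m * m / sum;
        if raw <= 2.5 * m && zeros > 0 {
            // Small range: linear counting on the number of empty registers.
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}
```

With precision 12, inserting 100 to 1000 distinct values leaves most of the 4096 registers empty, so in this sketch the estimate comes from the linear-counting branch rather than the HyperLogLog formula itself. Inserting the same values a second time does not change the registers or the estimate.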

In addition, hyperloglockless has considerably fewer dependencies than cardinality-estimator.

The benchmarks below compare before and after the change (other benchmarks are not affected):

     Running benches\evolve.rs (target\release\deps\evolve-3edab9da1a4bc46e.exe)
Gnuplot not found, using plotters backend
evolve/binary-100-pop100-gen100
                        time:   [1.3711 ms 1.3717 ms 1.3723 ms]
                        change: [-9.0403% -8.7943% -8.5330%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe
evolve/list-100-pop100-gen100
                        time:   [1.0351 ms 1.0357 ms 1.0361 ms]
                        change: [-6.8630% -6.4940% -6.0979%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
mutates-pop1000/MultiGeneDynamic(MultiGeneDynamic { _phantom: PhantomData<genetic_algorithm::genotyp...
                        time:   [13.012 µs 13.108 µs 13.203 µs]
                        thrpt:  [75.741 Melem/s 76.289 Melem/s 76.854 Melem/s]
                 change:
                        time:   [-13.408% -7.1293% -2.4340%] (p = 0.01 < 0.05)
                        thrpt:  [+2.4948% +7.6766% +15.484%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
     Running benches\population.rs (target\release\deps\population-76b56ff9d5372789.exe)
Gnuplot not found, using plotters backend
population/fitness_score_cardinality (known score), low/100
                        time:   [1.4007 µs 1.4348 µs 1.4741 µs]
                        thrpt:  [67.837 Melem/s 69.695 Melem/s 71.395 Melem/s]
                 change:
                        time:   [-23.952% -21.707% -19.155%] (p = 0.00 < 0.05)
                        thrpt:  [+23.694% +27.726% +31.496%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
population/fitness_score_cardinality (known score), low/1000
                        time:   [13.278 µs 13.445 µs 13.640 µs]
                        thrpt:  [73.314 Melem/s 74.378 Melem/s 75.311 Melem/s]
                 change:
                        time:   [-25.040% -23.860% -22.609%] (p = 0.00 < 0.05)
                        thrpt:  [+29.214% +31.336% +33.404%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

These benchmarks were run with

AMD Ryzen 9 5900X 12-Core Processor (3.70 GHz)
64-bit operating system, x64-based processor
RUSTFLAGS="-C target-cpu=native"
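
A note on reading the output above: the time and thrpt change rows describe the same measurement. Assuming throughput is the element count divided by elapsed time, a relative time change maps to a relative throughput change as sketched below, shown here for the median of the fitness_score_cardinality low/100 benchmark (time -21.707%, thrpt +27.726%):

```latex
\Delta_{\text{thrpt}} = \frac{1}{1 + \Delta_{\text{time}}} - 1,
\qquad
\frac{1}{1 - 0.21707} - 1 \approx 0.2773
```

This is also why the interval bounds swap places: the largest time reduction corresponds to the largest throughput gain.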

basvanwesting · 2026-04-02T07:30:29Z

Thanks, I see you are the author of hyperloglockless.

While the claimed performance increase is relevant percentage-wise, I'm not sure the cardinality estimation is a non-trivial performance bottleneck in practice. You did not provide evidence for that claim (especially with respect to the Fitness calculations, which are the bottleneck in real-world usage).

Also, our use case centers on very low cardinality counts: typically between 100 and 1000. That is a very limited use case, and we need to keep that in mind. It will have special performance characteristics at these low levels.

Furthermore, cardinality-estimator has a benchmark with conflicting conclusions regarding your implementation.

I will consider the switch. If you can provide evidence of the non-trivial performance bottleneck in real-world usage, that would make a difference.

Regards, Bas

tomtomwombat · 2026-04-02T17:42:42Z

Thanks for the quick response.

> Furthermore, cardinality-estimator has a benchmark with conflicting conclusions regarding your implementation.

Which benchmark and conflicting conclusion are you referring to? They don't include hyperloglockless in their benchmarks (they include the hyperloglog crate instead).

> Also, our use case centers on very low cardinality counts: typically between 100 and 1000. That is a very limited use case, and we need to keep that in mind. It will have special performance characteristics at these low levels.

That's useful context. What do you mean by "It will have special performance characteristics at these low levels"? Also, I'm curious what you prioritize for cardinality estimation in genetic-algorithm: performance, accuracy, or memory?

> I will consider the switch. If you can provide evidence of the non-trivial performance bottleneck in real-world usage, that would make a difference.

I don't use genetic-algorithm myself. Is there a benchmark that reflects real-world usage?

use hyperloglockless crate for increase performance in fitness_score_…

c7c66ce

…cardinality and genes_cardinality

tomtomwombat wants to merge 1 commit into basvanwesting:main from tomtomwombat:cardinality-perf
