This is a repository to benchmark integer compression algorithms with sorted lists of integers.
These are the data sets that have been tested so far:
-
ts.txt: a text file with a large list of timestamps. It can be downloaded from here. -
gov2.sorted: a binary file containing a sequence of sorted lists of 32-bit integers. This file is part of the "Document identifier data set" created by D. Lemire. It can be downloaded from here.
The following results have been obtained on a laptop Ubuntu Desktop 21.10 with a Core i7-6700HQ CPU @ 2.60GHz x 8.
ts.txt:
| Algorithm | Encoded Size | Compression Ratio | Encoding Speed | Decoding Speed |
|---|---|---|---|---|
| VTEnc/1 (*) | 21,679 | 26,622 | 102.28 G/s | 7.32 G/s |
| Delta+FastPFor256 | 1,179,312 | 489.4 | 1.45 G/s | 3.61 G/s |
| Delta+FastPFor128 | 2,306,544 | 250.2 | 1.58 G/s | 3.88 G/s |
| Delta+BinaryPacking | 4,552,280 | 126.8 | 6.76 G/s | 4.31 G/s |
| Delta+VariableByte | 144,285,504 | 4.0 | 3.89 G/s | 3.43 G/s |
| Delta+VarIntGB | 180,356,880 | 3.2 | 5.05 G/s | 6.73 G/s |
| Copy | 577,141,992 | 1.0 | 11.83 G/s | - |
gov2.sorted:
| Algorithm | Encoded Size | Compression Ratio | Encoding Speed | Decoding Speed |
|---|---|---|---|---|
| VTEnc/1 | 2,885,170,285 | 8.29 | 154.76 M/s | 210.99 M/s |
| VTEnc/2 | 2,930,842,147 | 8.16 | 213.90 M/s | 282.39 M/s |
| VTEnc/4 | 3,144,764,347 | 7.61 | 306.40 M/s | 411.75 M/s |
| VTEnc/8 | 3,482,164,190 | 6.87 | 443.06 M/s | 594.89 M/s |
| Delta+FastPFor128 | 3,849,161,656 | 6.21 | 615.66 M/s | 615.64 M/s |
| Delta+FastPFor256 | 3,899,341,376 | 6.13 | 621.15 M/s | 621.17 M/s |
| VTEnc/16 | 3,922,307,294 | 6.10 | 639.35 M/s | 829.11 M/s |
| Delta+BinaryPacking | 4,329,919,808 | 5.52 | 2.30 G/s | 2.33 G/s |
| VTEnc/32 | 4,447,792,753 | 5.38 | 901.11 M/s | 1,124.46 M/s |
| VTEnc/64 | 5,044,828,934 | 4.74 | 1.15 G/s | 1.42 G/s |
| VTEnc/128 | 5,719,368,620 | 4.18 | 1.42 G/s | 1.68 G/s |
| VTEnc/256 | 6,561,323,112 | 3.65 | 1.60 G/s | 1.83 G/s |
| Delta+VariableByte | 6,572,084,696 | 3.64 | 1.62 G/s | 1.52 G/s |
| Delta+VarIntGB | 7,923,819,720 | 3.01 | 2.03 G/s | 2.97 G/s |
| Copy | 23,918,860,964 | 1.00 | 5.10 G/s | - |
(*) Number alongside VTEnc indicates value of min_cluster_length encoding parameter.
- benchmark v1.6.1: benchmarking library created by Google.
- SIMDCompressionAndIntersection: C++ library to encode/decode sorted lists of integers. It provides a series of integer algorithms optimised to work with sorted lists.
- VTEnc v0.3.0: C library that implements VTEnc algorithm.
To build this project, clone it in this way:
git clone --recurse-submodules https://github.com/vteromero/integer-compression-benchmarks
Then, build the included libraries. It's highly recommended to follow the build instructions for each library in their respective repositories. However, for convenience, here is a quick list of instructions:
# Go to the library root directory
$ cd benchmark
# Make a build directory to place the build output.
$ cmake -E make_directory "build"
# Generate build system files with cmake, and download any dependencies.
$ cmake -E chdir "build" cmake -DBENCHMARK_DOWNLOAD_DEPENDENCIES=on -DCMAKE_BUILD_TYPE=Release ../
# or, starting with CMake 3.13, use a simpler form:
# cmake -DCMAKE_BUILD_TYPE=Release -S . -B "build"
# Build the library.
$ cmake --build "build" --config Release
cd SIMDCompressionAndIntersection
make
cd VTEnc
make
Once the included libraries have been built, run make in the root directory. That will generate the executable intbench.
You can invoke intbench in a variety of ways. Here are some examples:
# '--data-dir' option is required and indicates the path to the directory that
# contains the data sets.
./intbench --data-dir=/home/user/data
# With '--benchmark_filter' you can choose which benchmarks to run.
./intbench --data-dir=/home/user/data --benchmark_filter=TimestampsDataSet
# '--benchmark_list_tests' lists all the available benchmarks.
./intbench --data-dir=/home/user/data --benchmark_list_tests
# Use '--help' to get more information about the available options.
./intbench --data-dir=/home/user/data --help
This code is licensed under MIT license.