clib

A high-performance C data structure library featuring an Arena allocator, dynamic vectors, chaining and open addressing hash maps, and binary heaps. Built as part of a self-directed C curriculum.

Build

Requires gcc and make.

make         # Builds the static library libclib.a
make test    # Compiles and runs unit tests
make metric  # Compiles and runs performance benchmarks
make debug   # Builds with debug symbols and sanitisers

To remove compiled objects, dependency files and binaries:

make clean

Project Structure

include/clib/
├── arena.h            Slab-based chained memory management
├── vector.h           Dynamic contiguous array implementation
├── hashmap.h          Chaining-based hash map
├── ohashmap.h         Open-addressing hash map
├── heap.h             Binary heap (Min/Max)
├── iter.h             Generic iterator
├── test_framework.h   Macros for unit testing
└── metric_framework.h Nanosecond timing and RSS memory tracking
src/                   Core implementation (.c files)
tests/                 Unit tests (functional verification)
metrics/               Performance benchmarks for key operations

Core Architecture & API Reference

Arena Allocator (`arena.c/h`)

The Arena struct implements a slab-based chained memory management system. By allocating memory in large, contiguous blocks (slabs) and carving out space for smaller allocations, it significantly reduces the overhead associated with frequent malloc and free calls. This approach also improves cache locality by keeping related data physically close in memory.

Because the Arena chains Region structs together, it can grow dynamically as needed. This makes it an ideal backbone for the HashMap and OHashMap implementations, allowing them to manage their internal storage with minimum fragmentation. A key advantage of this architecture is bulk deallocation: instead of tracking and freeing thousands of individual nodes, a single call to arena_free reclaims all memory associated with the workload.

It is recommended to choose a large slab size to avoid pockets of wasted memory at the end of each arena.

Function	Description
`int arena_init(Arena *a, size_t slab_size)`	Initialises an `Arena` with a fixed `slab_size` for its initial and subsequent slabs.
`void arena_alloc(Arena a, size_t size)`	Allocates size bytes from the current slab. If the current slab is full, a new slab is automatically chained.
int arena_reset(Arena *a)	Rewinds the offsets in all allocated slabs to zero, allowing the memory to be reused without re-allocation.
void arena_free(Arena *a)	Frees all slabs and regions associated with the Arena back to the system.

Iterator (`iter.h`)

The Iter struct provides a unified interface for traversing any collection in the library. This allows you to write generic functions that process data regardless of whether it is stored in a Vector, Heap, HashMap or elsewhere. Once an Iter struct is initialised properly it can be used like such:

/* it.next(Iter *it) returns 0 (success), 1 (end), -1 (error) */
while (it.next(&it) == 0) {
  /* access it.current.value */
}

You can see in the individual data structure sections how Iter is utilised further but the general rule of thumb is that you need to define a constructor method (e.g. Iter vector_iter(Vector *v)) and a next method (e.g. int vector_iter_next(Iter *it)).

Vector (`vector.c/h`)

The Vector struct is a dynamic, contiguous array that supports amortised $O(1)$ insertions. It serves as the primary storage backend for the Binary Heap.

Function	Description
`int vector_init(Vector *v, size_t item_size)`	Initialises an empty `Vector`.
`int vector_push(Vector v, const void item)`	Appends `item` to `Vector`, resizing if necessary.
`void vector_get(const Vector v, size_t index)`	Returns a pointer to the item at `index`.
`int vector_pop(Vector v, void ptr)`	Removes the last item and assigns the pointer to that item to `*ptr`.
`void vector_free(Vector *v)`	Frees the allocated memory for `Vector` struct.
`Iter vector_iter(Vector *vector)`	Returns an `Iter` struct for iteration of items in `Vector`.
`int vector_iter_next(Iter *iter)`	Moves `Iter` struct to the next item in the `Vector` (no need for explicit use of this method).

Chaining HashMap (`hashmap.c/h`)

The HashMap struct is a separate chaining hash map implementation. Each bucket is the head of a linked list. This implementation is particularly robust against collision rates.

HashMap uses an arena allocator to store its key-value pairs. It is sensitive to low slab sizes for the arena allocator, but given a sensible slab size HashMap becomes the preferred implementation (see Performance Benchmarking for more details).

HashMap has customisable compare and hash functions. The default hash function is an implementation of FNV-1a, which treats the key as a raw sequence of bytes and is robust for most scenarios. The default compare function uses memcmp to compare the two blocks of memory for each key.

Function	Description
`int hashmap_init(HashMap m, size_t key_size, size_t val_size, Arena a)`	Initialises `HashMap` struct with `Arena` backed storage.
`void hashmap_set_functions(HashMap *m, HashFn h, CompareFn c)`	Sets functions for hash and compare functions (NULL for h or c, falls back to default).
`int hashmap_put(HashMap m, const void k, const void *v)`	Inserts or updates a key-value pair.
`void hashmap_get(const HashMap m, const void *k)`	Retrieves a pointer to the value associated with a key.
`int hashmap_remove(HashMap m, const void k)`	Removes key-value pair from `HashMap` if it exists.
`size_t hashmap_count(const HashMap *m)`	Returns the amount of key-value pairs in `HashMap` struct.
`void hashmap_free(HashMap *m)`	Frees the allocated memory for `HashMap` struct (this does not free memory allocated in `Arena`).
`Iter hashmap_iter(HashMap *m)`	Returns an `Iter` struct for iteration of key-value pairs in `HashMap` struct
`int hashmap_iter_next(Iter *it)`	Moves `Iter` struct to the next key-value pair in `HashMap` (no need for explicit use of this method).

Open-addressing HashMap (`ohashmap.c/h`)

The OHashMap struct is a linear probing hash map implementation. By storing all entries in a single contiguous array, it minimises pointer chasing and maximises CPU cache hits. It is ideal for workloads where memory locality is the primary performance bottleneck.

OHashMap uses an arena allocator to store its key-value pairs. It is robust to low slab sizes for the arena allocator, but this also means it sees minimal performance benefits from the arena allocator (see Performance Benchmarking for more details).

Like HashMap, OHashMap has customisable compare and hash functions. The default hash function is an implementation of FNV-1a, which treats the key as a raw sequence of bytes and is robust for most scenarios. The default compare function uses memcmp to compare the two blocks of memory for each key.

Function	Description
`int ohashmap_init(OHashMap m, size_t key_size, size_t val_size, Arena a)`	Initialises `OHashMap` struct with `Arena` backed storage.
`void ohashmap_set_functions(OHashMap *m, HashFn h, CompareFn c)`	Sets functions for hash and compare functions (`NULL` for `h` or `c`, falls back to default).
`int ohashmap_put(OHashMap m, const void k, const void *v)`	Inserts or updates a key-value pair.
`void ohashmap_get(const OHashMap m, const void *k)`	Retrieves a pointer to the value associated with a key.
`int ohashmap_remove(OHashMap m, const void k)`	Removes key-value pair from `OHashMap` if it exists.
`size_t ohashmap_count(const OHashMap *m)`	Returns the amount of key-value pairs in `OHashMap` struct.
`void ohashmap_free(OHashMap *m)`	Frees the allocated memory for `OHashMap` struct (this does not free memory allocated in `Arena`).
`Iter ohashmap_iter(OHashMap *m)`	Returns an `Iter` struct for iteration of key-value pairs in `OHashMap` struct
`int ohashmap_iter_next(Iter *it)`	Moves `Iter` struct to the next key-value pair in `OHashMap` (no need for explicit use of this method).

Binary Heap (`heap.c/h`)

The Heap struct is a priority queue implementation that is built on top of a Vector struct for item storage and organisation.

Heap contains a customisable compare function, allowing a custom hierarchy to be established. By default, this compare function provides the utility for an int based min-heap.

Function	Description
`int heap_init(Heap *h, size_t item_size, HeapCompareFn c)`	Initialises a `Heap` with a custom `HeapCompareFn` (`NULL` for `c`, falls back to deafult).
`void heap_heapify(Heap h, Vector d)`	Bulk-loads a `Vector` into a `Heap` in $O(n)$ time.
`int heap_push(Heap h, const void value)`	Standard $O(logn)$ insertion onto heap
`int heap_pop(Heap h, void ptr)`	Removes the top element, restores order ($O(\log n)$) and stores a pointer to the element in `*ptr`.
`void heap_peek(Heap h)`	Returns a pointer to the top element without removal.
`void heap_free(Heap *h)`	Frees the allocated memory for `Heap` struct.
`Iter heap_iter(Heap *heap)`	Returns an `Iter` struct for level-order traversal of items in the binary heap tree.
`int heap_iter_next(Iter *iter)`	Moves `Iter` struct to the next item in `Heap` (no need for explicit use of this method).

Performance Benchmarking

The library includes a custom benchmarking suite in metrics/ that measures:

Throughput: Average time per operation in nanoseconds ($ns/op$).
Memory Footprint: Tracks Resident Set Size (RSS) changes via /proc/self/status to measure physical memory impact in kilobytes (KB).

Note: All metrics were captured on an Intel i5-1135G7 @ 2.40GHz (16GB RAM).

Vector Benchmarks (`unit_vector.c`)

These are the results from running the compiled binaries for the metrics/unit_vector.c script. These tests are for an int based Vector.

Operation (N)	Speed	Memory
Push (1M)	5.71 ns/op	+4224 KB
Pop (1M)	3.61 ns/op	+0 KB
Get (1M)	1.52 ns/op	+0 KB
Iteration (1M)	1.27 ns/op	+0 KB

Analysis: Random access and iteration are essentially instantaneous, costing roughly 1-2 nanoseconds. Even the push operation, which includes capacity checks and occasional resizing, is highly optimised.

Heap Benchmarks (`unit_heap.c`)

These are the results from running the compiled binaries for the metrics/unit_heap.c script. These tests are for an int based Min-Heap.

Operation (N)	Speed	Memory
Push (Ascending Vals) (1M)	7.50 ns/op	+4224 KB
Push (Descending Vals) (1M)	193.83 ns/op	+3712 KB
Heapify (Descending Vals) (1M)	0.0155 s (whole Vector)	+0 KB
Pop (1M)	239.82 ns/op	+0 KB
Iteration (1M)	1.43 ns/op	+0 KB

Analysis: In a Min-Heap, pushing values that are already in order is nearly as fast as a Vector push ($O(1)$). However, the worst-case pop is significantly more expensive because it requires a two-way comparison at every level of the tree.

The Heapify Optimisation: Building a Min-Heap of 1 million items using repeated push calls would take ~192.5ms for descending values (and ~7.6ms for ascending values). By using heap_heapify, the same task (Descending Values) is completed in ~15.5ms (~12.5x speedup).

HashMap: Chaining vs. Open-Addressing (`vs_hashmap.c`)

These are the results from running the compiled binaries for the metrics/vs_hashmap.c script. These tests are for a HashMap (chaining), OHashMap (open-addressing) with load factor of 0.70.

Arena Slab Size - 4096 bytes:

Operation (N)	Speed	Memory
Put (Chaining) (1M)	177.18 ns/op	+56136 KB
Get (Chaining) (1M)	27.83 ns/op	+0 KB
Put (Open-Addressing) (1M)	209.53 ns/op	+73540 KB
Get (Open-Addressing) (1M)	42.75 ns/op	+0 KB

Arena Slab Size - 64 bytes:

Operation (N)	Speed	Memory
Put (Chaining) (1M)	289.08 ns/op	+89336 KB
Get (Chaining) (1M)	34.06 ns/op	+0 KB
Put (Open-Addressing) (1M)	220.43 ns/op	+49152 KB
Get (Open-Addressing) (1M)	42.88 ns/op	+0 KB

Analysis: The results reveal a significant performance shift depending on the Arena Slab Size.

Slab Size Impact (4096 bytes): With a larger slab size, Chaining is the clear winner. It is ~20% faster on put operations and ~50% faster on get operations. Because the Arena packs linked nodes into large, contiguous blocks, Chaining achieves high cache locality, effectively mitigating the "pointer chasing" typically associated with linked structures.
Slab Size Impact (64 bytes): When the slab size is reduced to 64 bytes, Chaining suffers from Slab Thrashing. Memory consumption jumps by ~60% (+56MB to +89MB) and put speed increases by ~60%. This likely happens because each small allocation for a node is likely triggering the allocation of a new Arena region, leading to significant memory fragmentation and overhead.
Open-Addressing Resilience: OHashMap is much more resilient to slab size changes. Because it maintains one large contiguous array, its performance remains stable regardless of whether the Arena provides 4KB or 64B slabs. Interestingly, at the 64B slab size, it becomes the faster implementation for put operations, as it avoids the allocation overhead that cripples Chaining in small-memory environments.

HashMap Collision Resilience: Chaining vs. Open-Addressing (`vs_hashmap_collisions.c`)

These are the results from running the compiled binaries for the metrics/vs_hashmap_collisions.c script. These tests are for a HashMap (chaining), OHashMap (open-addressing) with load factor of 0.70.

For this test, the hash maps use a custom bad_hash hash function that always returns 0. This means we have a 100% collision rate.

Arena Slab Size - 4096 bytes:

Operation (N)	Speed	Memory
Put (Chaining) (10K)	14463.53 ns/op	+512 KB
Get (Chaining) (10K)	13157.05 ns/op	+0 KB
Put (Open-Addressing) (10K)	38884.60 ns/op	+256 KB
Get (Open-Addressing) (10K)	27635.73 ns/op	+0 KB

Arena Slab Size - 64 bytes:

Operation (N)	Speed	Memory
Put (Chaining) (10K)	17691.04 ns/op	+768 KB
Get (Chaining) (10K)	18805.31 ns/op	+0 KB
Put (Open-Addressing) (10K)	38350.02 ns/op	+256 KB
Get (Open-Addressing) (10K)	28490.30 ns/op	+0 KB

Analysis: Forcing a 100% collision rate provides a "stress test" for the internal search and insertion logic. While both implementations degrade from $O(1)$ to $O(N)$, Chaining proves to be significantly more resilient.

Chaining Resilience: Even with 10,000 items in one bucket, Chaining is ~2x faster than Open-Addressing. Sequential Entry allocations within a large Arena slab ensure the linked list nodes remain physically adjacent, minimizing cache misses during the linear search.
The Open-Addressing "Primary Clustering" penalty: Every put and get must scan the entire contiguous block of occupied slots starting from index 0. This linear scan is significantly slower than the localised list traversal in Chaining, leading to a massive increase in $ns/op$.

Author

Created by WillEdgington

📧 willedge037@gmail.com | 🔗 LinkedIn

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
include/clib		include/clib
metrics		metrics
src		src
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

clib

Build

Project Structure

Core Architecture & API Reference

Arena Allocator (`arena.c/h`)

Iterator (`iter.h`)

Vector (`vector.c/h`)

Chaining HashMap (`hashmap.c/h`)

Open-addressing HashMap (`ohashmap.c/h`)

Binary Heap (`heap.c/h`)

Performance Benchmarking

Vector Benchmarks (`unit_vector.c`)

Heap Benchmarks (`unit_heap.c`)

HashMap: Chaining vs. Open-Addressing (`vs_hashmap.c`)

HashMap Collision Resilience: Chaining vs. Open-Addressing (`vs_hashmap_collisions.c`)

Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

clib

Build

Project Structure

Core Architecture & API Reference

Arena Allocator (arena.c/h)

Iterator (iter.h)

Vector (vector.c/h)

Chaining HashMap (hashmap.c/h)

Open-addressing HashMap (ohashmap.c/h)

Binary Heap (heap.c/h)

Performance Benchmarking

Vector Benchmarks (unit_vector.c)

Heap Benchmarks (unit_heap.c)

HashMap: Chaining vs. Open-Addressing (vs_hashmap.c)

HashMap Collision Resilience: Chaining vs. Open-Addressing (vs_hashmap_collisions.c)

Author

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Arena Allocator (`arena.c/h`)

Iterator (`iter.h`)

Vector (`vector.c/h`)

Chaining HashMap (`hashmap.c/h`)

Open-addressing HashMap (`ohashmap.c/h`)

Binary Heap (`heap.c/h`)

Vector Benchmarks (`unit_vector.c`)

Heap Benchmarks (`unit_heap.c`)

HashMap: Chaining vs. Open-Addressing (`vs_hashmap.c`)

HashMap Collision Resilience: Chaining vs. Open-Addressing (`vs_hashmap_collisions.c`)

Packages